article thumbnail

Data Pruning MNIST: How I Hit 99% Accuracy Using Half the Data

Towards Data Science

Building more efficient AI TLDR : Data-centric AI can create more efficient and accurate models. Full code and results available here onGitHub. Moving experiment configs to a YAML, automatically saving results to a file, and having o1 write my visualization code made life mucheasier. MNIST handwritten digit database.

article thumbnail

Data Engineering Weekly #196

Data Engineering Weekly

The blog emphasizes the importance of starting with a clear client focus to avoid over-engineering and ensure user-centric development. impactdatasummit.com Thumbtack: What we learned building an ML infrastructure team at Thumbtack Thumbtack shares valuable insights from building its ML infrastructure team.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Unlocking Operational Efficiency: A Major Home Improvement Retailer’s Path to Data Modernization with Striim

Striim

Known for its customer-centric approach and expansive product offerings, the company has maintained its leadership position in the industry for decades. After evaluating options, the retailer partnered with Striim to leverage its real-time data streaming and low-code/no-code integration capabilities.

Retail 52
article thumbnail

Data Engineering Weekly #182

Data Engineering Weekly

I like testing people on their practical knowledge rather than artificial coding challenges. Adopting LLM in SQL-centric workflow is particularly interesting since companies increasingly try text-2-SQL to boost data usage. Log-as-the-Database (P2): Sending only write-ahead logs to the storage side upon transaction commit.

article thumbnail

10 Lessons from 10 Years of Innovation and Engineering at Picnic

Picnic Engineering

A decade ago, Picnic set out to reinvent grocery shopping with a tech-first, customer-centric approach. For instance, we built self-service tools for all our engineers that allow them to handle tasks like environment setup, database management, or feature deployment effectively.

article thumbnail

The Race For Data Quality in a Medallion Architecture

DataKitchen

Bronze layers can also be the raw database tables. If you can modify or control the ingestion code, data quality tests, and validation checks should ideally be integrated directly into the process. Alternatively, suppose you do not control the ingestion code. Bronze layers should be immutable.

article thumbnail

Every Company is Becoming a Software Company

Confluent

Of course, this is not to imply that companies will become only software (there are still plenty of people in even the most software-centric companies), just that the full scope of the business is captured in an integrated software defined process. Apache Kafka ® and its uses.