Snowflake Ventures Invests in Anomalo for Advanced Data Quality

Snowflake

Anomalo was founded in 2018 by two Instacart alumni, Elliot Shmukler and Jeremy Stanley. While working together, they bonded over their shared passion for data. After experiencing numerous data quality challenges firsthand, they created Anomalo, a no-code platform for validating and documenting the data in your warehouse.

Functional Data Engineering — a modern paradigm for batch data processing

Maxime Beauchemin

This means that ideally the logic in source control describes how to build the full state of the data warehouse across all time periods. If someone else were to introduce an unrelated change that required "backfilling" 2017, they would apply the 2018 rule to 2017 data without knowing.
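The excerpt hinges on the paradigm's core discipline: pure, deterministic tasks that rebuild immutable partitions with overwrite semantics, so a backfill is just re-running versioned logic. A minimal Python sketch of that idea; compute_daily_metrics, load, and overwrite_partition are hypothetical stand-ins, not code from the article:

```python
from datetime import date, timedelta
from typing import Callable

def compute_daily_metrics(ds: date, raw_events: list[dict]) -> list[dict]:
    """Pure transform: output depends only on its inputs, never on
    mutable warehouse state, so re-running any date is safe."""
    # The business rule lives in source control; a 2017 backfill run
    # today applies whatever rule the current revision defines.
    return [
        {"ds": ds.isoformat(), "user_id": e["user_id"], "amount": e["amount"]}
        for e in raw_events
        if e.get("status") == "complete"
    ]

def backfill(start: date, end: date,
             load: Callable[[date], list[dict]],
             overwrite_partition: Callable[[date, list[dict]], None]) -> None:
    """Rebuild one immutable partition per day with overwrite semantics,
    so repeated runs converge to the same state (idempotence)."""
    ds = start
    while ds <= end:
        overwrite_partition(ds, compute_daily_metrics(ds, load(ds)))
        ds += timedelta(days=1)
```

Because the transform is pure and each partition is overwritten wholesale, re-running any range of dates converges to the same warehouse state, which is what makes large backfills safe.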

Trending Sources

Data logs: The latest evolution in Meta’s access tools

Engineering at Meta

We have a long history of giving users transparency and control over their data:
2010: Users can retrieve a copy of their information through DYI (Download Your Information).
2018: Users have a curated experience to find information about them through Access Your Information.
2024: Users can access data logs in Download Your Information.

The View Below The Waterline Of Apache Iceberg And How It Fits In Your Data Lakehouse

Data Engineering Podcast

Summary: Cloud data warehouses have unlocked a massive amount of innovation and investment in data applications, but they are still inherently limiting. Because they take complete ownership of your data, they constrain what data you can store and how it can be used.
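One way the episode's framing becomes concrete: with an open table format like Iceberg, the table lives in object storage and any Iceberg-aware engine can read it, so no single warehouse owns the data. A hedged sketch with the pyiceberg client, assuming a catalog named "default" and an illustrative table analytics.events, neither of which comes from the episode:

```python
from pyiceberg.catalog import load_catalog

# Catalog and table names are assumptions for illustration.
catalog = load_catalog("default")            # reads connection details from .pyiceberg.yaml
table = catalog.load_table("analytics.events")

# Scan snapshot-isolated data straight from object storage into Arrow,
# without routing the read through any warehouse engine.
rows = table.scan(
    row_filter="event_date >= '2024-01-01'",
    selected_fields=("user_id", "event_type"),
).to_arrow()
print(rows.num_rows)
```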

Databricks, Snowflake and the future

Christophe Blefari

Snowflake was founded in 2012 around its data warehouse product, which is still its core offering. Databricks was founded in 2013 out of academia by researchers who co-created Spark, which became a top-level Apache project in 2014. Databricks is focusing on simplification (serverless, auto BI, improved PySpark) while evolving into a data warehouse.

Data Lake vs. Data Warehouse vs. Data Lakehouse

Sync Computing

Estimates vary, but the amount of new data produced, recorded, and stored is in the ballpark of 200 exabytes per day on average, with the annual total growing from 33 zettabytes in 2018 to a projected 169 zettabytes in 2025. Data volume and velocity, governance, structure, and regulatory requirements have all evolved and continue to do so.

Let The Whole Team Participate In Data With The Quilt Versioned Data Hub

Data Engineering Podcast

Your host is Tobias Macey, and today I'm interviewing Aneesh Karve about how Quilt Data helps you bring order to your chaotic data in S3, with transactional versioning and data discovery built in. Interview: How did you get involved in the area of data management?
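The episode's pitch is transactional versioning on top of S3 objects. A small sketch of the push/browse round trip with the quilt3 client; the bucket and package names are made up for illustration:

```python
import quilt3

REGISTRY = "s3://example-data-bucket"  # illustrative bucket, not from the episode

# Each push creates an immutable, hash-addressed snapshot of the package,
# which is what provides transactional versioning over plain S3 objects.
pkg = quilt3.Package()
pkg.set("raw/events.csv", "events.csv")              # local file -> logical key
pkg.set_meta({"owner": "data-team", "source": "app-db"})
pkg.push("team/events", registry=REGISTRY, message="nightly load")

# Later, from any machine: browse the latest revision and fetch a file.
snapshot = quilt3.Package.browse("team/events", registry=REGISTRY)
snapshot["raw/events.csv"].fetch("restored_events.csv")
```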