Anomalo was founded in 2018 by two Instacart alumni, Elliot Shmukler and Jeremy Stanley. While working together, they bonded over their shared passion for data. After experiencing numerous data quality challenges, they created Anomalo, a no-code platform for validating and documenting data warehouse information.
This means that, ideally, the logic in source control describes how to build the full state of the data warehouse across all time periods. If someone else were to introduce an unrelated change that required “backfilling” 2017, they would apply the 2018 rule to 2017 data without knowing.
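The failure mode can be sketched in Python with a hypothetical rate rule (the rates and dates are purely illustrative, not from any real pipeline): keying the logic to the record's effective date means a backfill of 2017 partitions reproduces the 2017 rule instead of silently applying the current one.

```python
from datetime import date

def tax_rate(order_date: date) -> float:
    # Hypothetical rule change: the rate used by the pipeline changed on
    # 2018-01-01. Encoding the change as effective-dated logic (rather than
    # hard-coding "the current rate") keeps backfills historically correct.
    if order_date >= date(2018, 1, 1):
        return 0.20  # rule in effect from 2018 onward
    return 0.15      # rule in effect before 2018

# A backfilled 2017 row still gets the 2017 rule:
print(tax_rate(date(2017, 6, 1)))  # -> 0.15
print(tax_rate(date(2018, 6, 1)))  # -> 0.20
```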
We have a long history of giving users transparency and control over their data:
2010: Users can retrieve a copy of their information through DYI.
2018: Users have a curated experience to find information about them through Access Your Information.
2024: Users can access data logs in Download Your Information.
Summary: Cloud data warehouses have unlocked a massive amount of innovation and investment in data applications, but they are still inherently limiting. Because of their complete ownership of your data, they constrain what data you can store and how it can be used.
Snowflake was founded in 2012 around its data warehouse product, which is still its core offering. Databricks was founded in 2013 by the academic researchers who co-created Spark, which became Apache Spark in 2014. Databricks is focusing on simplification (serverless, auto BI, improved PySpark) while evolving into a data warehouse.
Estimates vary, but the amount of new data produced, recorded, and stored is in the ballpark of 200 exabytes per day on average, with the annual total growing from 33 zettabytes in 2018 to a projected 169 zettabytes in 2025. Data volume and velocity, governance, structure, and regulatory requirements have all evolved and continue to do so.
Your host is Tobias Macey, and today I'm interviewing Aneesh Karve about how Quilt Data helps you bring order to your chaotic data in S3, with transactional versioning and data discovery built in.
Interview
Introduction
How did you get involved in the area of data management?
With the rise in opportunities related to Big Data, challenges are also bound to increase. Below are the five major Big Data challenges that enterprises face in 2024: 1. The Need for More Trained Professionals. Two, it creates a commonality of data definitions, concepts, metadata, and the like.
Before going into further detail on Delta Lake, we need to revisit the concept of the Data Lake, so let’s travel through some history. The main player in the context of the first data lakes was Hadoop, with its distributed file system (HDFS) and MapReduce, a processing paradigm built around the ideas of minimal data movement and high parallelism.
Their most recent evaluation, The Forrester Wave: Enterprise Data Fabric, Q2 2022, came out on June 23, 2022 and ranked Cloudera as a Strong Performer. Forrester ranked Cloudera at the same level in its two previous Wave reports on this topic (2020 and 2018).
After having rebuilt their data warehouse, I decided to take on a bit more of a pointed role, and I joined Oracle as a database performance engineer. I spent eight years in the Real-World Performance group, where I specialized in high-visibility, high-impact data warehousing competitive evaluations and benchmarks.
Your host is Tobias Macey, and today I’m interviewing Ori Rafael and Yoni Iny about building a data lake for the DBA at Upsolver.
Interview
Introduction
How did you get involved in the area of data management?
Can you start by sharing your definition of what a data lake is and what it is comprised of?
Business intelligence (BI), an umbrella term coined in 1989 by Howard Dresner, Chief Research Officer at Dresner Advisory Services, refers to the ability of end-users to access and analyze enterprise data. Only three years later, that number more than tripled to 59% in 2018.
The tremendous growth in both unstructured and structured data overwhelms traditional data warehouses. We are both convinced that a scale-out, shared-nothing architecture — the foundation of Hadoop — is essential for IoT, data warehousing, and ML. We have each innovated separately in those areas.
Most, if not all, modern cloud data warehouses support some form of the DATE_TRUNC function. There may be minor differences in the argument order for DATE_TRUNC across data warehouses, but the functionality very much remains the same.
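As a rough sketch of what the function does, here is a simplified Python stand-in covering a few common date parts (real warehouses support many more parts, and, e.g., Snowflake and PostgreSQL take the part first — DATE_TRUNC('month', col) — while BigQuery takes the expression first):

```python
from datetime import datetime

def date_trunc(part: str, ts: datetime) -> datetime:
    """Truncate a timestamp to the start of the given period,
    mimicking SQL's DATE_TRUNC('<part>', <timestamp>)."""
    if part == "year":
        return ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    if part == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if part == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported date part: {part!r}")

# DATE_TRUNC('month', '2022-06-23 14:05:09') -> 2022-06-01 00:00:00
print(date_trunc("month", datetime(2022, 6, 23, 14, 5, 9)))
```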
In each of the cases outlined above, the technology enabler is a new generation of data warehouses. We call it ‘Modern Data Warehousing’. Simply put, modern data warehousing enables our customers to confidently share petabytes of verified data across thousands of users while meeting demanding SLAs on limited budgets.
Before we get into more detail, let’s determine how data virtualization differs from another, more common data integration technique: data consolidation.
Data virtualization vs. data consolidation
[Figure: an example of a typical two-tier architecture with a data lake, data warehouses, and several ETL processes.]
EXTRACT(<date_part> FROM <date/time field>). Depending on the data warehouse you use, the value returned from an EXTRACT function is often a numeric value or the same date type as the input <date/time field>. Read the documentation for your data warehouse to better understand EXTRACT outputs.
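A rough Python analogue of the numeric case (the part names below are a small illustrative subset of what real warehouses accept):

```python
from datetime import datetime

def extract(part: str, ts: datetime) -> int:
    """Mirror SQL's EXTRACT(<date_part> FROM <date/time field>),
    returning a numeric value for the requested component."""
    parts = {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "minute": ts.minute,
    }
    if part not in parts:
        raise ValueError(f"unsupported date part: {part!r}")
    return parts[part]

# EXTRACT(month FROM '2022-06-23 14:05:00') -> 6
print(extract("month", datetime(2022, 6, 23, 14, 5)))
```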
After experiencing negative growth in 2018, Telkomsel made the strategic decision to focus solely on becoming a trusted provider of mobile digital lifestyle services and solutions. With access to vast amounts of data from its customer base, the company knew its ability to mine this data would be a key driver of positive transformation.
The global data landscape is experiencing remarkable growth, with unprecedented increases in data generation and substantial investments in analytics and infrastructure. As the volume of data continues to grow, so does the need for specialized skills to effectively manage it.
Chip Bloche is a Data Engineering Director at DataKitchen. Chip joined DataKitchen as a DataOps chef in 2018, leading a team of DataOps Engineers. The post How To Succeed As a DataOps Engineer first appeared on DataKitchen.
Spark: The Definitive Guide: Big Data Processing Made Simple is a must-have reference for individuals wishing to get started with Apache Spark. Investigate the difficulties and solutions in developing distributed systems and ensuring data consistency.
AWS provides services for data transfer, data storage, data lakes, big data analytics, machine learning, and everything in between, specifically designed to deliver the greatest price-performance.
The Modern Data Stack is a recent development in the data engineering space. The core enabler of the Modern Data Stack is that data warehouse technologies such as Snowflake, BigQuery, and Redshift have gotten fast enough and cheap enough to be considered the source of truth for many businesses.
In the next sections, we’ll reveal what else is needed, as well as how right-sizing governance of more than just data helps organizations achieve their objectives. To achieve their goals of digital transformation and becoming data-driven, companies need more than just a better data warehouse or BI tool.
By 2018, the Big Data market will be about $46.34 billion. Demand for Big Data Analytics talent will far surpass the supply of talent by 2018. According to a McKinsey Global Institute study, the United States alone will face a shortage of Big Data and Hadoop talent.
Google BigQuery (GBQ) is Google’s cloud data warehouse solution: an OLAP-focused database with serverless SQL query execution capable of processing large amounts of data. Data used: Microdados do Censo da Educação Superior [CC BY-ND 3.0], INEP, Brazilian Gov.
dbt Cloud proxy server: this component enables dbt Cloud to dynamically rewrite requests to a data warehouse and compile dbt-SQL into the raw SQL that the database understands. It’s a thin interface that is primarily responsible for performance and reliability in production environments. select * from {{ metrics.
By accommodating various data types, reducing preprocessing overhead, and offering scalability, data lakes have become an essential component of modern data platforms, particularly those serving streaming or machine learning use cases. See our post: Data Lakes vs. Data Warehouses.
Astronomer, founded in 2018, offers products and services that help customers get the most out of Airflow. But most of them are designed primarily to collect it from the query logs of data warehouses. Once a company has reached a certain size or complexity, chances are good they’re using Airflow.
Moving beyond traditional data-at-rest analytics: next generation stream processing with Apache Flink. By 2018, we saw the majority of our customers adopt Apache Kafka as a key part of their streaming ingestion, application integration, and microservice architecture. Better yet, it works in any cloud environment.
We look forward to continuing our mission to help our customers unlock the power of data in 2018. PS – The 2018 awards are already coming in, and one of note is our inclusion in the Best Places to Work for LGBTQ Equality list from the Human Rights Campaign’s 2018 Corporate Equality Index.
Cloudera 2017 Data Impact Award Winners. We are excited to kick off the 2018 Data Impact Awards! Since 2012, the Data Impact Awards have showcased how organizations are using Cloudera and the power of data to transform themselves and achieve dramatic results. Deadline to submit is July 20, 2018.
Database-centric: In bigger organizations, data engineers mainly focus on data analytics, since the data flow in such organizations is huge. Data engineers who focus on databases work with data warehouses and develop different table schemas. Let us now understand the basic responsibilities of a data engineer.
That all changed when Ramp switched to a scalable, simple data cloud with Snowflake. Since 2018, Ramp has helped finance teams at organizations better manage resources, make more informed decisions, and automate revenue and user forecasting. How do you scale seamlessly, without worrying about keeping the lights on?
With CDSW, organizations can research and experiment faster, deploy models easily and with confidence, and rely on the wider Cloudera platform to reduce the risks and costs of data science projects. To see the new capabilities in action, join our webinar on 13 June 2018.
Iceberg provides advanced features such as schema evolution, which allows modifications to a table’s schema without downtime, and snapshot isolation, which ensures data consistency. And that matters — because these new table formats are also introducing complexity in other ways.
Starting July 30, 2018, Cloudera University will post a monthly session of blended learning. Registration is now open for the first blended learning course, Developer for Spark and Hadoop Training , scheduled to begin July 30, 2018. How Will Blended Learning Work? Want to Get Started with Blended Learning?
This data is consumed by almost all our software systems, such as our app, our purchase order management system, warehouse management systems, fintech, data warehouse, and data science systems.
Data engineers like myself play a pivotal role in assessing infrastructure and taking relevant actions. Looking ahead, the future of data engineering appears promising. With the increasing computing power of various cloud data warehouses, data engineers will be capable of efficiently handling large-scale tasks.
Back in 2018, Airbnb’s Airflow cluster had several thousand DAGs and more than 30 thousand tasks running at the same time. A typical Airflow cluster supports thousands of workflows, called DAGs (directed acyclic graphs), and there could be tens of thousands of concurrently running tasks at peak hours.
— Mike Barlow, author of “Learning to Love Data Science” (O’Reilly Media). And now, without further delay, we are excited to announce the winners of the 2018 Data Impact Awards, listed by award theme and category: Business Impact. Modern Data Warehousing: Barclays (nominated together with BlueData).
Gartner’s recently released report, “Master Data Management Forms the Basis of a Trusted 360-Degree View of the Customer,” shares the results of an executive survey highlighting several key points, including that customer initiatives are among CEOs’ top five priorities in 2018. You can download the free report here.
In the last few decades, we’ve seen a lot of architectural approaches to building data pipelines, succeeding one another and promising better and easier ways of deriving insights from information. There have been relational databases, data warehouses, data lakes, and even combinations of the latter two.