Architecture, BI and Data Warehouse - Data Engineering Digest

Data Integrity for AI: What’s Old is New Again

Precisely

JANUARY 9, 2025

The goal of this post is to understand how data integrity best practices have been embraced time and time again, no matter the underlying technology. In the beginning, there was a data warehouse. The data warehouse (DW) was an approach to data architecture and structured data management that really hit its stride in the early 1990s.

Data Integration

Data Integration Hadoop Data Warehouse Data Lake

Data News — Week 25.02

Christophe Blefari

JANUARY 11, 2025

Designing a declarative data stack: from theory to practice. Related to the previous article, Simon wrote a great article about what to keep in mind when building a proprietary DSL for a declarative data stack, meaning a YAML configuration system for ingestion and transformations and now, with BI-as-code, visualisation.

Data

Data Data Warehouse Coding Programming Language

Microsoft Fabric vs. Snowflake: Key Differences You Need to Know

Edureka

APRIL 22, 2025

The company wants to combine its sales, inventory, and customer data to enable real-time reporting and predictive analytics. ShopSmart already makes wide use of Azure, Power BI, and Microsoft 365, which fits well with Fabric’s integrated ecosystem. Next, we will look at what Snowflake is.

BI

BI Pipeline-centric Data Lake Google Cloud

The Modern Data Lakehouse: An Architectural Innovation

Cloudera

SEPTEMBER 9, 2022

The promise of a modern data lakehouse architecture. Imagine having self-service access to all business data, anywhere it may be, and being able to explore it all at once. Imagine quickly answering burning business questions nearly instantly, without waiting for data to be found, shared, and ingested.

Architecture

Architecture Metadata Machine Learning Unstructured Data

Do Away With Data Integration Through A Dataware Architecture With Cinchy

Data Engineering Podcast

AUGUST 27, 2021

Because the software owns the data that it generates, we have to go through the trouble of extracting that information before it can be used elsewhere. The team at Cinchy are working to bring about a new paradigm of software architecture that makes data the central element.

Data Integration

Data Integration Architecture Data Warehouse Data Lake

The View From The Lakehouse Of Architectural Patterns For Your Data Platform

Data Engineering Podcast

JULY 3, 2022

Summary The ecosystem for data tools has been going through rapid and constant evolution over the past several years. These technological shifts have brought about corresponding changes in data and platform architectures for managing data and analytical workflows.

Architecture

Architecture Metadata MongoDB MySQL

Data Engineering Weekly #206

Data Engineering Weekly

FEBRUARY 2, 2025

Adam Bellemare & Thomas Betts: The End of the Bronze Age: Rethinking the Medallion Architecture. I’m always a bit uncomfortable with medallion architecture since it is a glorified term for the traditional ETL process.

Data Engineer

Data Engineer Data Engineering Engineering Data Lake

Is Modern Data Warehouse Architecture Broken?

Monte Carlo

APRIL 16, 2022

The data warehouse is the foundation of the modern data stack, so it caught our attention when we saw Convoy head of data Chad Sanderson declare “the data warehouse is broken” on LinkedIn. Treating data like an API. Immutable data warehouses have challenges too.

Data Warehouse

Data Warehouse Architecture Data Data Architect

Eliminate The Overhead In Your Data Integration With The Open Source dlt Library

Data Engineering Podcast

SEPTEMBER 3, 2023

Summary Cloud data warehouses and the introduction of the ELT paradigm have led to the creation of multiple options for flexible data integration, with a roughly equal distribution of commercial and open source options.

Data Integration

Data Integration BI SQL Python

Breaking State and Local Data Silos with Modern Data Architectures

Cloudera

AUGUST 30, 2022

To eliminate or integrate these silos, the public sector needs to adopt robust data management solutions that support modern data architectures (MDAs). Lack of sharing hinders the elimination of fraud, waste, and abuse.

Data Architecture

Data Architecture Architecture Data Lake NoSQL

Enabling Self-Service Business Insights with Cloudera Data Warehouse

Cloudera

JANUARY 11, 2021

They often don’t realize that infrastructure for BI must be scalable and shared with external partners who need to collaborate on projects. How self-service data warehousing frees IT resources: Cloudera Data Warehouse (CDW) is a cloud service and an integral part of the newly released Cloudera Data Platform (CDP).

Data Warehouse

Data Warehouse Pharmaceutical Data Lake BI

A Cost-Effective Data Warehouse Solution in CDP Public Cloud – Part1

Cloudera

FEBRUARY 9, 2021

Today’s customers have a growing need for faster end-to-end data ingestion to meet the expected speed of insights and overall business demand. This ‘need for speed’ drives a rethink on building a more modern data warehouse solution, one that balances speed with platform cost management, performance, and reliability.

Data Warehouse

Data Warehouse Cloud Kafka Cloud Storage

Data News — Week 23.21

Christophe Blefari

MAY 29, 2023

The Future of Data: Everyone wants a piece of the pie; no one wants to bake. Data Modeling, architecture Pattern, tools and the future: part 3 of Simon's guide. Here are first impressions, how it integrates with Power BI and a few remarks. Writing design docs for data pipelines.

BI

BI Data Warehouse Data Data Pipeline

Snowflake Migration Success Stories: Core Digital Media and NAVEX

Snowflake

OCTOBER 16, 2024

Many of our customers — from Marriott to AT&T — start their journey with the Snowflake AI Data Cloud by migrating their data warehousing workloads to the platform. Today we’re focusing on customers who migrated from a legacy data warehouse to Snowflake and some of the benefits they saw.

Digital Media

Digital Media Media Data Lake Data Warehouse

The Future of the Data Lakehouse – Open

Cloudera

JUNE 18, 2022

These lakes power mission-critical, large-scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. In recent years, the term “data lakehouse” was coined to describe this architectural pattern of tabular analytics over data in the data lake.

Data Lake

Data Lake Data Warehouse BI SQL

Fast And Flexible Headless Data Analytics With Cube.JS

Data Engineering Podcast

DECEMBER 21, 2021

What are the main use cases and platform architectures that you are focused on?

Data Analytics

Data Analytics BI Computer Science SQL

Building Applications With Data As Code On The DataOS

Data Engineering Podcast

JANUARY 15, 2023

Coding

Coding Building PostgreSQL Data Lake

Data News — Week 23.05

Christophe Blefari

FEBRUARY 3, 2023

The idea is to create a unified layer that stores all the data needed to make decisions. Lyft, powering millions of real-time decisions with LyftLearn Serving: the architecture of the decentralized system Lyft uses to deploy and serve ML models. The article is a good summary of the building blocks that make up a modern data stack.

BI

BI Google Cloud Machine Learning SQL

Taking A Tour Of The Google Cloud Platform For Data And Analytics

Data Engineering Podcast

JUNE 11, 2021

Summary Google pioneered an impressive number of the architectural underpinnings of the broader big data ecosystem. In this episode Lak Lakshmanan enumerates the variety of services that are available for building your various data processing and analytical systems.

Google Cloud

Google Cloud Cloud Big Data Ecosystem Data Warehouse

Data Lake vs. Data Warehouse: Differences and Similarities

U-Next

SEPTEMBER 7, 2022

The terms “Data Warehouse” and “Data Lake” may have confused you, and you may have some questions. On the other hand, a data warehouse contains historical data that has been cleaned and arranged. What is a Data Warehouse? Data Warehouse in DBMS.

Data Lake

Data Lake Data Warehouse Unstructured Data Amazon Web Services

Revisiting The Technical And Social Benefits Of The Data Mesh

Data Engineering Podcast

DECEMBER 26, 2021

Summary The data mesh is a thesis that was presented to address the technical and organizational challenges that businesses face in managing their analytical workflows at scale.

BI

BI Data Warehouse Data Engineer Data Engineering

Data Warehouse vs Big Data

Knowledge Hut

APRIL 23, 2024

Two popular approaches that have emerged in recent years are data warehousing and big data. While both deal with large datasets, they have different focuses and offer distinct advantages.

Data Warehouse

Data Warehouse Big Data Unstructured Data Hadoop

Data Warehouse Migration Best Practices

Monte Carlo

FEBRUARY 6, 2023

So, you’re planning a cloud data warehouse migration. But be warned: a warehouse migration isn’t for the faint of heart. As you probably already know if you’re reading this, a data warehouse migration is the process of moving data from one warehouse to another. A worthy quest, to be sure.

Data Warehouse

Data Warehouse AWS Data Data Validation

Combining Transactional And Analytical Workloads On MemSQL with Nikita Shamgunov - Episode 51

Data Engineering Podcast

OCTOBER 9, 2018

Summary One of the most complex aspects of managing data for analytical workloads is moving it from a transactional database into the data warehouse. What if you didn’t have to do that at all?

PostgreSQL

PostgreSQL BI Machine Learning Data Warehouse

Data Council 2023

Christophe Blefari

MAY 18, 2023

The talk starts with a great introduction to Snowflake architecture. In a nutshell, the speakers share tips about warehouse sizing and design, performance optimisation with pruning, clustering and query design. Retro on data science by DJ Patil: DJ Patil was the US Chief Data Scientist, and he does a great retro.

Data

Data BI Consulting Data Science

An Overview of Real Time Data Warehousing on Cloudera

Cloudera

NOVEMBER 2, 2020

Users today are asking ever more from their data warehouse. As an example of this, in this post we look at Real Time Data Warehousing (RTDW), which is a category of use cases customers are building on Cloudera and which is becoming more and more common amongst our customers. What is Real Time Data Warehousing?

Data Warehouse

Data Warehouse Kafka Lambda Architecture Telecommunication

What is Business Intelligence(BI)?

Knowledge Hut

JANUARY 19, 2024

Moreover, you can make significant business decisions by exploring the data you already have. The process of gathering, storing, mining, and analyzing data comes under business intelligence. Under BI, all the data a company generates gets stored and used to make significant business growth decisions and multiply the revenue.

Business Intelligence

Business Intelligence BI Raw Data Data Warehouse

Bring Order To The Chaos Of Your Unstructured Data Assets With Unstruk

Data Engineering Podcast

JUNE 17, 2021

In this episode he shares the goals of the Unstruk Data Warehouse, how it is architected to extract asset metadata and build a searchable knowledge graph from the information, and the myriad ways that the system can be used.

Unstructured Data

Unstructured Data Data Warehouse Metadata Media

Deliver Personal Experiences In Your Applications With The Unomi Open Source Customer Data Platform

Data Engineering Podcast

DECEMBER 11, 2021

To make it easier for developers to build customer profiles in a way that respects customers’ privacy, Serge Huber helped to create the Apache Unomi framework as an open source customer data platform.

Data Warehouse

Data Warehouse Raw Data Data Lake BI

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Cloudera

JANUARY 21, 2021

While cloud-native, point-solution data warehouse services may serve your immediate business needs, there are dangers to the corporation as a whole when you do your own IT this way. Cloudera Data Warehouse (CDW) is here to save the day! CDW is an integrated data warehouse service within Cloudera Data Platform (CDP).

IT

IT Data Lake Data Warehouse Cloud Storage

Set Up Your Own Data-as-a-Service Platform On Dremio with Tomer Shiran - Episode 58

Data Engineering Podcast

NOVEMBER 25, 2018

Summary When your data lives in multiple locations, belonging to at least as many applications, it is exceedingly difficult to ask complex questions of it. The default way to manage this situation is by crafting pipelines that will extract the data from source systems and load it into a data lake or data warehouse.

Data Lake

Data Lake Data Warehouse Hadoop BI

Data Lakehouse: Concept, Key Features, and Architecture Layers

AltexSoft

NOVEMBER 10, 2021

The pun being obvious, there’s more to it than just a new term: data lakehouses combine the best features of both data lakes and data warehouses, and this post will explain it all. What is a data lakehouse? Data warehouse vs data lake vs data lakehouse: what’s the difference?

Architecture

Architecture Data Lake Data Warehouse Metadata

Data Lakehouse Architecture Explained: 5 Layers

Monte Carlo

JANUARY 5, 2024

You know what they always say: data lakehouse architecture is like an onion. …OK, data lakehouse architecture combines the benefits of data warehouses and data lakes, bringing together the structure and performance of a data warehouse with the flexibility of a data lake.

Architecture

Architecture Data Lake Metadata Unstructured Data

Let Your Analysts Build A Data Lakehouse With Cuelake

Data Engineering Podcast

AUGUST 20, 2021

Summary Data lakes have been gaining popularity alongside an increase in their sophistication and usability. Despite improvements in performance and data architecture they still require significant knowledge and experience to deploy and manage.

Building

Building Data Lake Data Warehouse SQL

Bringing The Power Of The DataHub Real-Time Metadata Graph To Everyone At Acryl Data

Data Engineering Podcast

OCTOBER 15, 2021

Metadata

Metadata BI Data Warehouse Government

Self Service Business Intelligence And Data Sharing Using Looker with Daniel Mintz - Episode 55

Data Engineering Podcast

NOVEMBER 4, 2018

How does it compare to open source platforms for BI? Given that you are connecting to the customer’s data store, how do you ensure sufficient security? data engineer vs sales management) What are the scaling factors for Looker, both in terms of volume of data for reporting from, and for user concurrency?

Business Intelligence

Business Intelligence Hadoop BI Data Warehouse

Power BI Developer Roles and Responsibilities [2023 Updated]

Knowledge Hut

OCTOBER 30, 2023

Ever wondered why Power BI developers are widely sought after by businesses all around the world? For any organization to grow, it requires business intelligence reports and data to offer insights to aid in decision-making. These reports and data are generated and developed by Power BI developers.

BI

BI Business Intelligence Data Cleanse Business Analyst

Data Pipeline Architecture Explained: 6 Diagrams and Best Practices

Monte Carlo

JUNE 14, 2023

In this post, we will help you quickly level up your overall knowledge of data pipeline architecture by reviewing what data pipeline architecture is and why it is important.

Data Pipeline

Data Pipeline Architecture Data Lake Data Warehouse

Leave Your Data Where It Is And Automate Feature Extraction With Molecula

Data Engineering Podcast

MARCH 8, 2021

IT

IT Data Warehouse MongoDB Kafka

A Serverless Query Engine from Spare Parts

Towards Data Science

APRIL 26, 2023

Plus, we will put together a design that minimizes costs compared to modern data warehouses such as BigQuery or Snowflake. As data practitioners we want (and love) to build applications on top of our data as seamlessly as possible. The idea is to start from a Data Lake where our data are stored.

Engineering

Engineering Data Lake AWS BI

Building Real-Time Data Platforms For Large Volumes Of Information With Aerospike

Data Engineering Podcast

OCTOBER 2, 2021

What are the driving factors for building a real-time data platform?

Building

Building BI Data Architecture Architecture

Snowflake Architecture and Its Fundamental Concepts

ProjectPro

JANUARY 31, 2022

As the demand for big data grows, an increasing number of businesses are turning to cloud data warehouses. The cloud is the only platform to handle today's colossal data volumes because of its flexibility and scalability. Launched in 2014, Snowflake is one of the most popular cloud data solutions on the market.

Architecture

Architecture IT Data Warehouse Amazon Web Services

Accelerating ML Training And Delivery With In-Database Machine Learning

Data Engineering Podcast

JUNE 14, 2021

Machine Learning

Machine Learning Database Data Warehouse Hadoop
