Many of our customers — from Marriott to AT&T — start their journey with the Snowflake AI Data Cloud by migrating their data warehousing workloads to the platform. Today we’re focusing on customers who migrated from a cloud data warehouse to Snowflake and some of the benefits they saw.
Migrating from a traditional data warehouse to a cloud data platform is often complex, resource-intensive and costly. As part of this announcement, Snowflake is also announcing private preview support of a new end-to-end data migration experience for Amazon Redshift.
If a corrupted, unorganized, or redundant database is used, the results of the analysis may become inconsistent and highly misleading. So, we are […] The post How to Normalize Relational Databases With SQL Code? appeared first on Analytics Vidhya.
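As a minimal sketch of what such SQL normalization can look like (the table and column names below are hypothetical, not taken from the post), a redundant table that repeats customer details on every order row can be split so each fact is stored only once:

```sql
-- Before: orders_raw(order_id, customer_name, customer_email, amount)
-- repeats customer details on every row.

-- After: customer attributes live in one place...
CREATE TABLE customers (
    customer_id    INT PRIMARY KEY,
    customer_name  TEXT,
    customer_email TEXT
);

-- ...and orders reference them by key, removing the redundancy.
CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT REFERENCES customers (customer_id),
    amount      DECIMAL(10, 2)
);
```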
Did you know Cloudera customers, such as SMG and Geisinger, offloaded their legacy DW environment to Cloudera Data Warehouse (CDW) to take advantage of CDW’s modern architecture and best-in-class performance? Today, we are pleased to announce the general availability of HPL/SQL integration in CDW public cloud.
This results in the generation of vast amounts of data daily. This data is stored in databases, which must be maintained. SQL is the structured query language used to read and write these databases.
dbt Core is an open-source framework that helps you organise data warehouse SQL transformations. dbt was born out of the observation that more and more companies were switching from on-premises Hadoop data infrastructure to cloud data warehouses. This switch has been led by the modern data stack vision.
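A minimal sketch of a dbt Core model (the file path and model names are invented for illustration): each model is just a SELECT statement, and ref() lets dbt infer the dependency graph and build the objects in the warehouse in order.

```sql
-- models/staging/stg_orders.sql (hypothetical model)
-- dbt materializes this SELECT as a view or table in the warehouse;
-- ref() resolves to the upstream model and records the dependency.
select
    order_id,
    customer_id,
    order_date,
    amount
from {{ ref('raw_orders') }}
where amount is not null
```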
SQL2Fabric Mirroring is a new fully managed service offered by Striim to mirror on-premises SQL databases. It’s a collaborative service between Striim and Microsoft based on Fabric Open Mirroring that enables real-time data replication from on-premises SQL Server databases to Azure Fabric OneLake.
These stages propagate through various systems, including function-based systems that load, process, and propagate data through stacks of function calls in different programming languages (e.g., Hack, C++, Python, etc.). For simplicity, we will demonstrate these for the web, the data warehouse, and AI, per the diagram below.
Summary Stream processing systems have long been built with a code-first design, adding SQL as a layer on top of the existing framework. In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable.
Think of your data warehouse like a well-organized library. That’s where data warehouse schemas come in. A data warehouse schema is a blueprint for how your data is structured and linked, usually with fact tables (for measurable data) and dimension tables (for descriptive attributes).
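As a small illustrative example (table names are hypothetical), that fact/dimension split looks like this in SQL:

```sql
-- Dimension table: descriptive attributes about a customer.
CREATE TABLE dim_customer (
    customer_key  INT PRIMARY KEY,
    customer_name TEXT,
    region        TEXT
);

-- Fact table: measurable events, linked to dimensions by keys.
CREATE TABLE fact_sales (
    sale_id      INT PRIMARY KEY,
    customer_key INT REFERENCES dim_customer (customer_key),
    sale_date    DATE,
    amount       DECIMAL(12, 2)
);
```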
Contents: Introduction, Setup, Code, Conditional logic to read from mock input, Custom macro to test for equality, Setup environment-specific test, Run ELT using dbt, Conclusion, Further reading. Introduction: With the recent advancements in data warehouses and tools like dbt, most transformations (the T of ELT) are being done directly in the data warehouse.
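One way such a custom equality test macro can be sketched in dbt (the macro name and arguments below are illustrative, not taken from the post): compare two relations with EXCEPT in both directions, so the test passes only when the row sets match exactly.

```sql
-- macros/test_equal_rowsets.sql (hypothetical generic test)
-- dbt treats the test as passing when the query returns zero rows.
{% test equal_rowsets(model, compare_model) %}

(select * from {{ model }}
 except
 select * from {{ compare_model }})

union all

(select * from {{ compare_model }}
 except
 select * from {{ model }})

{% endtest %}
```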
You can collect, transform, and route data across your entire stack with its event streaming, ETL, and reverse ETL pipelines. You can implement RudderStack SDKs once, then automatically send events to your warehouse and 150+ business tools, and you’ll never have to worry about API changes again.
Over the years, the technology landscape for data management has given rise to various architecture patterns, each thoughtfully designed to cater to specific use cases and requirements. These patterns include both centralized storage patterns like the data warehouse, data lake, and data lakehouse, and distributed patterns such as data mesh.
He listed 4 things that are the most difficult data integration tasks: from mutable data to IT migrations, everything adds complexity to ingestion systems. The software development lifecycle within a modern data engineering framework — A great deep-dive about a data platform using dltHub, dbt and Dagster.
Summary A data lakehouse is intended to combine the benefits of data lakes (cost effective, scalable storage and compute) and data warehouses (user friendly SQL interface). Data lakes are notoriously complex. Your first 30 days are free!
Contrast that with the skills honed over decades for gaining access, building data warehouses, performing ETL, creating reports and/or applications using structured query language (SQL). The declarative nature of the SQL language makes it a powerful paradigm for getting data to the people who need it.
It runs locally, has extensive SQL support, and can run queries directly on Pandas, Parquet, and JSON data. The fact that it’s insanely fast and does (mostly) all processing in memory makes it a good choice for building my personal data warehouse. Extra points for its seamless integration with Python and R.
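Assuming the engine being described is DuckDB (the excerpt does not name it), querying a Parquet file in place with plain SQL looks like this; the file and column names are hypothetical:

```sql
-- Query a Parquet file directly, no load step; read_parquet() is DuckDB's
-- table function for scanning Parquet from SQL.
SELECT customer_id,
       SUM(amount) AS total_spent
FROM read_parquet('orders.parquet')
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 10;
```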
With yato you give it a folder of SQL queries and it guesses the DAG and runs the queries in the right order. BigQuery supports DELETE to delete partitions in a SQL query. Arrow does a lot of the data-operation heavy lifting. Gives a lot of insights on the market. This is more common sense, but it always works.
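For the BigQuery point above, a hedged illustration (project, dataset, table, and column names are invented): deleting a day’s partition is just a DELETE filtered on the partitioning column.

```sql
-- Removes all rows in the 2024-01-01 partition of a table
-- partitioned on the event_date column.
DELETE FROM `my_project.my_dataset.events`
WHERE event_date = '2024-01-01';
```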
In this episode Emily Riederer shares her work to create a controlled vocabulary for managing the semantic elements of the data managed by her team and encoding it in the schema definitions in her data warehouse. Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams.
Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. No more shipping and praying, you can now know exactly what will change in your database!
He also explains why he started Decodable to address that limitation and the work that he and his team have done to let data engineers build streaming pipelines entirely in SQL. Start trusting your data with Monte Carlo today! Hightouch is the easiest way to sync data into the platforms that your business teams rely on.
Snowflake was founded in 2012 around its data warehouse product, which is still its core offering; Databricks was founded in 2013 out of academia by Spark’s co-creator researchers, and Spark became an Apache project in 2014. You could write the same pipeline in Java, in Scala, in Python, in SQL, etc., with Spark 4.0.
Oracle is a well-known technology for hosting Enterprise Data Warehouse solutions. However, many customers like Optum and the U.S. Citizenship and Immigration Services.
In this blog, we will share with you in detail how Cloudera integrates core compute engines including Apache Hive and Apache Impala in Cloudera Data Warehouse with Iceberg. We will publish follow-up blogs for other data services. Try Cloudera Data Warehouse (CDW) by signing up for a 60-day trial, or test drive CDP.
Take advantage of old school database tricks. In the last 10 to 15 years we’ve seen massive changes to the data industry, notably big data, parallel processing, cloud computing, data warehouses, and new tools (lots and lots of new tools). (Image created by the author using SQL-WatchPup.) That’s it. No server hosting costs.
Many data engineers and analysts don’t realize how valuable the knowledge they have is. They’ve spent hours upon hours learning SQL, Python, how to properly analyze data, build data warehouses, and understand the differences between eight different ETL solutions.
Materialize’s PostgreSQL-compatible interface lets users leverage the tools they already use, with unsurpassed simplicity enabled by full ANSI SQL support. Sign up now for early access to Materialize and get started with the power of streaming data with the same simplicity and low implementation cost as batch cloud data warehouses.
[link] Jon Osborn: Best Practices for Using QUERY_TAG in Snowflake. Modern data warehouses are good at running at scale, given that cost is not a constraint. [link] JBarti: Write Manageable Queries With The BigQuery Pipe Syntax. Our quest to simplify SQL is always an adventure.
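For context on the first item, QUERY_TAG is a Snowflake session parameter; a common pattern (the tag value below is just an example) is to set it per pipeline run so queries can be attributed in query history and cost reporting.

```sql
-- Tag every query issued in this session; the tag is visible in
-- QUERY_HISTORY and can be used for cost attribution.
ALTER SESSION SET QUERY_TAG = 'pipeline=daily_orders;env=prod';

-- ... run the pipeline's queries here ...

-- Clear the tag when the run finishes.
ALTER SESSION UNSET QUERY_TAG;
```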
Summary Cloud data warehouses and the introduction of the ELT paradigm have led to the creation of multiple options for flexible data integration, with a roughly equal distribution of commercial and open source options. It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products.
Data volume and velocity, governance, structure, and regulatory requirements have all evolved and continue to. Despite these limitations, data warehouses, introduced in the late 1980s based on ideas developed even earlier, remain in widespread use today for certain business intelligence and data analysis applications.
Batch processing: data is typically extracted from databases at the end of the day, saved to disk for transformation, and then loaded in batch to a data warehouse. Batch data integration is useful for data that isn’t extremely time-sensitive. Electric bills are a relevant example.
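A minimal sketch of that nightly batch load in SQL (schema and table names are hypothetical): rows extracted for the previous day are inserted into the warehouse table in a single batch.

```sql
-- Nightly batch load: copy yesterday's extracted rows from the staging
-- area into the warehouse fact table in one statement.
INSERT INTO warehouse.fact_billing (account_id, billing_date, amount)
SELECT account_id, billing_date, amount
FROM staging.billing_extract
WHERE billing_date = CURRENT_DATE - 1;
```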
With instant elasticity, high performance, and secure data sharing across multiple clouds, Snowflake has become highly in-demand for its cloud-based data warehouse offering. As organizations adopt Snowflake for business-critical workloads, they also need to look for a modern data integration approach.
As described in our recent blog post, an SQL AI Assistant has been integrated into Hue with the capability to leverage the power of large language models (LLMs) for a number of SQL tasks. This is a real game-changer for data analysts on all levels and will make SQL development faster, easier, and less error-prone.
In this episode Ori Rafael explains how they are automating the creation and scheduling of orchestration flows and their related transformations in a unified SQL interface. Trusted by the data teams at Fox, JetBlue, and PagerDuty, Monte Carlo solves the costly problem of broken data pipelines. Missing data?
A substantial amount of the data that is being managed in these systems is related to customers and their interactions with an organization. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.
Photo by Tiger Lily. Data warehouses and data lakes play a crucial role for many businesses. They give businesses access to the data from all of their various systems, as well as often integrating data so that end-users can answer business-critical questions.
How to reduce warehouse costs? — Hugo proposes 7 hacks to optimise data warehouse costs. Dimensional data modeling with dbt — A great 6-step process to create a simple dim-fact model with dbt. teej/titan — Titan is a Python library to manage data warehouse infrastructure. Crazy amounts.
With a PostgreSQL-compatible interface, you can now work with real-time data using ANSI SQL, including the ability to perform multi-way complex joins, which support stream-to-stream, stream-to-table, table-to-table, and more, all in standard SQL.
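As an illustration of that point (relation names are invented), a multi-way join across streaming and table relations in such a system is written exactly as it would be over static tables:

```sql
-- Stream-to-stream and stream-to-table joins expressed as ordinary ANSI SQL.
SELECT o.order_id,
       c.customer_name,
       p.payment_status
FROM orders_stream   AS o
JOIN payments_stream AS p ON p.order_id = o.order_id
JOIN customers       AS c ON c.customer_id = o.customer_id;
```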
Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. Pricing for SQLake is simple.