Whether it’s unifying transactional and analytical data with Hybrid Tables, improving governance for an open lakehouse with Snowflake Open Catalog, or enhancing threat detection and monitoring with Snowflake Horizon Catalog, Snowflake is reducing the number of moving parts to give customers a fully managed service that just works.
Your host is Tobias Macey and today I’m interviewing Martin Traverso about PrestoSQL, a distributed SQL engine that queries data in place. Interview Introduction: How did you get involved in the area of data management? Can you start by giving an overview of what Presto is and its origin story?
Atlan is a collaborative workspace for data-driven teams, like Github for engineering or Figma for design teams. For every table in Select Star, you can find out where the data originated, which dashboards are built on top of it, who’s using it in the company, and how they’re using it, all the way down to the SQL queries.
Each of these trends claims to be a complete model for your data architecture, solving the “everything everywhere all at once” problem. Data teams are confused as to whether they should get on the bandwagon of just one of these trends or pick a combination. First, we describe how data mesh and data fabric could be related.
In addition to log files, sensors, and messaging systems, Striim continuously ingests real-time data from cloud-based or on-premises data warehouses and databases such as Oracle, Oracle Exadata, Teradata, Netezza, Amazon Redshift, SQL Server, HPE NonStop, MongoDB, and MySQL.
Snowflake customers are already harnessing the power of Python through Snowpark, a set of runtimes and libraries that securely deploy and process non-SQL code directly in Snowflake. Every day, we witness approximately 20 million Snowpark queries² driving a spectrum of data engineering and data science tasks, with Python leading the way.
“One of the great things about Snowflake’s data clean rooms is that they’re transparent and flexible,” said Joe Zucker, Senior Manager, Marketing Analytics, at Indeed.com. “You can write a query in SQL and everybody who is working on a project can see it.” Missed the events?
Over the past decade, the successful deployment of large scale data platforms at our customers has acted as a big data flywheel driving demand to bring in even more data, apply more sophisticated analytics, and on-board many new data practitioners from business analysts to data scientists.
We are proud to announce that Striim has successfully achieved Google Cloud Ready – Cloud SQL Designation for Google Cloud’s fully managed relational database service for MySQL, PostgreSQL, and SQL Server.
Key Differences Between AI Data Engineers and Traditional Data Engineers While traditional data engineers and AI data engineers have similar responsibilities, they ultimately differ in where they focus their efforts.
Not too long ago, almost all data architectures and data team structures followed a centralized approach. As a data or analytics engineer, you knew where to find all the transformation logic and models because they were all in the same codebase. Your organization may be undergoing the decentralization of data.
Development of Some Relevant Skills and Knowledge Data Engineering Fundamentals: Theoretical knowledge of data loading patterns, data architectures, and orchestration processes. Data Analytics: Capability to effectively use tools and techniques for analyzing data and drawing insights.
Summary Data lakes have been gaining popularity alongside an increase in their sophistication and usability. Despite improvements in performance and data architecture, they still require significant knowledge and experience to deploy and manage. The data you’re looking for is already in your data warehouse and BI tools.
At the heart of these data engineering skills lies SQL, which helps data engineers manage and manipulate large amounts of data. Did you know SQL is the top skill listed in 73.4% of data engineer job postings on Indeed? Almost all major tech organizations use SQL.
Iceberg, a high-performance open-source format for huge analytic tables, delivers the reliability and simplicity of SQL tables to big data while allowing for multiple engines like Spark, Flink, Trino, Presto, Hive, and Impala to work with the same tables, all at the same time.
Together, MongoDB and Apache Kafka® make up the heart of many modern data architectures today. The Kafka Connect API enables users to leverage ready-to-use components that can stream data from external systems into Kafka topics, as well as stream data from Kafka topics into external systems.
And for the few small issues we had with configuration or SQL differences, I was able to bounce ideas around with Snowflake’s startup team. Ramp transforms performance and overcomes data processing challenges With a new system that can scale with its growing datasets, du Toit and his team have been able to transform performance.
Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. No more shipping and praying, you can now know exactly what will change in your database!
Summary The process of exposing your data through a SQL interface has many possible pathways, each with their own complications and tradeoffs. One of the recent options is Rockset, a serverless platform for fast SQL analytics on semi-structured and structured data. Visit Datacoral.com today to find out more.
And, since historically tools and commercial platforms were often designed to align with one specific architecture pattern, organizations struggled to adapt to changing business needs – which of course has implications on data architecture.
Summary The Presto project has become the de facto option for building scalable open source analytics in SQL for the data lake. Can you describe what Ahana is and the story behind it?
What are the driving factors for building a real-time data platform? How is Aerospike being incorporated in application and data architectures? Can you describe how the Aerospike engine is architected?
This specialist works closely with people on both business and IT sides of a company to understand the current needs of the stakeholders and help them unlock the full potential of data. To get a better understanding of a data architect’s role, let’s clear up what data architecture is.
We started with popular modern data warehouses and quickly expanded our support as data lakes became data lakehouses. Ensuring Data Quality In Dremio Dremio and its SQL Query Engine efficiently queries (but doesn’t move) data across a diverse set of sources. And now, Dremio!
In fact, we recently announced the integration with our cloud ecosystem bringing the benefits of Iceberg to enterprises as they make their journey to the public cloud, and as they adopt more converged architectures like the Lakehouse. 1: Multi-function analytics. 2: Open formats. Reproducibility for ML Ops. Flexible and open file formats.
Summary Managing a data warehouse can be challenging, especially when trying to maintain a common set of patterns. Raghu Murthy, founder and CEO of Datacoral, built data infrastructures at Yahoo! and Facebook, scaling from mere terabytes to petabytes of analytic data.
Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. How do you measure the success of a data platform?
Additionally, the optimized query execution and data pruning features reduce the compute cost associated with querying large datasets. Scaling data infrastructure while maintaining efficiency is one of the primary challenges of modern data architecture, with data often held in cloud object stores (e.g., Amazon S3, Azure Data Lake, or Google Cloud Storage).
In a rush to own this term, many vendors have lost sight of the fact that the openness of a data architecture is what guarantees its durability and longevity. On data warehouses and data lakes. Data lakes and data warehouses unify large volumes and varieties of data into a central location.
We are excited to offer in Tech Preview this born-in-the-cloud table format that will help future-proof data architectures at many of our public cloud customers. Modernizing pipelines. CDP Airflow Operators.
Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production.
Raghu Murthy, founder and CEO of Datacoral, built data infrastructures at Yahoo! and Facebook, scaling from terabytes to petabytes of analytic data. He started Datacoral with the goal to make SQL the universal data programming language.
To give customers flexibility for how they fit Snowflake into their data architecture, Iceberg Tables can be configured to use either Snowflake or an external service such as AWS Glue as the table’s catalog to track metadata. An easy, one-line SQL command converts the table’s catalog to Snowflake in a metadata-only operation.
We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Coming up this fall is the combined events of Graphorum and the Data Architecture Summit. What are some of the advanced capabilities, such as SQL extensions, supported data types, etc.?
The benefits of migrating to Snowflake start with its multi-cluster shared data architecture, which enables scalability and high performance. Finalizing the new business capabilities and use cases to be enabled in the next phases. Features such as auto-suspend and a pay-as-you-go model help you save costs.
Contact Info LinkedIn Website @KentGraziano on Twitter Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?
In this context, data management in an organization is a key point for the success of its projects involving data. One of the main aspects of correct data management is the definition of a data architecture. First, let’s write the data from 2016 to the Delta table. load("/data/acidentes/datatran2016.csv")
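The `load(...)` call above is the tail end of a Spark read chain. A minimal PySpark sketch of the step being described follows; it requires a Spark runtime with the Delta Lake package, and the session config, header option, and output path are assumptions rather than details from the original text:

```python
# Hedged sketch: read the 2016 accidents CSV and write it to a Delta table.
# The Delta session config follows the standard delta-spark setup; the
# output path /data/acidentes/delta is illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("accidents-delta")
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# The read chain whose final call appears in the text.
df_2016 = (
    spark.read
    .option("header", "true")
    .csv("/data/acidentes/datatran2016.csv")
)

# "Write the data from 2016 to the Delta table."
df_2016.write.format("delta").mode("overwrite").save("/data/acidentes/delta")
```

Because this needs a Spark cluster (or local Spark install) plus the delta-spark package, it is a sketch of the shape of the code rather than a standalone script.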
ACID transactions, ANSI 2016 SQL support, and major performance improvements. The data lifecycle model ingests data using Kafka, enriches that data with a Spark-based batch process, performs deep data analytics using Hive and Impala, and finally uses that data for data science in Cloudera Data Science Workbench to get deep insights.
Companies, on the other hand, have continued to demand highly scalable and flexible analytic engines and services on the data lake, without vendor lock-in. Organizations want modern data architectures that evolve at the speed of their business, and we are happy to support them with the first open data lakehouse.
Meanwhile, the visualization tool offers wide-ranging data connectors—from Azure SQL and SharePoint to Salesforce and Google Analytics—enabling quick access to structured and semi-structured data. However, it leans more toward transforming and presenting cleaned data rather than processing raw datasets.
Data pipelines are the backbone of your business’s data architecture. Implementing a robust and scalable pipeline ensures you can effectively manage, analyze, and organize your growing data. Understanding the essential components of data pipelines is crucial for designing efficient and effective data architectures.
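The essential components mentioned above can be sketched as three tiny stages; everything here (function names, sample data, the in-memory "warehouse") is illustrative, not from the original text:

```python
# Illustrative extract -> transform -> load pipeline using only the
# standard library; all names and sample data are hypothetical.
import csv
import io

def extract(raw_csv):
    """Pull raw records out of a CSV source."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows):
    """Clean and reshape records (here: cast amounts to floats)."""
    return [{"user": r["user"], "amount": float(r["amount"])} for r in rows]

def load(rows, sink):
    """Deliver transformed records to a destination (a list stands in)."""
    sink.extend(rows)

raw = "user,amount\nana,10.5\nbob,3.0\n"
warehouse = []
load(transform(extract(raw)), warehouse)
print(warehouse)  # → [{'user': 'ana', 'amount': 10.5}, {'user': 'bob', 'amount': 3.0}]
```

In a real pipeline each stage talks to external systems (a queue, an object store, a warehouse), but the separation into stages is the same.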
Many architects and team leaders expressed to us a desire to democratize stream processing to larger user bases, especially SQL analysts, and to move from manual configuration and maintenance of Flink environments to more of a PaaS model, maintaining performance while freeing up development resources.
Who has never seen an application use RDBMS SQL statements to run searches? Using SQL to run your search might be enough for your use case, but as your project requirements grow and more advanced features are needed—for example, enabling synonyms, multilingual search, or even machine learning—your relational database might not be enough.
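As a concrete illustration of the plain-SQL search the paragraph describes, here is a minimal example using Python's built-in sqlite3 module; the table and rows are invented for illustration:

```python
# "Search" done with a plain SQL LIKE query; schema and data are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO articles (title) VALUES (?)",
    [("Intro to data lakes",), ("Streaming with Kafka",), ("Data lake security",)],
)

# A simple substring match: fine for basic needs, but no synonyms,
# stemming, or multilingual support -- the limits the text describes.
rows = conn.execute(
    "SELECT title FROM articles WHERE title LIKE ?", ("%lake%",)
).fetchall()
print([t for (t,) in rows])  # → ['Intro to data lakes', 'Data lake security']
```

Once requirements outgrow `LIKE`, that is typically the point where teams reach for a dedicated search engine rather than the relational database.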