Cloud, Data and Data Lake - Data Engineering Digest

Setting up Data Lake on GCP using Cloud Storage and BigQuery

Analytics Vidhya

FEBRUARY 25, 2023

Introduction: A data lake is a centralized and scalable repository storing structured and unstructured data. The need for a data lake arises from the growing volume, variety, and velocity of data that companies need to manage and analyze.

Cloud Storage

Cloud Storage Data Lake Cloud Unstructured Data

Enabling Security for Hadoop Data Lake on Google Cloud Storage

Uber Engineering

OCTOBER 27, 2024

Ready to boost your Hadoop Data Lake security on GCP? Our latest blog dives into enabling security for Uber’s modernized batch data lake on Google Cloud Storage!

Cloud Storage

Cloud Storage Google Cloud Data Lake Hadoop

DataMesh: How Uber laid the foundations for the data lake cloud migration

Uber Engineering

OCTOBER 27, 2024

Learn how Uber is streamlining the Cloud migration of its massive Data Lake by incorporating key Data Mesh principles.

Data Lake

Data Lake Cloud Data IT

Announcing New Innovations for Data Warehouse, Data Lake, and Data Lakehouse in the Data Cloud

Snowflake

NOVEMBER 2, 2023

Over the years, the technology landscape for data management has given rise to various architecture patterns, each thoughtfully designed to cater to specific use cases and requirements. These patterns include both centralized storage patterns like data warehouse, data lake and data lakehouse, and distributed patterns such as data mesh.

Data Lake

Data Lake Data Warehouse Cloud Unstructured Data

Presto Powered Cloud Data Lakes At Speed Made Easy With Ahana

Data Engineering Podcast

SEPTEMBER 1, 2021

Summary: The Presto project has become the de facto option for building scalable open source analytics in SQL for the data lake. In recent months the community has focused its efforts on making it the fastest possible option for running your analytics in the cloud.

Data Lake

Data Lake Cloud AWS SQL

Data Warehouse vs. Data Lake

Precisely

MARCH 9, 2023

As cloud computing platforms make it possible to perform advanced analytics on ever larger and more diverse data sets, new and innovative approaches have emerged for storing, preprocessing, and analyzing information. In this article, we’ll focus on a data lake vs. data warehouse.

Data Lake

Data Lake Data Warehouse Hadoop Raw Data

Charting A Path For Streaming Data To Fill Your Data Lake With Hudi

Data Engineering Podcast

AUGUST 3, 2021

Summary: Data lake architectures have largely been biased toward batch processing workflows due to the volume of data that they are designed for. With more real-time requirements and the increasing use of streaming data, there has been a struggle to merge fast, incremental updates with large, historical analysis.

Data Lake

Data Lake Data Warehouse Hadoop Architecture

Cloud Data Warehouse Migrations: Success Stories from WHOOP and Nexon

Snowflake

NOVEMBER 26, 2024

Many of our customers — from Marriott to AT&T — start their journey with the Snowflake AI Data Cloud by migrating their data warehousing workloads to the platform. Today we’re focusing on customers who migrated from a cloud data warehouse to Snowflake and some of the benefits they saw.

Data Warehouse

Data Warehouse Cloud PostgreSQL Hadoop

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

FEBRUARY 18, 2024

Summary: A data lakehouse is intended to combine the benefits of data lakes (cost effective, scalable storage and compute) and data warehouses (user friendly SQL interface). Data lakes are notoriously complex.

Data Lake

Data Lake High Quality Data Data Warehouse Google Cloud

Data Access API over Data Lake Tables Without the Complexity

Towards Data Science

SEPTEMBER 27, 2023

Build a robust GraphQL API service on top of your S3 data lake files with DuckDB and Go. This data might be primarily used for internal reporting, but might also be valuable for other services in our organization.

Data Lake

Data Lake Accessible Accessibility SQL

Top Data Lake Vendors (Quick Reference Guide)

Monte Carlo

APRIL 24, 2023

Data lakes are useful, flexible data storage repositories that enable many types of data to be stored in their rawest state. Traditionally, after being stored in a data lake, raw data was then often moved to various destinations like a data warehouse for further processing, analysis, and consumption.

Data Lake

Data Lake Google Cloud Data Warehouse AWS

What Are the Best Data Modeling Methodologies & Processes for My Data Lake?

phData: Data Engineering

SEPTEMBER 19, 2023

With the amount of data companies are using growing to unprecedented levels, organizations are grappling with the challenge of efficiently managing and deriving insights from these vast volumes of structured and unstructured data. What is a Data Lake?

Data Lake

Data Lake Process Metadata Data Warehouse

Cloud Native Data Orchestration For Machine Learning And Data Engineering With Flyte

Data Engineering Podcast

MAY 22, 2022

Summary: Machine learning has become a meaningful target for data applications, bringing with it an increase in the complexity of orchestrating the entire data flow.

Machine Learning

Machine Learning Data Engineer Data Engineering Cloud

Snowflake and S3 Data Lake

Cloudyard

DECEMBER 13, 2022

In this post we discuss how the AWS S3 service and Snowflake integration can be used as a data lake in current organizations, and how a customer migrated an on-premises EDW to Snowflake to leverage Snowflake's data lake capabilities. Create an S3 bucket to hold the table data.

Data Lake

Data Lake AWS Metadata Data

Add Version Control To Your Data Lake With LakeFS

Data Engineering Podcast

NOVEMBER 2, 2020

Summary: Data lakes are gaining popularity due to their flexibility and reduced cost of storage. Along with the benefits, there are some additional complexities to consider, including how to safely integrate new data sources or test out changes to existing pipelines.

Data Lake

Data Lake PostgreSQL Datasets Machine Learning

Straining Your Data Lake Through A Data Mesh

Data Engineering Podcast

JULY 22, 2019

Summary: The current trend in data management is to centralize the responsibilities of storing and curating the organization’s information to a data engineering team. This organizational pattern is reinforced by the architectural pattern of data lakes as a solution for managing storage and access.

Data Lake

Data Lake Hadoop Data Architecture

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Data Engineering Podcast

MARCH 24, 2024

Summary: A core differentiator of Dagster in the ecosystem of data orchestration is its focus on software defined assets as a means of building declarative workflows. Data lakes are notoriously complex. How is it different from the current Dagster Cloud product?

Data Lake

Data Lake High Quality Data Hadoop Data Pipeline

Insights And Advice On Building A Data Lake Platform From Someone Who Learned The Hard Way

Data Engineering Podcast

MAY 15, 2022

Summary: Designing a data platform is a complex and iterative undertaking which requires accounting for many conflicting needs. Designing a platform that relies on a data lake as its central architectural tenet adds additional layers of difficulty.

Data Lake

Data Lake Building Architecture BI

Data Warehouse vs Data Lake vs Data Lakehouse: Definitions, Similarities, and Differences

Monte Carlo

AUGUST 25, 2023

Modern companies are ingesting, storing, transforming, and leveraging more data to drive more decision-making than ever before. At the same time, 81% of IT leaders say their C-suite has mandated no additional spending or a reduction of cloud costs. But, the options for data storage are evolving quickly. Let’s dive in.

Data Lake

Data Lake Data Warehouse Unstructured Data Raw Data

Tackling Real Time Streaming Data With SQL Using RisingWave

Data Engineering Podcast

FEBRUARY 4, 2024

In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

SQL

SQL Data Lake High Quality Data Data Pipeline

Data Lake vs Data Warehouse - Working Together in the Cloud

ProjectPro

AUGUST 11, 2021

“Data Lake vs Data Warehouse = Load First, Think Later vs Think First, Load Later.” The terms data lake and data warehouse come up frequently when it comes to storing large volumes of data. Data Warehouse Architecture. What is a Data Lake?

Data Lake

Data Lake Data Warehouse Cloud Hadoop

Maintaining Your Data Lake At Scale With Spark

Data Engineering Podcast

JUNE 16, 2019

Summary: Building and maintaining a data lake is a choose-your-own-adventure of tools, services, and evolving best practices. The flexibility and freedom that data lakes provide allow for generating significant value, but they can also lead to anti-patterns and inconsistent quality in your analytics.

Data Lake

Data Lake Lambda Architecture Data Warehouse Hadoop

Workrise Builds a Strong Security Program with Its Security Data Lake and Modern SIEM

Snowflake

AUGUST 8, 2023

In 2022, the company realized that in order to monitor, analyze, counter, and prevent security threats, it needed both a modern data lake as well as a security information and event management (SIEM) solution. Today, about 90% of SIEM solutions deliver capabilities exclusively in the cloud—up from 20% just three years ago.

Data Lake

Data Lake Programming IT Building

Version Your Data Lakehouse Like Your Software With Nessie

Data Engineering Podcast

MARCH 10, 2024

Summary: Data lakehouse architectures are gaining popularity due to the flexibility and cost effectiveness that they offer. The link that bridges the gap between data lake and warehouse capabilities is the catalog. Data lakes are notoriously complex.

Data Lake

Data Lake High Quality Data Architecture Data Pipeline

Tips to Build a Robust Data Lake Infrastructure

DareData

JULY 5, 2023

Learn how we build data lake infrastructures and help organizations all around the world achieve their data goals. In today's data-driven world, organizations are faced with the challenge of managing and processing large volumes of data efficiently.

Data Lake

Data Lake Building Raw Data ETL Tools

Data Sharing Across Business And Platform Boundaries

Data Engineering Podcast

FEBRUARY 11, 2024

Summary: Sharing data is a simple concept, but complicated to implement well. There are also numerous technical considerations to be made, particularly if the producer and consumer of the data aren't using the same platforms. Dagster offers a new approach to building and running data platforms and data pipelines.

Data Lake

Data Lake High Quality Data Government Data Pipeline

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Data Engineering Podcast

APRIL 7, 2024

Summary: Maintaining a single source of truth for your data is the biggest challenge in data engineering. Different roles and tasks in the business need their own ways to access and analyze the data in the organization. Dagster offers a new approach to building and running data platforms and data pipelines.

Data Lake

Data Lake High Quality Data BI Data Workflow

Modern Customer Data Platform Principles

Data Engineering Podcast

JANUARY 21, 2024

A substantial amount of the data that is being managed in these systems is related to customers and their interactions with an organization. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Data Lake

Data Lake High Quality Data NoSQL Data Warehouse

Data Lake vs. Data Warehouse: Differences and Similarities

U-Next

SEPTEMBER 7, 2022

The terms “Data Warehouse” and “Data Lake” may have confused you, and you may have some questions. Structuring data refers to converting unstructured data into tables and defining data types and relationships based on a schema. What is a Data Warehouse? What is a Data Lake?

Data Lake

Data Lake Data Warehouse Unstructured Data Amazon Web Services

Reconciling The Data In Your Databases With Datafold

Data Engineering Podcast

MARCH 17, 2024

Summary: A significant portion of data workflows involves storing and processing information in database engines. In this episode Gleb Mezhanskiy, founder and CEO of Datafold, discusses the different error conditions and solutions that you need to know about to ensure the accuracy of your data. Data lakes are notoriously complex.

Database

Database Data Lake High Quality Data Data Workflow

Data Lake Explained: A Comprehensive Guide to Its Architecture and Use Cases

AltexSoft

AUGUST 29, 2023

In 2010, a transformative concept took root in the realm of data storage and analytics: the data lake. The term was coined by James Dixon, Back-End Java, Data, and Business Intelligence Engineer, and it started a new era in how organizations could store, manage, and analyze their data. What is a data lake?

Data Lake

Data Lake Architecture IT Amazon Web Services

Fivetran Supports the Automation of the Modern Data Lake on Amazon S3

phData: Data Engineering

APRIL 4, 2023

Today we want to introduce Fivetran’s support for Amazon S3 with Apache Iceberg, investigate some of the implications of this feature, and learn how it fits into the modern data architecture as a whole. Fivetran today announced support for Amazon Simple Storage Service (Amazon S3) with Apache Iceberg data lake format.

Data Lake

Data Lake Amazon Web Services Data Cleanse Data Warehouse

How Upsolver Is Building A Data Lake Platform In The Cloud with Yoni Iny - Episode 56

Data Engineering Podcast

NOVEMBER 11, 2018

Summary: A data lake can be a highly valuable resource, as long as it is well built and well managed. In this episode Yoni Iny, CTO of Upsolver, discusses the various components that are necessary for a successful data lake project, how the Upsolver platform is architected, and how modern data lakes can benefit your organization.

Data Lake

Data Lake Building Cloud Kafka

Making Email Better With AI At Shortwave

Data Engineering Podcast

APRIL 21, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. From a product perspective, what are the data challenges that are posed by email?

Data Lake

Data Lake High Quality Data Data Pipeline Machine Learning

Data Cloud Cost Optimization With Bluesky Data

Data Engineering Podcast

MAY 29, 2022

Summary: The latest generation of data warehouse platforms has brought unprecedented operational simplicity and effectively infinite scale. In order to ensure that you can explore and analyze your data without spending money on inefficient queries, Mingsheng Hong and Zheng Shao created Bluesky Data.

Cloud

Cloud Data Warehouse Data Lake Data

Data Migration Strategies For Large Scale Systems

Data Engineering Podcast

MAY 26, 2024

When that system is responsible for the data layer the process becomes more challenging. Sriram Panyam has been involved in several projects that required migration of large volumes of data in high traffic environments. Can you start by sharing some of your experiences with data migration projects?

Systems

Systems Data Lake High Quality Data Google Cloud

Troubleshooting Kafka In Production

Data Engineering Podcast

DECEMBER 24, 2023

Summary: Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. Data lakes are notoriously complex.

Kafka

Kafka Data Lake High Quality Data SQL

Designing Data Platforms For Fintech Companies

Data Engineering Podcast

DECEMBER 31, 2023

Summary: Working with financial data requires a high degree of rigor due to the numerous regulations and the risks involved in security breaches. In this episode Andrey Korchack, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector.

Designing

Designing Data Lake High Quality Data SQL

Addressing The Challenges Of Component Integration In Data Platform Architectures

Data Engineering Podcast

NOVEMBER 26, 2023

Summary: Building a data platform that is enjoyable and accessible for all of its end users is a substantial challenge. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. Data lakes are notoriously complex.

Architecture

Architecture Data Lake High Quality Data SQL

Best Practices for Your AWS Cloud Migration

Precisely

OCTOBER 3, 2024

In reviewing best practices for your AWS cloud migration, it’s crucial to define your business case first, and work from there. With today’s complex IT environments, you need seamless access to data across every platform – whether it’s in the cloud, on-premises, or hybrid. Should you consider data augmentation?

AWS

AWS Cloud Amazon Web Services Data Integration

Build Your Second Brain One Piece At A Time

Data Engineering Podcast

APRIL 28, 2024

In this episode he explains the data collection and preparation process, the collection of model types and sizes that work together to power the experience, and how to incorporate it into your workflow to act as a second brain. Data lakes are notoriously complex.

Building

Building Data Lake High Quality Data Machine Learning

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

Data Engineering Podcast

MAY 5, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex.

Building

Building Data Lake High Quality Data Machine Learning

Designing Data Transfer Systems That Scale

Data Engineering Podcast

DECEMBER 3, 2023

Summary: The first step of data pipelines is to move the data to a place where you can process and prepare it for its eventual purpose. Data transfer systems are a critical component of data enablement, and building them to support large volumes of information is a complex endeavor. Data lakes are notoriously complex.

Systems

Systems Designing Data Lake SQL

Building A Data Governance Bridge Between Cloud And Datacenters For The Enterprise At Privacera

Data Engineering Podcast

MARCH 27, 2022

Summary: Data governance is a practice that requires a high degree of flexibility and collaboration at the organizational and technical levels. The growing prominence of cloud and hybrid environments in data management adds additional stress to an already complex endeavor.

Data Governance

Data Governance Government Cloud Building
