Introduction Data engineering is the field of study that deals with the design, construction, deployment, and maintenance of data processing systems. The goal of this domain is to collect, store, and process data efficiently and effectively so that it can be used to support business decisions and power data-driven applications.
Introduction In this constantly growing technical era, big data is at its peak, with the need for a tool to import and export data between RDBMS and Hadoop. Apache Sqoop stands for “SQL to Hadoop,” and is one such tool that transfers data between Hadoop (Hive, HBase, HDFS, etc.) and relational databases.
In that time there have been a number of generational shifts in how data engineering is done. Materialize’s PostgreSQL-compatible interface lets users leverage the tools they already use, with unsurpassed simplicity enabled by full ANSI SQL support.
Learn data engineering, all the references (credits). This is a special edition of the Data News. Right now I'm on holiday, finishing a hiking week in Corsica 🥾 So I wrote this special edition about how to learn data engineering in 2024. What is Hadoop? Who are the data engineers?
One job that has become increasingly popular across enterprise data teams is the role of the AI data engineer. Demand for AI data engineers has grown rapidly in data-driven organizations. But what does an AI data engineer do? Table of Contents What Does an AI Data Engineer Do?
Many of our customers — from Marriott to AT&T — start their journey with the Snowflake AI Data Cloud by migrating their data warehousing workloads to the platform. Serving a company that has games available in more than 190 countries and employs more than 8,000 people, its data engineering team is always running.
I joined Facebook in 2011 as a business intelligence engineer. By the time I left in 2013, I was a data engineer. We were data engineers! Data Engineering? Data science as a discipline was going through its adolescence of self-affirming and defining itself. We were pioneers.
dbt Core is an open-source framework that helps you organise data warehouse SQL transformations. dbt was born out of the observation that more and more companies were switching from on-premise Hadoop data infrastructure to cloud data warehouses. This switch has been led by the modern data stack vision.
Hadoop and Spark are the two most popular platforms for Big Data processing. They both enable you to deal with huge collections of data no matter its format — from Excel tables to user feedback on websites to images and video files. What are its limitations, and how does the Hadoop ecosystem address them? What is Hadoop?
Data Engineering is typically a software engineering role that focuses deeply on data – namely, data workflows, data pipelines, and the ETL (Extract, Transform, Load) process. What is Data Science? What are the roles and responsibilities of a Data Engineer? And many more.
If you’re an executive who has a hard time understanding the underlying processes of data science and gets confused by the terminology, keep reading. We will try to answer your questions and explain how two critical data jobs differ and where they overlap. Data science vs. data engineering.
In this episode Pete Hunt, CEO of Dagster Labs, outlines these new capabilities, how they reduce the burden on data teams, and the increased collaboration that they enable across teams and business units. Data lakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free!
Prior to the introduction of CDP Public Cloud, many organizations that wanted to leverage CDH, HDP or any other on-prem Hadoop runtime in the public cloud had to deploy the platform in a lift-and-shift fashion, commonly known as “Hadoop-on-IaaS” or simply the IaaS model. SQL-driven Streaming App Development. Introduction.
Summary Building and maintaining reliable data assets is the prime directive for data engineers. While it is easy to say, it is endlessly complex to implement, requiring data professionals to be experts in a wide range of disparate topics while designing and implementing complex topologies of information workflows.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. What are the pieces of advice that you wish you had received early in your data engineering career? If you hand a book to a new data engineer, what wisdom would you add to it?
Summary The Data Engineering Podcast has been going for five years now and has included conversations and interviews with a huge number of guests, covering a broad range of topics. In this episode he shares some reflections on producing the podcast, compiling the book, and relevant trends in the data engineering ecosystem.
The demand for skilled data engineers who can build, maintain, and optimize large data infrastructures shows no sign of slowing down anytime soon. At the heart of these data engineering skills lies SQL, which helps data engineers manage and manipulate large amounts of data.
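The kind of SQL fluency the excerpt above describes can be shown with a small, self-contained sketch. This example uses Python's built-in sqlite3 module as a stand-in for a production warehouse (the table and column names are invented for illustration); the GROUP BY aggregation is the same pattern data engineers apply at much larger scale:

```python
import sqlite3

# In-memory database as a stand-in for a real data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, action TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "click"), (1, "view"), (2, "click"), (2, "click"), (3, "view")],
)

# A typical aggregation: count events per action type, most frequent first.
rows = conn.execute(
    "SELECT action, COUNT(*) AS n FROM events GROUP BY action ORDER BY n DESC"
).fetchall()
print(rows)  # [('click', 3), ('view', 2)]
```

The same query text would run largely unchanged against Postgres, Snowflake, or BigQuery, which is why SQL transfers so well across data engineering stacks.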
[link] Tweeq: Tweeq Data Platform: Journey and Lessons Learned: ClickHouse, dbt, Dagster, and Superset. Tweeq writes about its journey of building a data platform with cloud-agnostic open-source solutions and some integration challenges. It is refreshing to see an open stack after the Hadoop era.
In this episode Tasso Argyros, CEO of ActionIQ, gives a summary of the major epochs in database technologies and how he is applying the capabilities of cloud data warehouses to the challenge of building more comprehensive experiences for end-users through a modern customer data platform (CDP).
Introduction A Data Engineer is responsible for managing the flow of data used to make better business decisions. A solid understanding of relational databases and the SQL language is a must-have skill, as is the ability to manipulate large amounts of data effectively. The actual data is not kept in this case.
Although the company employs about 400-500 people in its data organization, there’s been no single dedicated “data team” for around two years. Agoda moved away from this model and its data engineers are embedded into each team. The company runs four data centers: one in the US, one in Europe, and two in Asia.
Data Engineers of Netflix: Interview with Kevin Wylie. This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Kevin, what drew you to data engineering?
release, how the use cases for time-series data have proliferated, and how they are continuing to simplify the task of processing your time-oriented events. Links TimescaleDB Original Appearance on the Data Engineering Podcast 1.0
[link] Dani: Apache Iceberg: The Hadoop of the Modern Data Stack? The comment calling Iceberg the Hadoop of the modern data stack surprises me. Iceberg has not reduced the complexity of the data stack, and all the legacy Hadoop complexity still exists on top of Apache Iceberg.
Big Data has found a comfortable home inside the Hadoop ecosystem. Hadoop-based data stores have gained wide acceptance around the world among developers, programmers, data scientists, and database experts. Explore SQL Database Projects to Add to Your Data Engineer Resume.
Below is a diagram describing how I think data platforms break down: Data storage — you need to store data in an efficient, interoperable manner, from the freshest data to the oldest, along with the metadata. You could write the same pipeline in Java, in Scala, in Python, in SQL, etc.
Summary The Hadoop platform is purpose built for processing large, slow moving data in long-running batch jobs. As the ecosystem around it has grown, so has the need for fast data analytics on fast moving data. For a perfect pairing, they made it easy to connect to the Impala SQL engine.
Summary Most businesses end up with data in a myriad of places with varying levels of structure. Presto is a distributed SQL engine that allows you to tie all of your information together without having to first aggregate it all into a data warehouse. What are some cases in which Presto is not the right solution?
News on Hadoop - February 2018. Kyvos Insights to Host Webinar on Accelerating Business Intelligence with Native Hadoop BI Platforms. The leading big data analytics company Kyvos Insights is hosting a webinar titled “Accelerate Business Intelligence with Native Hadoop BI Platforms.”
In this episode Vinoth shares the history of the project, how its architecture allows for building more frequently updated analytical queries, and the work being done to add a more polished experience to the data lake paradigm. Go to dataengineeringpodcast.com/97things today to get your copy! Write some Python scripts to automate it?
But hey, I met my friends after a long time and got my copy of “Fundamentals of Data Engineering” signed by Joe Reis & Matt Housley. If you’re starting data engineering, I highly recommend reading it. Snowflake is a Data Lake Platform Snowflake is moving beyond a SQL data warehouse.
Data Versioning and Time Travel Open Table Formats empower users with time travel capabilities, allowing them to access previous dataset versions. The first insert statement loads data with c_custkey between 30001 and 40000: INSERT INTO ib_customers2 SELECT *, '11111111111111' AS HASHKEY FROM snowflake_sample_data.tpch_sf1.customer WHERE c_custkey BETWEEN 30001 AND 40000;
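The time-travel capability described above rests on one core idea: every write produces a new immutable snapshot, and readers can query any historical snapshot. The toy Python sketch below models that idea only; it is not the Iceberg or Delta API, and the `commit` helper and customer rows are invented for illustration:

```python
# Toy model of table-format "time travel": each write creates an immutable
# snapshot, and readers can query any historical snapshot by version index.
snapshots = []  # the table's version history, oldest first


def commit(previous, new_rows):
    """Append-only commit: a new snapshot is the old rows plus the new rows."""
    snapshot = list(previous) + list(new_rows)
    snapshots.append(snapshot)
    return snapshot


v1 = commit([], [("cust-1", "alice"), ("cust-2", "bob")])
v2 = commit(v1, [("cust-3", "carol")])

# "Time travel": read the table as of version 1, ignoring later writes.
assert snapshots[0] == [("cust-1", "alice"), ("cust-2", "bob")]
assert len(snapshots[1]) == 3
```

Real table formats store snapshots as metadata files pointing at shared data files rather than copying rows, but the read-any-version contract is the same.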
We recently embarked on a significant data platform migration, transitioning from Hadoop to Databricks, a move motivated by our relentless pursuit of excellence and our contributions to the XRP Ledger's (XRPL) data analytics.
I was in the Hadoop world, and all I was doing was denormalisation. The only normalisation I did was back at engineering school while learning SQL with Normal Forms. Actually, what I cared about was physical storage, data formats, logical partitioning, and indexing. Data modeling should not be a required data engineer skill.
Introduction to 2022 Data Engineer Roles and Responsibilities. Companies and enterprises, large and small, are built on data. Data Engineer roles and responsibilities include helping to collect issues and deliver remedies that address customer demand and product accessibility.
Azure Data Engineering is a rapidly growing field that involves designing, building, and maintaining data processing systems using Microsoft Azure technologies. Contents: What is the role of an Azure Data Engineer?
Preamble Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. When you’re ready to launch your next project you’ll need somewhere to deploy it. Is Timescale compatible with systems such as Amazon RDS or Google Cloud SQL? What impact has the 10.0 release had?
Pig and Hive are the two key components of the Hadoop ecosystem. What do Pig and Hive solve? Pig and Hive have a similar goal: they are tools that ease the complexity of writing complex Java MapReduce programs. The Apache Hive and Apache Pig components of the Hadoop ecosystem are briefly introduced below.
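To see concretely what Pig and Hive abstract away, consider the canonical MapReduce example, word count: dozens of lines of Java boil down to a single HiveQL `GROUP BY` query. A minimal Python sketch of that same map-then-reduce logic (sample lines invented for illustration):

```python
from collections import Counter

lines = ["big data tools", "big data platforms", "hadoop tools"]

# Map phase: emit one token per word; reduce phase: count per key.
# Hive expresses this entire job as roughly:
#   SELECT word, COUNT(*) FROM words GROUP BY word;
counts = Counter(word for line in lines for word in line.split())
print(dict(counts))
```

The point of both Pig and Hive is exactly this compression: a declarative script or query compiles down to the distributed map and reduce phases so the engineer never writes them by hand.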
Summary Managing big data projects at scale is a perennial problem, with a wide variety of solutions that have evolved over the past 20 years. One of the early entrants that predates Hadoop and has since been open sourced is the HPCC (High Performance Computing Cluster) system.
They also discuss the role of Arrow as a mechanism for in-memory data sharing and how hardware evolution will influence the state of the art for data formats. You’ve each developed a new on-disk data format, Avro and Parquet respectively.
Preamble Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. Given that you are connecting to the customer’s data store, how do you ensure sufficient security?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. No more scripts, just SQL.
Jean-Georges Perrin has been so impressed by the versatility of Spark that he is writing a book for data engineers to hit the ground running. He also discusses what you need to know to get it deployed and keep it running in a production environment, and how it fits into the overall data ecosystem.