1. Introduction
2. Objective
3. Design
4. Setup
  4.1 Prerequisite
  4.2 AWS Infrastructure costs
  4.3 Data lake structure
5. Code walkthrough
  5.1 Loading user purchase data into the data warehouse
  5.2 Loading classified movie review data into the data warehouse
  5.3 Generating user behavior metric
  5.4 Checking results
6. Tear down infra
7. Design considerations
8.
With burnout and mental stress at every level of our lives, I find my Personal Knowledge Management (PKM) system more valuable than ever. As a human, I forget lots of things. As a dad, I have even more to remember for everything related to my kid. As a developer and knowledge worker, I reuse code snippets and create new things. That's why a PKM system, a Second Brain that stores all of it in a sustainable way, is crucial to me.
Data science is ever-evolving, so mastering its foundational technical and soft skills will help you succeed in a career as a Data Scientist, as well as pursue advanced concepts such as deep learning and artificial intelligence.
There once was a day when no one used DataFrames that much. Back before Spark had really gone mainstream, Data Scientists were still plinking around with Pandas a lot. My, my, what would your mother say? How things have changed. Now everyone wants a piece of the DataFrame pie. I mean, it tastes so good. […] The post Dataframe Showdown – Polars vs Spark vs Pandas vs DataFusion.
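To make the showdown concrete, here is a minimal sketch of the same aggregation in two of the contenders, eager pandas versus lazy Polars; the `sales.csv` file and its columns are invented for illustration.

```python
# Hypothetical comparison: the same group-by aggregation in pandas and Polars.
import pandas as pd
import polars as pl

# pandas: eager - the whole file is read into memory, then aggregated
pdf = pd.read_csv("sales.csv")
pandas_result = pdf.groupby("region")["amount"].sum()

# Polars: lazy - scan_csv builds a query plan that gets optimized
# (e.g. reading only the needed columns) before collect() executes it
polars_result = (
    pl.scan_csv("sales.csv")
    .group_by("region")
    .agg(pl.col("amount").sum())
    .collect()
)
```

The lazy query plan is where engines like Polars and DataFusion can skip work that eager pandas performs up front.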
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-driven decisions.
👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics in today’s subscriber-only The Scoop issue. To get this newsletter every week, subscribe here. This article was updated in December 2022. In the midst of gloomy news about hiring freezes and layoffs, let's highlight companies which are growing and hiring.
ksqlDB use case: see how apps can use ksqlDB to ingest, filter, enrich, aggregate, and query data directly with Kafka—no complex architectures or data stores needed.
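As a rough sketch of what "no complex architectures needed" looks like in practice: ksqlDB exposes a REST API, so an app can define a stream over a Kafka topic with a single HTTP call. The server URL, topic, and schema below are assumptions for illustration.

```python
# Hypothetical example: defining a ksqlDB stream over a Kafka topic
# via ksqlDB's REST API (POST /ksql); the app can then filter,
# enrich, and aggregate it with further SQL statements.
import requests

KSQLDB_URL = "http://localhost:8088/ksql"  # assumed local ksqlDB server

statement = """
CREATE STREAM pageviews (user_id VARCHAR, url VARCHAR)
  WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');
"""

resp = requests.post(KSQLDB_URL, json={"ksql": statement, "streamsProperties": {}})
resp.raise_for_status()
print(resp.json())  # server's confirmation of the new stream
```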
by Jasmine Omeke, Obi-Ike Nwoke, Olek Gorajek. This post is for all data practitioners who are interested in learning about bootstrapping, standardization and automation of batch data pipelines at Netflix. You may remember Dataflow from the post we wrote last year titled Data pipeline asset management with Dataflow. That article was a deep dive into one of the more technical aspects of Dataflow and didn't properly introduce this tool in the first place.
This blog post was written by Dean Bubley, industry analyst, as a guest author for Cloudera. Communications service providers (CSPs) are rethinking their approach to enterprise services in the era of advanced wireless connectivity and 5G networks, as well as with the continuing maturity of fibre and Software-Defined Wide Area Network (SD-WAN) portfolios.
AWS has jumped on the bandwagon of removing the need for ETLs. Snowflake announced this too, both with their hybrid tables and their partnership with Salesforce. Now, I do take a little issue with the name "Zero ETL", because at the surface the functionality described is often closer to a zero-integration future, which probably… The post Should We Get Rid Of ETLs?
Summary: Making effective use of data requires proper context around the information being used. As the size and complexity of your organization increase, the difficulty of ensuring that everyone has the necessary knowledge about how to get their work done scales exponentially. Wikis and intranets are a common way to attempt to solve this problem, but they are frequently ineffective.
Last week dbt Labs decided to change the pricing of their Cloud offering. I've already analysed this in week #22.50 of the Data News. In a nutshell, dbt Cloud pricing is per-seat, which means you pay for each dbt developer. Previously it was $50/month/dev for a team; they increased it to $100/month/dev, a 100% increase, with a team limit of 8 devs and only one project.
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You'll learn how to:
- Understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to
- Write DAGs that adapt to your data at runtime and set up alerts and notifications
- Scale you…
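For readers who have not written a DAG before, a minimal sketch of those building blocks might look like the following; the task names and schedule are illustrative, not taken from the eBook.

```python
# Hypothetical two-task DAG showing the basic building blocks:
# a DAG definition, tasks, a schedule, and a dependency.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("extracting...")

def load():
    print("loading...")

with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",  # run exactly when you want it to
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task  # load runs only after extract succeeds
```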
Data consumers, such as data analysts and business users, care mostly about the production of data assets. Data engineers, on the other hand, have historically focused on modeling the dependencies between tasks (instead of data assets) with an orchestrator tool. How can we reconcile both worlds? This article reviews open-source data orchestration tools (Airflow, Prefect, Dagster) and discusses how they introduce data assets as first-class objects.
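Of the three, Dagster makes the asset-first idea most explicit. A minimal sketch, with asset names invented for illustration:

```python
# Hypothetical Dagster assets: the pipeline is declared in terms of the
# data assets it produces, and dependencies are inferred from arguments.
from dagster import asset

@asset
def raw_orders():
    # in a real pipeline this would pull from a source system
    return [{"order_id": 1, "amount": 42.0}]

@asset
def order_totals(raw_orders):
    # depends on raw_orders; Dagster wires the dependency automatically
    return sum(row["amount"] for row in raw_orders)
```

Here the orchestrator tracks `raw_orders` and `order_totals` as the assets consumers care about, not just as anonymous tasks.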
It's time again to look at some data science cheatsheets. Here you can find a short selection of such resources, catering to different levels of knowledge and a breadth of topics of interest.
Data engineering is a vital field within the realm of data science that focuses on the practical aspects of collecting, storing, and processing large amounts of data. It involves designing and building the infrastructure to store and process data, as well as developing the tools and systems to extract valuable insights and knowledge from that […] The post I asked ChatGPT to write a blog post about Data Engineering.
Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage
When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m
👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics in today’s subscriber-only The Scoop issue. To get this newsletter every week, subscribe here. On Thursday, 29 November, Snap CEO Evan Spiegel sent an email announcing Snap will mandate 4 days/week in the office, starting from January.
Explore GitHub Actions for your Kafka CI/CD pipeline, automate Schema Registry, and transform the development and testing of Kafka client applications.
By Ankush Gulati, David Gevorkyan. Additional credits: Michael Clark, Gokhan Ozer. Netflix has more than 220 million active members who perform a variety of actions throughout each session, ranging from renaming a profile to watching a title. Reacting to these actions in near real-time to keep the experience consistent across devices is critical for ensuring an optimal member experience.
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
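As a taste of one of those features, here is a minimal sketch of dynamic task mapping (Airflow 2.3+); the file names and task logic are invented for illustration.

```python
# Hypothetical DAG using dynamic task mapping: the number of mapped
# task instances is decided at runtime from upstream output.
from datetime import datetime

from airflow.decorators import dag, task

@dag(start_date=datetime(2023, 1, 1), schedule=None, catchup=False)
def dynamic_mapping_example():
    @task
    def list_files():
        # in practice this might list new objects in a bucket
        return ["a.csv", "b.csv", "c.csv"]

    @task
    def process(file_name: str):
        print(f"processing {file_name}")

    # expand() creates one process task per file, however many there are
    process.expand(file_name=list_files())

dynamic_mapping_example()
```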
RPA and IA have stunned the business world by delivering impressive, intelligent automation capabilities to businesses of all scales across industries, as we'll see in this blog.
Many aspiring Data Scientists, especially when self-learning, fail to learn the necessary math foundations. These recommendations for learning approaches, along with references to valuable resources, can help you overcome a personal sense of not being "the math type" or the belief that you "always failed in math."
As more people enter the field of Data Science and more companies hire for data-centric roles, what types of jobs are currently in highest demand? There is so much data in the world, and it just keeps flooding in; it now looks like companies are targeting those who can engineer that data more than those who can only model it.
Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.
In this blog, I share my story of getting 4 data science job offers, including from Airbnb, Lyft and Twitter, after being laid off. Any data scientist who was laid off due to the pandemic or who is actively looking for a data science position can find something here to relate to.
There are many great computational tools available for Data Scientists to perform their work. However, mathematical skills are still essential in data science and machine learning, because without a theoretical foundation these tools remain black boxes about which you cannot ask core analytical questions.
The Pandas library is core to any Data Science work in Python. This introduction will walk you through the basics of data manipulation and covers many of Pandas' important features.
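A minimal sketch of the kind of manipulation such an introduction covers; the data is made up for illustration.

```python
# Hypothetical example of core pandas manipulations:
# constructing, filtering, and aggregating a DataFrame.
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara"],
    "score": [88, 92, 79],
})

high_scores = df[df["score"] > 80]   # boolean filtering
print(high_scores["name"].tolist())  # ['Ana', 'Ben']
print(df["score"].mean())            # 86.33...
```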
While there are different ways to learn Data Science for the first time, the approach that works for you should be based on how you learn best. One powerful method is to evolve your learning from simple practice into complex foundations, as outlined in this learning path recommended by a physicist turned Data Scientist.
Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?
Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali
As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.