Top Data Engineering Content for March, 2023

March, 2023

5 Machine Learning Skills Every Machine Learning Engineer Should Know in 2023

KDnuggets

MARCH 28, 2023

Most essential skills are programming, data preparation, statistical analysis, deep learning, and natural language processing.

Machine Learning

Machine Learning Deep Learning Data Preparation Engineering

5 Advance Projects for Data Science Portfolio

KDnuggets

MARCH 30, 2023

Work on data analytics, time series, natural language processing, machine learning, and ChatGPT projects to improve your chance of getting hired.

Portfolio

Portfolio Project Data Science Machine Learning

Trending Sources

AWS Lambdas. Useful for Data Engineering?

Confessions of a Data Guy

MARCH 20, 2023

Are lambdas one of those tools that everyone uses and no one talks about? I guess I’ve taken them for granted over the years, even though they are incredibly useful. For a lot of my Data Engineering career I didn’t really think about or use AWS lambdas; I just saw them as little annoying flies […]

AWS

AWS Data Engineering Data Engineer Engineering

Amazon doubling down on return to office

The Pragmatic Engineer

MARCH 16, 2023

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs; Identify common issues with DAGs, tasks, and connections; Distinguish between Airflow-relate

Data Pipeline

How to get started with dbt

Christophe Blefari

MARCH 1, 2023

This article is meant to be a resource hub for understanding dbt basics and to help you get started on your dbt journey. When I write dbt, I often mean dbt Core. dbt Core is an open-source framework that helps you organise data warehouse SQL transformations. dbt Core was developed by dbt Labs, which was previously named Fishtown Analytics. The company was founded in May 2016. dbt Labs also develops dbt Cloud, a cloud product that hosts and runs dbt Core projects.

Data Warehouse

Data Warehouse SQL Metadata Raw Data

Advanced NumPy: Broadcasting and Strides

Analytics Vidhya

MARCH 5, 2023

NumPy is an open-source Python library and a must-learn if you want to enter the data science ecosystem. It underpins other important libraries such as Pandas, matplotlib, SciPy, and scikit-learn. One of the reasons this library is so foundational is its array programming capabilities. Array programming, or […]

Python

Python Data Science Programming IT
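
To make the two terms in this entry's title concrete, here is a minimal sketch, not taken from the linked article. It assumes NumPy is installed and uses 64-bit integers so the stride values are predictable; the variable names are illustrative only.

```python
import numpy as np

# Broadcasting: a (3, 1) column and a (1, 4) row combine into a (3, 4)
# result without either operand being copied to the full shape first.
col = np.arange(3).reshape(3, 1)
row = np.arange(4).reshape(1, 4)
table = col * 10 + row

# Strides: the number of bytes to step to reach the next element on each axis.
# For a C-ordered (3, 4) int64 array, a row step is 4 * 8 bytes, a column step 8.
a = np.arange(12, dtype=np.int64).reshape(3, 4)
row_major_strides = a.strides

# Transposing swaps the strides instead of moving any data.
transposed_strides = a.T.strides

# broadcast_to returns a read-only view; the repeated axis has stride 0,
# so every "row" points at the same underlying memory.
tiled = np.broadcast_to(np.arange(4, dtype=np.int64), (3, 4))
tiled_strides = tiled.strides
```

Inspecting `table.shape`, `row_major_strides`, `transposed_strides`, and `tiled_strides` shows how broadcasting and transposition are expressed through shape and stride metadata rather than through data movement.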

Worth reading for data engineers - part 2

Waitingforcode

MARCH 23, 2023

Welcome to the 2nd part of the series, with summaries of great blog posts on streaming and project organization!

Data Engineering

Data Engineering Data Engineer Engineering Project

More Trending

A Complete Collection of Data Science Free Courses – Part 2

KDnuggets

MARCH 29, 2023

The second part covers the list of Machine Learning, Deep Learning, Computer Vision, Natural Language Processing, Data Engineering, and MLOps.

Data Science

Data Science Deep Learning Machine Learning Data Engineering

Exploring The Nuances Of Building An Intentional Data Culture

Data Engineering Podcast

MARCH 5, 2023

Summary: The ecosystem for data professionals has matured to the point that there is a large and growing number of distinct roles. With the scope and importance of data steadily increasing, it is important for organizations to ensure that everyone is aligned and operating in a positive environment. To help facilitate the nascent conversation about what constitutes an effective and productive data culture, the team at Data Council has dedicated an entire conference track to the subject.

Building

Building Database Design Machine Learning Metadata

Lyft in Trouble

The Pragmatic Engineer

MARCH 30, 2023

Originally published on 30 March 2023. 👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics in today’s subscriber-only The Scoop issue. To get full issues twice a week, subscribe here. Disclaimer: I worked at Uber, Lyft's US competitor, between 2016-2020. As always, I aim to remain independent in my analysis: I hold no positions in any of the companies mentioned in this article, and have not been paid to write ab

Software Engineer

Software Engineer Software Engineering Retail Media

Hello Dolly: Democratizing the magic of ChatGPT with open models

databricks

MARCH 23, 2023

Summary: We show that anyone can take a dated off-the-shelf open source large language model (LLM) and give it magical ChatGPT-like instruction following.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

Announcing FawltyDeps - a dependency checker for your Python code

Tweag

MARCH 13, 2023

It is a truth universally acknowledged that the Python packaging ecosystem is in need of a good dependency checker. At the least, it’s our hope to convince you that Tweag’s new dependency checker, FawltyDeps, can help you maintain an environment that is minimal and reproducible for your Python project, by ensuring that required dependencies are explicitly declared and detecting unused dependencies.

Python

Python Coding Project Systems
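
The two checks described in this entry (undeclared and unused dependencies) can be illustrated with a small standard-library sketch. This is not FawltyDeps's implementation or API: the function names are hypothetical, and it makes the simplifying assumption that a package's import name equals its declared name, which does not hold for every package.

```python
import ast
import sys

def imported_modules(source: str) -> set:
    """Collect top-level module names imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0); they refer to the project itself.
            if node.level == 0 and node.module:
                found.add(node.module.split(".")[0])
    return found

def check_dependencies(source: str, declared: set, stdlib=None) -> dict:
    """Compare imports against declared dependencies (hypothetical helper)."""
    if stdlib is None:
        # sys.stdlib_module_names exists on Python 3.10+; fall back to empty.
        stdlib = getattr(sys, "stdlib_module_names", frozenset())
    used = imported_modules(source) - set(stdlib)
    return {
        "undeclared": sorted(used - declared),  # imported but never declared
        "unused": sorted(declared - used),      # declared but never imported
    }

EXAMPLE_SOURCE = (
    "import os\n"
    "import numpy as np\n"
    "from requests.adapters import HTTPAdapter\n"
)
report = check_dependencies(EXAMPLE_SOURCE, {"requests", "pandas"}, stdlib={"os"})
```

In this example `report` flags `numpy` as undeclared and `pandas` as unused, while `requests` passes both checks.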

A Complete Collection of Data Science Free Courses – Part 1

KDnuggets

MARCH 21, 2023

The first part covers the list of Programming, Web scraping, Statistics & Probability, Data Analytics, SQL, and Business Intelligence free courses.

Data Science

Data Science Business Intelligence SQL Data Analytics

Introducing Compute-Compute Separation for Real-Time Analytics

Rockset

MARCH 1, 2023

Every database built for real-time analytics has a fundamental limitation. When you deconstruct the core database architecture, deep in the heart of it you will find a single component that is performing two distinct competing functions: real-time data ingestion and query serving. These two parts running on the same compute unit is what makes the database real-time: queries can reflect the effect of the new data that was just ingested.

Data Ingestion

Data Ingestion Database Architecture SQL

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

Is there a drop in software engineer job openings, globally?

The Pragmatic Engineer

MARCH 23, 2023

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics in today’s subscriber-only The Scoop issue. To get full issues twice a week, subscribe here. There’s plenty of news and anecdotal evidence suggesting the jobs market for software engineers is cooling. In October 2022, I wrote about the start of a Big Tech hiring slowdown.

Software Engineer

Software Engineer Software Engineering Engineering Recruitment

SimulatedRides: How Lyft uses load testing to ensure reliable service during peak events

Lyft Engineering

MARCH 27, 2023

Authors: Remco van Bree, Ben Radler. Contributors: Alex Ilyenko, Ben Radler, Francisco Souza, Garrett Heel, Nathan Hsieh, Remco van Bree, Shu Zheng, Alex Hartwell, Brian Witt. “Load testing in production is great.” We know what you’re thinking — testing in production is one of the cardinal sins of software development. However, at Lyft we have come to realize that load testing in production is a powerful tool to prepare systems for unexpected bursty traffic and peak events.

Coding

Coding Database Systems Engineering

Announcing Topiary

Tweag

MARCH 8, 2023

Topiary aims to be a universal formatter engine within the Tree-sitter ecosystem. Named after the art of clipping or trimming trees into fantastic shapes, it is designed for formatter authors and formatter users: Authors can create a formatter for a language without having to write their own formatting engine, or even their own parser. Users benefit from uniform, comparable code style, across multiple languages, with the convenience of a single formatter tool.

Coding

Coding Engineering Designing Programming

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

GPT-4: Everything You Need To Know

KDnuggets

MARCH 15, 2023

A new model by OpenAI with improved natural language generation and understanding capabilities.

Process

Snowflake Connector for ServiceNow Available in Public Preview

Snowflake

MARCH 16, 2023

ServiceNow, Inc. offers a well-known SaaS application, with companies in multiple industries using it to help manage digital workloads for a variety of departments and operations. What if it was as easy as just a few clicks to get ServiceNow data directly into your Snowflake account so you could combine it with other data sources, including ERPs, HRs, and CRMs?

Hospitality

Hospitality BI Finance Government

Big Tech job-switching stats

The Pragmatic Engineer

MARCH 3, 2023

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics from The Scoop #39 , published two weeks ago, 23 February. To get full newsletters twice a week, subscribe here. I have collaborated with a tech recruiter - they’ve asked to be anonymous - who’s been running some very interesting queries on LinkedIn for software engineers.

Recruitment

Recruitment Software Engineer Software Engineering Finance

Uniting the Machine Learning and Data Streaming Ecosystems - Part 1

Confluent

MARCH 28, 2023

The future of data is real time and enriched by machine learning. How can we overcome socio-technical blockers and unite the ML and data streaming markets?

Machine Learning

Machine Learning Data

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Cloud

Using CockroachDB to Reduce Feature Store Costs by 75%

DoorDash Engineering

MARCH 21, 2023

While building a feature store to handle the massive growth of our machine-learning (“ML”) platform, we learned that using a mix of different databases can yield significant gains in efficiency and operational simplicity. We saw that using Redis for our online machine-learning storage was not efficient from a maintenance and cost perspective.

Machine Learning

Machine Learning AWS Database Utilities

Stream Rows and Kafka Topics Directly into Snowflake with Snowpipe Streaming

Snowflake

MARCH 2, 2023

Snowflake enables organizations to be data-driven by offering an expansive set of features for creating performant, scalable, and reliable data pipelines that feed dashboards, machine learning models, and applications. But before data can be transformed and served or shared, it must be ingested from source systems. The volume of data generated in real time from application databases, sensors, and mobile devices continues to grow exponentially.

Kafka

Kafka Data Ingestion Data Pipeline Cloud Storage

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Business Intelligence

The Collapse of Silicon Valley Bank

The Pragmatic Engineer

MARCH 13, 2023

It’s been a wild weekend, starting Friday. In case you somehow missed it: we went through the fastest bank run in history, in an event that impacted about half of all VC-funded startups in the US and UK. On Friday night, Silicon Valley Bank (SVB) was shut down by regulators, triggering a weekend of fear and uncertainty for many people and businesses with questions like: “can we make payroll next week?

Banking

Banking Insurance Portfolio Media

Table file formats - Z-Order compaction: Delta Lake

Waitingforcode

MARCH 29, 2023

In my recent exploration of the compaction, aka OPTIMIZE command, in Delta Lake, I found this famous Z-Ordering mode. It was one of the most outstanding features when I first heard about Delta Lake. You can't even imagine how impatient I was to see what it is doing under-the-hood. Fortunately, this time has come!
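
As background for the Z-Ordering mentioned above, the general idea behind a Z-order (Morton) curve can be sketched by interleaving the bits of two column values into a single sort key. This is only an illustration of the curve itself, assuming small non-negative integers; it is not Delta Lake's implementation, and `interleave_bits` is a hypothetical helper name.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return the Z-order key whose even bits come from x and odd bits from y."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # bit i of x goes to position 2i
        z |= ((y >> i) & 1) << (2 * i + 1)   # bit i of y goes to position 2i + 1
    return z

# Sorting a 4x4 grid by the interleaved key keeps points that are close
# in both dimensions close together in the resulting order.
points = [(x, y) for y in range(4) for x in range(4)]
z_ordered = sorted(points, key=lambda p: interleave_bits(*p))
first_block = z_ordered[:4]
```

Here `first_block` holds the four points of the lower-left 2x2 square, which is the locality property that makes this ordering useful when data files are pruned on more than one column.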

Complete Guide to Pub/Sub in Redis

Analytics Vidhya

MARCH 31, 2023

Publish and Subscribe (Pub/Sub) is a messaging mechanism in which one or more senders send messages and one or more receivers receive them. The senders, called Publishers, are responsible for publishing the messages, and the receivers, called Subscribers, subscribe to these Publishers to receive their notifications.

Cloud Computing

Cloud Computing Data Analysis Data Warehouse Data Engineering
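
The Publisher/Subscriber relationship described in this entry can be sketched with a tiny in-memory broker. This is a toy model of the pattern, not Redis and not a Redis client; the `Broker` class and the channel name are hypothetical. It loosely mirrors two Redis behaviours: publishing reports how many subscribers received the message, and messages sent to a channel with no subscribers are simply dropped.

```python
from collections import defaultdict

class Broker:
    """Toy in-memory message broker illustrating publish/subscribe."""

    def __init__(self):
        # Maps a channel name to the callbacks subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        """Register a callback that is invoked for every message on the channel."""
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver the message to current subscribers and return how many got it."""
        receivers = self._subscribers.get(channel, [])
        for callback in receivers:
            callback(message)
        return len(receivers)

broker = Broker()
inbox_a, inbox_b = [], []
broker.subscribe("orders", inbox_a.append)
broker.subscribe("orders", inbox_b.append)

delivered = broker.publish("orders", "order-1 created")
dropped = broker.publish("payments", "nobody is listening")
```

After this runs, both inboxes hold the `orders` message, `delivered` is the subscriber count for that channel, and `dropped` is zero because the `payments` channel has no subscribers.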

Polars vs Spark. Real Talk.

Confessions of a Data Guy

MARCH 28, 2023

Real talk. Polars is all the rage. People love Spark. People use Spark for small data, but data is too big for Pandas. Spark runs on a local machine. Polars runs on a local machine. What do I choose, Spark or Polars? Does it matter? I’ve written about Polars at different points, here, and here […]

IT Data

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; Write DAGs that adapt to your data at runtime and set up alerts and notifications; Scale you

Data Engineering
