Top Data Engineering Digest Content for November, 2021 (Non-relational Database, Relational Database)

November, 2021

Why Machine Learning Engineers are Replacing Data Scientists

KDnuggets

NOVEMBER 29, 2021

The hiring run for data scientists continues at a strong clip around the world. But other emerging roles are also demonstrating key value to organizations, and you should consider them based on your existing or desired skill sets.

Machine Learning

Machine Learning Engineering Data

How Uber Migrated Financial Data from DynamoDB to Docstore

Uber Engineering

NOVEMBER 10, 2021

Each day, Uber moves millions of people around the world and delivers tens of millions of food and grocery orders. This generates a large number of financial transactions that need to be stored with provable completeness, consistency, and compliance. … The post How Uber Migrated Financial Data from DynamoDB to Docstore appeared first on Uber Engineering Blog.

Food

Food Data Engineering Systems

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Trending Sources

Scaling Apache Druid for Real-Time Cloud Analytics at Confluent

Confluent

NOVEMBER 8, 2021

How does Confluent provide fine-grained operational visibility to our customers throughout all of the multi-tenant services that we run in the cloud? At Confluent Cloud, we manage a large number […].

Cloud

Cloud Management

Azure Data Factory: Fail Activity

Azure Data Engineering

NOVEMBER 21, 2021

In some scenarios in Azure Data Factory, we may want to stop the execution of the pipeline intentionally. For example, we might use the Get Metadata activity to check whether a file or folder exists and fail the pipeline if it does not. To achieve this, we can use the Fail Activity: invoking it ensures that the pipeline execution is stopped.
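The Get Metadata plus Fail pattern can be expressed as pipeline JSON. The snippet below is a minimal Python sketch that assembles such a definition as a dictionary. It assumes the ADF activity types `GetMetadata`, `IfCondition`, and `Fail` with the property names shown; the pipeline, activity, and dataset names (`FailIfInputMissing`, `CheckFileExists`, `IfFileExists`, `FailWhenMissing`, `InputFolder`) are hypothetical. Check the property names against the current ADF documentation before deploying anything built this way.

```python
import json


def build_fail_if_missing_pipeline(dataset_name: str) -> dict:
    """Build a pipeline definition: Get Metadata -> If Condition -> Fail.

    The JSON shape follows the ADF pipeline format as we understand it;
    all names used here are hypothetical placeholders.
    """
    get_metadata = {
        "name": "CheckFileExists",
        "type": "GetMetadata",
        "typeProperties": {
            # Ask Get Metadata only for the 'exists' field of the dataset.
            "dataset": {"referenceName": dataset_name, "type": "DatasetReference"},
            "fieldList": ["exists"],
        },
    }
    fail = {
        "name": "FailWhenMissing",
        "type": "Fail",
        "typeProperties": {
            # Custom message and error code surfaced on the failed run.
            "message": f"Input for dataset '{dataset_name}' was not found.",
            "errorCode": "404",
        },
    }
    if_condition = {
        "name": "IfFileExists",
        "type": "IfCondition",
        # Evaluate the condition only after Get Metadata has succeeded.
        "dependsOn": [
            {"activity": "CheckFileExists", "dependencyConditions": ["Succeeded"]}
        ],
        "typeProperties": {
            "expression": {
                "value": "@activity('CheckFileExists').output.exists",
                "type": "Expression",
            },
            # Only the false branch does anything: it fails the pipeline run.
            "ifFalseActivities": [fail],
        },
    }
    return {
        "name": "FailIfInputMissing",
        "properties": {"activities": [get_metadata, if_condition]},
    }


if __name__ == "__main__":
    print(json.dumps(build_fail_if_missing_pipeline("InputFolder"), indent=2))
```

The If Condition only defines a false branch, so a present file lets the pipeline continue normally while a missing one ends the run with the custom message.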

Metadata

Metadata Data Utilities Coding

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This comprehensive guide offers best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

Data Pipeline

Setting up end-to-end tests for cloud data pipelines

Start Data Engineering

NOVEMBER 11, 2021

Contents: 1. Introduction, 2. Setting up services locally, 3. Writing an end-to-end data pipeline test, 4. Conclusion, 5. Further reading, 6. References.

Data pipelines can have multiple software components, which makes testing all of them together difficult. If you are wondering: What is the best way to end-to-end test data pipelines? Are end-to-end tests worth the effort?
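The article's approach runs real services locally; as a much smaller stand-in, the sketch below uses an in-memory SQLite database to show the general shape of an end-to-end test: seed known input, run the whole pipeline, and assert only on the final output. The `run_pipeline` function, table names, and data are hypothetical and not taken from the article.

```python
import sqlite3


def run_pipeline(conn: sqlite3.Connection) -> None:
    """Hypothetical pipeline: extract raw orders, transform, load daily totals."""
    # Extract: read raw rows from the source table.
    rows = conn.execute("SELECT order_date, amount FROM raw_orders").fetchall()
    # Transform: drop invalid (missing or non-positive) amounts, sum per day.
    totals: dict = {}
    for order_date, amount in rows:
        if amount is not None and amount > 0:
            totals[order_date] = totals.get(order_date, 0.0) + amount
    # Load: write the result into the destination table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_totals (order_date TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO daily_totals VALUES (?, ?)", sorted(totals.items())
    )
    conn.commit()


def test_pipeline_end_to_end() -> None:
    # Arrange: a fresh, isolated "source system" seeded with known input.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (order_date TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?)",
        [
            ("2021-11-01", 10.0),
            ("2021-11-01", 5.0),
            ("2021-11-02", -3.0),
            ("2021-11-02", 7.5),
        ],
    )
    # Act: run the whole pipeline, not individual functions.
    run_pipeline(conn)
    # Assert: check only the final output that a consumer would see.
    result = conn.execute(
        "SELECT order_date, total FROM daily_totals ORDER BY order_date"
    ).fetchall()
    assert result == [("2021-11-01", 15.0), ("2021-11-02", 7.5)]
    conn.close()


if __name__ == "__main__":
    test_pipeline_end_to_end()
    print("end-to-end test passed")
```

Swapping SQLite for locally running copies of the real services keeps the same arrange/act/assert structure while testing the actual integrations.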

Data Pipeline

Data Pipeline Cloud Data

Airflow Timetable: Schedule your DAGs like never before

Marc Lamberti

NOVEMBER 2, 2021

Timetables, a new concept introduced in Airflow 2.2, are going to change the way you schedule your data pipelines. Or rather, you’re finally going to have all the freedom and flexibility you ever dreamt of for scheduling your DAGs. What if you want to run your DAG for specific schedule intervals with “holes” in between?
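To make the idea of intervals with “holes” concrete, here is a stdlib-only Python sketch that produces daily data intervals while skipping weekends. It does not use Airflow: a real custom timetable subclasses `airflow.timetables.base.Timetable` and implements methods such as `next_dagrun_info` and `infer_manual_data_interval`. The function name `workday_intervals` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple

Interval = Tuple[datetime, datetime]


def workday_intervals(start: datetime, count: int) -> Iterator[Interval]:
    """Yield `count` one-day data intervals, skipping Saturdays and Sundays.

    A stdlib-only illustration of a schedule with "holes"; it does not use
    the Airflow Timetable API.
    """
    # Align to midnight so every interval covers exactly one calendar day.
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    produced = 0
    while produced < count:
        # weekday(): Monday == 0 ... Sunday == 6; 5 and 6 are the weekend holes.
        if day.weekday() < 5:
            yield (day, day + timedelta(days=1))
            produced += 1
        day += timedelta(days=1)
```

For example, starting on Friday, November 5, 2021, the first three intervals begin on November 5, 8, and 9; the Saturday and Sunday in between are the holes.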

Data Pipeline

Data Pipeline Coding Process IT

Sentiment Analysis with KNIME

KDnuggets

NOVEMBER 29, 2021

Check out this tutorial on how to approach sentiment classification with supervised machine learning algorithms.

Algorithm

Algorithm Machine Learning

More Trending

New Applied ML Prototypes Now Available in Cloudera Machine Learning

Cloudera

NOVEMBER 17, 2021

It’s no secret that Data Scientists have a difficult job. It feels like a lifetime ago that everyone was talking about data science as the sexiest job of the 21st century. Heck, it was so long ago that people were still meeting in person! Today, the sexy is starting to lose its shine. There’s recognition that it’s nearly impossible to find the unicorn data scientist that was the apple of every CEO’s eye in 2012.

Machine Learning

Machine Learning Algorithm Data Science Retail

The Future of SQL: Databases Meet Stream Processing

Confluent

NOVEMBER 3, 2021

SQL has proven to be an invaluable asset for most software engineers building software applications. Yet, the world as we know it has changed dramatically since SQL was created in […].

SQL

SQL Database Process Software Engineer

Azure Data Factory: Wait Activity

Azure Data Engineering

NOVEMBER 15, 2021

In one of the previous posts, we discussed how we can use the Validation activity to design the Pipeline to wait for a scheduled time and retry. There is another way to introduce a delay in the Pipeline: the Wait activity can be used to pause the execution of the Pipeline for a fixed amount of time. Sometimes, we come across scenarios where we would like the execution of the Pipeline to be paused for some time but not cancelled.
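Unlike Validation, the Wait activity pauses for a fixed duration regardless of any condition. A minimal Python sketch that emits such an activity definition follows; it assumes the ADF `Wait` type with a `waitTimeInSeconds` property, and the activity names `PauseBeforeRetry` and `CopyData` are hypothetical.

```python
import json
from typing import Optional


def wait_activity(name: str, seconds: int, after: Optional[str] = None) -> dict:
    """Return an ADF Wait activity definition that pauses the run for `seconds`.

    `after` optionally names an upstream activity that must succeed first.
    The positive-integer check is this sketch's own guard, not a quoted ADF rule.
    """
    if not isinstance(seconds, int) or seconds <= 0:
        raise ValueError("seconds must be a positive integer")
    activity = {
        "name": name,
        "type": "Wait",
        # Fixed pause: the run resumes after this many seconds, it is not cancelled.
        "typeProperties": {"waitTimeInSeconds": seconds},
    }
    if after is not None:
        # Chain the pause after an upstream activity's success.
        activity["dependsOn"] = [
            {"activity": after, "dependencyConditions": ["Succeeded"]}
        ]
    return activity


if __name__ == "__main__":
    print(json.dumps(wait_activity("PauseBeforeRetry", 300, after="CopyData"), indent=2))
```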

Data

Data Designing

The Benefits and Drawbacks of DataOps in Practice

DataKitchen

NOVEMBER 12, 2021

The post The Benefits and Drawbacks of DataOps in Practice first appeared on DataKitchen.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

Ten Things I’ve Learned in 20 Years in Data and Analytics

Teradata

NOVEMBER 22, 2021

Teradata's Martin Willcox recently passed 17 years at Teradata and a quarter of a century in the industry. Here are the ten things he's learned about data analytics in those 20-odd years.

Data Analytics

Data Analytics Data

How to Build a Knowledge Graph with Neo4J and Transformers

KDnuggets

NOVEMBER 26, 2021

Learn to use custom Named Entity Recognition and Relation Extraction models.

Building

Make Your Models Matter: What It Takes to Maximize Business Value from Your Machine Learning Initiatives

Cloudera

NOVEMBER 19, 2021

We are excited by the endless possibilities of machine learning (ML). We recognise that experimentation is an important component of any enterprise machine learning practice. But we also know that experimentation alone doesn’t yield business value. Organizations need to usher their ML models out of the lab (i.e., the proof-of-concept phase) and into deployment, which is otherwise known as being “in production”.

Machine Learning

Machine Learning IT Project Data Engineering

How Do You Change a Never-Ending Query?

Confluent

NOVEMBER 5, 2021

There’s a philosophical puzzle, the Ship of Theseus, in which the planks of a ship are replaced one by one as they begin to rot during a long voyage. At the end, there […].

Process

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

Azure Data Factory: Filter Activity

Azure Data Engineering

NOVEMBER 10, 2021

In the previous post, we discussed the Switch Activity, which is useful for branching the control flow based on some condition. In this post, we will discuss the Filter Activity. The purpose of the Filter Activity is to apply a condition to the items of an array and keep only the items that satisfy it. Consider a scenario where we would like to set the value of a variable to the current array item that satisfies some business rule or condition.
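Filter semantics can be sketched in two parts: an activity definition, and a local Python model of what it computes. The sketch assumes the ADF `Filter` type with `items` and `condition` expressions and the expression functions `item()` and `endswith()`; the activity name `KeepCsvFiles` and the `fileNames` pipeline parameter are hypothetical.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

# Hypothetical Filter activity: keep only the .csv names from a pipeline parameter.
FILTER_ACTIVITY = {
    "name": "KeepCsvFiles",
    "type": "Filter",
    "typeProperties": {
        "items": {"value": "@pipeline().parameters.fileNames", "type": "Expression"},
        "condition": {"value": "@endswith(item(), '.csv')", "type": "Expression"},
    },
}


def filter_items(items: List[T], condition: Callable[[T], bool]) -> List[T]:
    """Local model of Filter: keep items where condition(item) is true, in order."""
    return [item for item in items if condition(item)]
```

For instance, `filter_items(["a.csv", "b.json", "c.csv"], lambda name: name.endswith(".csv"))` keeps `a.csv` and `c.csv`, which is the subset later activities would then iterate over or assign to a variable.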

SQL

SQL Data Coding Designing

Doing DataOps For External Data Sources As A Service at Demyst

Data Engineering Podcast

NOVEMBER 27, 2021

Summary The data that you have access to affects the questions that you can answer. By using external data sources you can drastically increase the range of analysis that is available to your organization. The challenge comes in all of the operational aspects of finding, accessing, organizing, and serving that data. In this episode Mark Hookey discusses how he and his team at Demyst do all of the DataOps for external data sources so that you don’t have to, including the systems necessary t

Data Warehouse

Data Warehouse Data Lake BI Business Intelligence

A Systematic Approach to Reducing Technical Debt

Zalando Engineering

NOVEMBER 29, 2021

While technical debt is a recurring issue in software engineering, the case of the Merchant Orders team within Zalando Direct was an outlier: due to the lack of a clearly defined process, technical debt more or less only ever accumulated. When I joined this team in autumn 2020 as its new engineering lead, the technical debt backlog had entries dating back to 2018.

Software Engineering

Software Engineering Software Engineer Engineering Scala

Most Common SQL Mistakes on Data Science Interviews

KDnuggets

NOVEMBER 23, 2021

Sure, we all make mistakes, which can be a bit more painful when we are trying to get hired. Check out these typical errors applicants make while answering SQL questions during data science interviews.

Data Science

Data Science SQL Data

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

Switching from CPUs to GPUs for NYC Taxi Fare Predictions with NVIDIA RAPIDS

Cloudera

NOVEMBER 3, 2021

Have you ever asked a data scientist if they wanted their code to run faster? You would probably get a more varied response asking if the earth is flat. It really isn’t any different from anything else in tech: faster is almost always better. One of the best ways to substantially improve processing time is to switch from CPUs to GPUs, if you haven’t already.

Deep Learning

Deep Learning Data Science Machine Learning Python

Readings in Streaming Database Systems

Confluent

NOVEMBER 2, 2021

What will the next important category of databases look like? For decades, relational databases were the undisputed home of data. They powered everything: from websites to analytics, from customer data […].

Database

Database Systems Relational Database Data

10 DataOps Principles for Overcoming Data Engineer Burnout

DataKitchen

NOVEMBER 18, 2021

For several years now, the elephant in the room has been that data and analytics projects are failing. Gartner estimated that 85% of big data projects fail. Data from NewVantage Partners showed that the share of data-driven organizations has actually declined to 24% from 37% several years ago, and that only 29% of organizations are achieving transformational outcomes from their data.

Data Engineer

Data Engineer Data Engineering Engineering Government

Creating A Unified Experience For The Modern Data Stack At Mozart Data

Data Engineering Podcast

NOVEMBER 27, 2021

Summary The modern data stack has been gaining a lot of attention recently with a rapidly growing set of managed services for different stages of the data lifecycle. With all of the available options, it is possible to run a scalable, production-grade data platform with a small team, but there are still sharp edges and integration challenges to work through.

BI Data Data Warehouse Data Engineering

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Cloud

Bringing AV1 Streaming to Netflix Members’ TVs

Netflix Tech

NOVEMBER 9, 2021

by Liwei Guo, Ashwin Kumar Gopi Valliammal, Raymond Tam, Chris Pham, Agata Opalach, Weibo Ni

AV1 is the first high-efficiency video codec format with a royalty-free license from the Alliance for Open Media (AOMedia), made possible by wide-ranging industry commitment of expertise and resources. Netflix is proud to be a founding member of AOMedia and a key contributor to the development of AV1.

Media

Media Software Engineering Software Engineer Data Science

3 Differences Between Coding in Data Science and Machine Learning

KDnuggets

NOVEMBER 19, 2021

The terms ‘data science’ and ‘machine learning’ are often used interchangeably. But while they are related, there are some glaring differences, so let’s take a look at the differences between the two disciplines, specifically as it relates to programming.

Machine Learning

Machine Learning Data Science Coding Programming

NiFi as a Function in DataFlow Service

Cloudera

NOVEMBER 16, 2021

With the general availability of Cloudera DataFlow for the Public Cloud (CDF-PC), our customers can now self-serve deployments of Apache NiFi data flows on Kubernetes clusters in a cost-effective way, with auto scaling, resource isolation, and monitoring with KPI-based alerting. You can find more information in this release announcement blog post and in this technical deep dive blog post.

Google Cloud

Google Cloud AWS Cloud Computing Kafka

How to Efficiently Subscribe to a SQL Query for Changes

Confluent

NOVEMBER 19, 2021

Imagine that you have real-time data about what’s happening in the stock market, and you want to support a large number of customized dashboards displaying the data as it comes […].
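One common way to make many subscriptions cheap is to maintain the query result incrementally and index subscribers by the keys they care about, so each incoming event touches only the affected subscribers. The sketch below illustrates that idea for a simple “latest price per symbol” query; it is not necessarily the approach the Confluent post describes, and the class name, symbols, and prices are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

Callback = Callable[[str, float], None]


class QuerySubscriptions:
    """Incrementally maintain 'latest price WHERE symbol IN (...)' for many
    subscribers, notifying each only about changes relevant to it."""

    def __init__(self) -> None:
        # Materialized view: symbol -> last seen price.
        self.latest: Dict[str, float] = {}
        # Index: symbol -> callbacks of subscribers whose query includes it.
        self.by_symbol: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, symbols: Set[str], callback: Callback) -> None:
        for symbol in symbols:
            self.by_symbol[symbol].append(callback)
            # Send the current snapshot so the subscriber starts from a known state.
            if symbol in self.latest:
                callback(symbol, self.latest[symbol])

    def on_trade(self, symbol: str, price: float) -> None:
        if self.latest.get(symbol) == price:
            return  # Result unchanged, so nobody is notified.
        self.latest[symbol] = price
        # Only subscribers whose query mentions this symbol are touched;
        # .get avoids creating empty index entries for unwatched symbols.
        for callback in self.by_symbol.get(symbol, []):
            callback(symbol, price)
```

The cost of an update is proportional to the number of interested subscribers rather than the total number of queries, which is the property that matters when there are many customized dashboards.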

SQL

SQL IT Data Process

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Business Intelligence

The vast majority of data engineers are burnt out. Those working in healthcare are no exception

DataKitchen

NOVEMBER 4, 2021

The post The vast majority of data engineers are burnt out. Those working in healthcare are no exception first appeared on DataKitchen.

Healthcare

Healthcare Data Engineer Data Engineering Engineering

Laying The Foundation Of Your Data Platform For The Era Of Big Complexity With Dagster

Data Engineering Podcast

NOVEMBER 20, 2021

Summary The technology for scaling storage and processing of data has gone through massive evolution over the past decade, leaving us with the ability to work with massive datasets at the cost of massive complexity. Nick Schrock created the Dagster framework to help tame that complexity and scale the organizational capacity for working with data. In this episode he shares the journey that he and his team at Elementl have taken to understand the state of the ecosystem and how they can provide a f

Data Warehouse

Data Warehouse Data Lake BI Business Intelligence

Building confidence in a decision

Netflix Tech

NOVEMBER 15, 2021

Martin Tingley with Wenjing Zheng, Simon Ejdemyr, Stephanie Lane, Michael Lindon, and Colin McFarland

This is the fifth post in a multi-part series on how Netflix uses A/B tests to inform decisions and continuously innovate on our products. Need to catch up? Have a look at Part 1 (Decision Making at Netflix), Part 2 (What is an A/B Test?), Part 3 (False positives and statistical significance), and Part 4 (False negatives and power).

Building

Building Utilities Designing Coding

Stop Blaming Humans for Bias in AI

KDnuggets

NOVEMBER 19, 2021

Can artificial intelligence be rid of bias? This is an important question, and it’s equally important that we look in the right place for the answer.

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

Data Engineering
