What does the data reveal if we ask, "What are the 10 Best Python Courses?" Collecting almost all of the courses from the top platforms shows there are plenty to choose from, with over 3,000 offerings. This article summarizes my analysis and presents the top three courses.
Image by Mohammad Bagher Adib Behrooz on Unsplash. Why GraphQL for data engineers, you might ask? GraphQL solved the problem of providing a distinct interface for each client by unifying everything into a single API for all clients, such as web and mobile apps. We now face the same challenge in the data world, where we integrate multiple clients with numerous backend systems.
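For a flavor of that pattern, here is a minimal sketch of a Python client calling a single GraphQL endpoint; the endpoint URL, query, and field names are hypothetical stand-ins, not a real API.

```python
import requests

# Hypothetical GraphQL endpoint and schema, for illustration only.
GRAPHQL_URL = "https://api.example.com/graphql"

# Every client (web, mobile, internal dashboard) posts to the same endpoint
# and asks only for the fields it needs -- no per-client REST endpoints.
query = """
query OrdersForDashboard($limit: Int!) {
  orders(limit: $limit) {
    id
    status
    total
  }
}
"""

response = requests.post(
    GRAPHQL_URL,
    json={"query": query, "variables": {"limit": 10}},
    timeout=10,
)
response.raise_for_status()
print(response.json()["data"]["orders"])
```

A mobile client would send a different query against the same endpoint, which is exactly the "one API, many clients" idea the article draws on.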
As I meet with our customers, there is always a range of discussions about using the cloud for financial services data and analytics. Customers vary widely on the topic of public cloud: which data sources and which use cases are right for public cloud deployments, beyond sandbox and experimentation efforts. Private cloud continues to gain traction as firms realize the benefits of greater flexibility and dynamic scalability.
On behalf of the Apache Kafka® community, it is my pleasure to announce the release of Apache Kafka 3.1.0. The 3.1.0 release contains many improvements and new features. We’ll highlight […].
Speaker: Jason Chester, Director, Product Management
In today’s manufacturing landscape, staying competitive means moving beyond reactive quality checks and toward real-time, data-driven process control. But what does true manufacturing process optimization look like—and why is it more urgent now than ever? Join Jason Chester in this new, thought-provoking session on how modern manufacturers are rethinking quality operations from the ground up.
There are many great boosting libraries in Python for data scientists to take advantage of. In this article, the author discusses the benefits of LightGBM and how they apply to your data science work.
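As a rough illustration of how little code it takes to try it, here is a minimal sketch using LightGBM's scikit-learn wrapper on synthetic data; the dataset and hyperparameters are placeholders, not recommendations from the article.

```python
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data stands in for whatever tabular dataset you are working with.
X, y = make_classification(n_samples=5_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# LightGBM's scikit-learn wrapper: histogram-based gradient boosting that is
# fast on large tabular data and handles many features well.
model = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
model.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```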
The use of integrated data to restore customer confidence will be big in 2022. Building a customer insights foundation should be high on the to-do list for retail & CPG businesses this year.
Martin Tingley with Wenjing Zheng, Simon Ejdemyr, Stephanie Lane, Colin McFarland, Mihir Tendulkar, and Travis Brooks. This is the last post in an overview series on experimentation at Netflix. Need to catch up? Earlier posts covered the basics of A/B tests (Part 1 and Part 2), core statistical concepts (Part 3 and Part 4), how to build confidence in a decision (Part 5), and the role of Experimentation and A/B testing within the larger Data Science and Engineering organization at Netflix.
Summary Data platforms are characterized by a complex set of connections that are subject to constantly evolving requirements. To make this a tractable problem, it is necessary to define boundaries for communication between concerns, which brings with it the need to establish interface contracts for communicating across those boundaries.
Achieving quality data requires a process: data cleaning. Learn more about the various stages of this process.
Analytics are prone to frequent data errors, and deploying analytics is slow and laborious. The strategic value of analytics is widely recognized, but the turnaround time of analytics teams typically can't support the decision-making needs of executives coping with fast-paced market conditions. Perhaps it is no surprise that the average tenure of a CDO or CAO is only about 2.5 years.
The start of a new year is a perfect time to reflect on what was accomplished, look forward, and re-evaluate what we can do better. Change, although difficult at first, can also be very rewarding. That's why I was excited to see similar sentiments shared at Thoughtspot beyond.2021 about moving beyond the traditional dashboards of the past. As roles within organizations evolve (as seen by the growth of citizen scientists and analytics engineers), data needs are changing too (think schema changes and real-time data).
ETL and ELT are some of the most common data engineering use cases, but can come with challenges like scaling, connectivity to other systems, and dynamically adapting to changing data sources. Airflow is specifically designed for moving and transforming data in ETL/ELT pipelines, and new features in Airflow 3.0 like assets, backfills, and event-driven scheduling make orchestrating ETL/ELT pipelines easier than ever!
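To make the asset-driven scheduling idea concrete, here is a minimal ELT sketch, assuming Airflow 3's `airflow.sdk` decorators and `Asset` class; the asset names and the extract/transform/load bodies are hypothetical placeholders rather than anything from the release itself.

```python
from airflow.sdk import Asset, dag, task

raw_orders = Asset("raw_orders")  # hypothetical upstream asset


@dag(schedule=[raw_orders])  # run whenever the upstream asset is updated
def orders_elt():

    @task
    def extract() -> list[dict]:
        # Placeholder: pull new order records from the source system.
        return [{"order_id": 1, "amount": 42.0}]

    @task
    def transform(rows: list[dict]) -> list[dict]:
        # Placeholder: light cleanup before loading.
        return [r for r in rows if r["amount"] > 0]

    @task(outlets=[Asset("analytics_orders")])  # hypothetical downstream asset
    def load(rows: list[dict]) -> None:
        # Placeholder: write the rows to the warehouse table.
        print(f"loading {len(rows)} rows")

    load(transform(extract()))


orders_elt()
```

Scheduling on assets rather than a cron string is what lets downstream pipelines react to upstream data landing, instead of guessing at a fixed time window.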
Summary Data engineering is a relatively young and rapidly expanding field, with practitioners having a wide array of experiences as they navigate their careers. Ashish Mrig currently leads the data analytics platform for Wayfair, as well as running a local data engineering meetup. In this episode he shares his career journey, the challenges related to management of data professionals, and the platform design that he and his team have built to power analytics at a large company.
Angus Croll. Netflix is used by 222 million members and runs on over 1700 device types ranging from state-of-the-art smart TVs to low-cost mobile devices. At Netflix we're proud of our reliability and we want to keep it that way. To that end, it's important that we prevent significant performance regressions from reaching the production app. Sluggish scrolling or late rendering is frustrating and triggers accidental navigations.
According to 451 Research, 96% of enterprises are actively pursuing a hybrid IT strategy. Modern, real-time businesses require accelerated cycles of innovation that are expensive and difficult to maintain with legacy data platforms. Cloud technologies and their respective service providers have evolved solutions to address these challenges. The hybrid cloud's premise of two data architectures fused together gives companies options to leverage those solutions and to address decision-making criteria.
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
Today we’re announcing an exciting Strategic Collaboration Agreement (SCA) with Amazon Web Services (AWS). This new five-year agreement builds on our strong existing collaboration, with the goal of making it […].
In recent years, it has become more common to see organizations looking for a mysterious analytics engineer. As you may guess from the name, this role sits somewhere between a data analyst and a data engineer, but it's really neither one nor the other. Quoting a comment from the Reddit discussion, "Their [analytics engineers] job is to marry the technical requirements of the data stack with the business objectives".
Insurers are increasingly adopting data from smart devices and related technologies to support and serve their customers better. According to Statista, the projected installed base of IoT devices is expected to reach 30.9 billion units by 2025, a huge jump from the 13.8 billion units that exist today. I have been researching how we can use the new data from those devices to design more innovative insurance products, while being aware that these should all be contingent upon customer consent.
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
You may find yourself needing to deploy multiple NGINX Ingress Controllers to serve each namespace on your Kubernetes cluster. This can be useful when you have multiple client deployments on the same K8s cluster and you want to assign a public load balancer IP address to each client to achieve logical separation. This blog post explores how to do that.
Whether you’re working independently or setting up a stack for a company, you need an affordable stack option. Here’s how you can set up your stack without spending too much.
In data science, algorithms are usually designed to detect and follow trends found in the given data. The modeling follows from the data distribution learned by the statistical or neural model. In real life, the features of data points in any given domain occur within some limits. They will only go outside of these expected patterns in exceptional cases, which are usually erroneous or fraudulent.
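As a toy illustration of the "expected limits" idea, here is a minimal sketch using scikit-learn's IsolationForest on synthetic transaction-like data; the features and the contamination setting are made up for the example and are not from the article.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Normal transactions cluster within expected limits; a few points fall far outside.
normal = rng.normal(loc=[50, 1.0], scale=[10, 0.2], size=(500, 2))  # amount, duration
outliers = np.array([[500, 5.0], [1, 0.01], [300, 4.0]])             # suspicious cases
X = np.vstack([normal, outliers])

# IsolationForest flags points that are easy to isolate from the bulk of the data.
detector = IsolationForest(contamination=0.01, random_state=0).fit(X)
labels = detector.predict(X)  # 1 = within the expected pattern, -1 = anomaly

print("flagged as anomalies:")
print(X[labels == -1])
```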
Seesaw Learning Inc. provides a leading online student learning platform used by more than 10 million K-12 teachers, students and family members in the U.S. every month. The San Francisco company has grown steadily since its founding in 2013, with its hosted service in use in 75% of American schools and in another 150 countries. Of course, when COVID-19 hit in early 2020 and forced schools to abruptly switch to full-time remote learning, the need for Seesaw’s platform skyrocketed.
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower, not replace, your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-driven decisions.
Data engineering is the process of designing and implementing solutions to collect, store, and analyze large amounts of data. This process is generally called "Extract, Transform, Load," or ETL. The data then gets prepared in formats to be used by people such as business analysts, data analysts, and data scientists. The format of the data will differ depending on the intended audience.
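To ground the three steps, here is a minimal ETL sketch in Python using pandas; the file paths and column names are hypothetical and stand in for whatever source system and analyst-facing format you actually use.

```python
import pandas as pd

# Hypothetical source and destination paths, for illustration only.
SOURCE_CSV = "raw_sales.csv"
CLEAN_PARQUET = "sales_for_analysts.parquet"

# Extract: read raw records from the source system's export.
raw = pd.read_csv(SOURCE_CSV)

# Transform: clean types and derive the fields analysts actually use.
clean = (
    raw.dropna(subset=["order_id"])
       .assign(
           order_date=lambda df: pd.to_datetime(df["order_date"]),
           revenue=lambda df: df["quantity"] * df["unit_price"],
       )
)

# Load: write a columnar file that BI tools and data scientists can query.
clean.to_parquet(CLEAN_PARQUET, index=False)
```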
Participating in competitions has taught me everything about machine learning and how it can help you learn multiple domains faster than online courses.
Apache Superset 1.4 is now out! This version contains the largest number of bug fixes in recent history, a variety of UX improvements, and improved database support.
The prevalence of new business models, emerging global risks, and the modernization of data processing in the cloud are ushering in a new era for credit risk management and the transformation of risk analytics.
In psychology, there is a famous construct created by Abraham Maslow called the hierarchy of needs. Put simply, it says that people must first satisfy their basic needs before they can progress to focusing on more nuanced goals. It’s often shown as a pyramid where each need builds on top of the previous one. The goal, of course, is to reach the top.
In this article, see how you can get above 90% accuracy on the validation set with a pretty straightforward approach. You'll also see what happens to the validation accuracy if we scale down the amount of training data by a factor of 20. Spoiler alert: it will remain unchanged.
Before we go on to explain why they made the best decisions and how they found their 'Happily Ever After' careers with our program, here are some fun facts about the booming data science domain. According to Globe Newswire, the global predictive analytics market is expected to reach 21.5 billion USD by 2025, growing at a CAGR of 24.5%.
Did you know that, according to LinkedIn, over 24,000 Big Data jobs in the US list Apache Spark as a required skill? Learning Spark has become more of a necessity for entering the Big Data industry. One of the most in-demand technical skills these days is analyzing large data sets, and Apache Spark and Python are two of the most widely used technologies to do this.
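For a sense of how the two fit together, here is a minimal PySpark sketch; the input path and column names are hypothetical placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spark-python-example").getOrCreate()

# Hypothetical event dataset; Spark reads it in parallel across the cluster.
events = spark.read.parquet("s3://example-bucket/clickstream/")

# The same few lines work whether the data is megabytes or terabytes,
# because Spark distributes the aggregation across executors.
daily_counts = (
    events.withColumn("day", F.to_date("event_time"))
          .groupBy("day", "event_type")
          .count()
          .orderBy("day")
)

daily_counts.show(10)
spark.stop()
```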
In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You'll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; and distinguish between Airflow-related and external issues.