Data science is ever-evolving, so mastering its foundational technical and soft skills will help you succeed in a career as a Data Scientist, as well as pursue advanced concepts such as deep learning and artificial intelligence.
AWS has jumped on the bandwagon of removing the need for ETLs. Snowflake announced this as well, both with their hybrid tables and their partnership with Salesforce. Now, I do take a little issue with the naming "Zero ETL", because at the surface the functionality described is often closer to a zero-integration future, which probably… Read more in the post "Should We Get Rid Of ETLs?"
Data engineering is a vital field within the realm of data science that focuses on the practical aspects of collecting, storing, and processing large amounts of data. It involves designing and building the infrastructure to store and process data, as well as developing the tools and systems to extract valuable insights and knowledge from that data. Read more in the post "I asked ChatGPT to write a blog post about Data Engineering."
Summary Making effective use of data requires proper context around the information that is being used. As the size and complexity of your organization increases, the difficulty of ensuring that everyone has the necessary knowledge about how to get their work done scales exponentially. Wikis and intranets are a common way to attempt to solve this problem, but they are frequently ineffective.
In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide, with best practices and examples, to debugging Airflow DAGs. You'll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; and distinguish between Airflow-related…
It's time again to look at some data science cheatsheets. Here you can find a short selection of such resources which can cater to different existing levels of knowledge and breadth of topics of interest.
Data catalogs are the most expensive data integration systems you never intended to build. A data catalog that acts as a passive web portal for displaying metadata requires significant rethinking to fit modern data workflows, not just a "modern" prefix. I know that is an expensive statement to make😊 To be fair, I'm a big fan of data catalogs, or metadata management, to be precise.
We’ve all been in that spot, especially in tech. You wanted to fit in, be cool, and look smart, so you didn’t ask any questions. And now it’s too late. You’re stuck. Now you simply can’t ask … you’re too afraid. I get it. Apache Arrow is probably one of those things. It keeps popping up… Read more in the post "What is Apache Arrow?"
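For readers who do want the short answer: Apache Arrow is, at its core, a language-independent in-memory columnar format. A toy sketch of the row-oriented versus column-oriented distinction it is built around (plain Python, no Arrow dependency; the records and field names are made up for illustration):

```python
# Toy illustration of row- vs column-oriented data layouts.
# Arrow stores each column contiguously, which makes scans and vectorized
# operations over a single field much cheaper than with row-wise records.

rows = [  # row-oriented: one record at a time
    {"user": "a", "clicks": 3},
    {"user": "b", "clicks": 7},
    {"user": "c", "clicks": 5},
]

# column-oriented: one contiguous sequence per field, as Arrow lays it out
columns = {
    "user": [r["user"] for r in rows],
    "clicks": [r["clicks"] for r in rows],
}

# Aggregating one field now touches only that column's buffer
total_clicks = sum(columns["clicks"])
print(total_clicks)  # 15
```

The real format adds fixed-width buffers, validity bitmaps, and zero-copy sharing across processes and languages, but the columnar layout above is the core idea.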
Summary With all of the messaging about treating data as a product, it is becoming difficult to know what that even means. Vishal Singh is the head of products at Starburst, which means that he has to spend all of his time thinking and talking about the details of product thinking and its application to data. In this episode he shares his thoughts on the strategic and tactical elements of moving your work as a data professional from being task-oriented to being product-oriented, and the long-term…
Looking to the Future – How a Data Operating System Breathes Life Into Healthcare. Download (PDF). The post first appeared on TheModernDataCompany.
The Top Data Strategy Influencers and Content Creators on LinkedIn Eitan Chazbani 2022-12-29 14:08:41 What’s the latest in the data world? In a space that moves at a rapid-fire pace, keeping up with new trends and evolving best practices can be dizzying. But having the right network can make all the difference. Regularly following updates from leaders in the data strategy space can go a long way toward not only helping you stay up to date on the latest and greatest, but also allowing you to join…
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
Summary Encryption and security are critical elements in data analytics and machine learning applications. We have well-developed protocols and practices around data that is at rest and in motion, but security around data in use is still severely lacking. Recognizing this shortcoming, and the capabilities that could be unlocked by a robust solution, Rishabh Poddar helped to create Opaque Systems as an outgrowth of his PhD studies.
Banking and Capital Markets are undergoing a period of transformation. The global economic outlook is somewhat fragile, but banks are in an excellent position to survive and thrive as long as they have the right tools in place. According to Deloitte’s 2023 Banking and Capital Markets Outlook report, banks must find ways to adapt to global disruption and understand the changing needs of consumers to find success.
No one wants to read marketing fluff, especially not data engineers. These builders and architects are prone to scoff at any article detailing concepts at a “high-level.” Everyone understands that data lineage and data pipeline monitoring are important, but the real question is, “how do you build it?” Caveat emptor, the following articles are for the technically inclined and definitely not for the faint of heart.
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
Summary Five years of hosting the Data Engineering Podcast has provided Tobias Macey with a wealth of insight into the work of building and operating data systems at a variety of scales and for myriad purposes. In order to condense that acquired knowledge into a format that is useful to everyone Scott Hirleman turns the tables in this episode and asks Tobias about the tactical and strategic aspects of his experiences applying those lessons to the work of building a data platform from scratch.
It's the end of the year, and so it's time for KDnuggets to assemble a team of experts and get to the bottom of what the most important data science, machine learning, AI and analytics developments of 2022 were.
The Terms and Conditions of a Data Contract are Automated Production Data Tests. A data contract is a formal agreement between two parties that defines the structure and format of the data that will be exchanged between them. Data contracts are a new practice for data and analytics teams, meant to ensure that data is transmitted accurately and consistently between different systems or teams.
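As a sketch of that idea, an automated production data test can be as simple as a schema check that runs before data crosses a team boundary. The field names and rules below are hypothetical, not taken from any particular contract standard:

```python
# Minimal data-contract check: validate each record against an agreed schema
# before handing it to the consuming team. Fields and types are illustrative.

CONTRACT = {
    "order_id": int,
    "amount": float,
    "currency": str,
}

def violations(record: dict) -> list:
    """Return a list of contract violations for one record (empty = valid)."""
    problems = []
    for field, expected_type in CONTRACT.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"wrong type for {field}: {type(record[field]).__name__}"
            )
    return problems

good = {"order_id": 1, "amount": 9.99, "currency": "USD"}
bad = {"order_id": "1", "currency": "USD"}

print(violations(good))  # []
print(violations(bad))   # ['wrong type for order_id: str', 'missing field: amount']
```

In production this kind of check would typically run as a pipeline step (e.g. a CI gate or an orchestrator task) so that contract-breaking data fails loudly instead of propagating downstream.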
Data mesh is a complex socio-technological data engineering concept, but it doesn’t change too much. The four principles are still the four principles, there are still three experience planes, and automation is still as vital as ever. This is a good thing! Data mesh is one of those rare transformative concepts that emerged relatively fully formed, a result of creator Zhamak Dehghani’s years of consulting experience captured in a comprehensive 384-page book.
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-driven…
Reading Time: 8 minutes As of this writing, Linux has a global desktop market share of 2.77% (a report by Statcounter), but it powers over 90% of all cloud infrastructure and hosting services. It is critical to be familiar with common Linux commands for this reason alone. According to a 2022 StackOverflow survey, Linux-based operating systems are more popular than macOS, with an impressive 39.89% share, demonstrating the appeal of open-source software among professional developers.
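A few of the everyday commands such lists usually cover, shown as a quick sketch (the file path and patterns below are illustrative):

```shell
# Create a small sample file to work with
printf 'alpha\nBeta\ngamma\n' > /tmp/demo.txt

# Search a file for a pattern, case-insensitive, with line numbers
grep -in 'beta' /tmp/demo.txt        # prints "2:Beta"

# Count the lines in a file
wc -l /tmp/demo.txt

# Show disk usage of the current directory, human-readable
du -sh .

# List the most recently modified entries in /tmp
ls -lt /tmp | head -n 5
```

Each of these composes with pipes and redirection, which is where most day-to-day Linux fluency comes from.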
Data center downtime can be costly. Gartner estimates that downtime can cost $5,600 per minute, extrapolating to well over $300K per hour. When your organization’s digital service is interrupted, it can impact employee productivity, company reputation, and customer loyalty. It can also result in the loss of business, data, and revenue. With the heart of the holiday season happening, we have tips on how to enjoy holiday downtime while avoiding the high costs of data center downtime.
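The arithmetic behind that per-hour extrapolation is simple enough to check directly:

```python
# Gartner's widely cited figure: downtime can cost $5,600 per minute
cost_per_minute = 5_600
cost_per_hour = cost_per_minute * 60
print(f"${cost_per_hour:,} per hour")  # $336,000 per hour
```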
From a build perspective, data products ultimately translate into products that utilize data to improve services and overall functionality. And if we go by this definition, it becomes clear that no product in the world can truly survive unless it is a “data product”.
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG-writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; and scale your…
Scaling Elasticsearch Elasticsearch is a NoSQL search and analytics engine that is easy to get started using for log analytics, text search, real-time analytics and more. That said, under the hood Elasticsearch is a complex, distributed system with many levers to pull to achieve optimal performance. In this blog, we walk through solutions to common Elasticsearch performance challenges at scale including slow indexing, search speed, shard and index sizing, and multi-tenancy.
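As one concrete example of those levers: primary shard count is fixed at index creation (changing it requires a reindex), while replica count and refresh interval can be adjusted live, so sizing shards up front matters. A sketch of the settings body you might send when creating an index; the values are illustrative, not tuning recommendations:

```python
import json

# Illustrative Elasticsearch index settings: 3 primary shards, 1 replica
# each, and a relaxed refresh interval to reduce overhead during heavy
# indexing. The right values depend on data volume, query load, and hardware.
index_settings = {
    "settings": {
        "number_of_shards": 3,      # fixed at creation; changing requires a reindex
        "number_of_replicas": 1,    # can be changed on a live index
        "refresh_interval": "30s",  # default is 1s; longer helps bulk indexing
    }
}

# This JSON body would accompany an index-creation request, e.g. PUT /my-index
print(json.dumps(index_settings, indent=2))
```

A common rule of thumb discussed in the scaling literature is to keep individual shards in the tens-of-gigabytes range; far smaller or larger shards tend to cause the slow-indexing and slow-search symptoms the post describes.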
Python is a very versatile and relatively easy-to-learn programming language. Hence it is the choice of many new programmers, regardless of which area of tech they are interested in. It is particularly popular across all branches of data science.
The Pareto Principle, which holds that 80% of results derive from 20% of causes, is tough to escape. It definitely holds true for our Data Downtime blog, with these five articles driving a majority of our traffic in 2022. A few characteristics separate these articles from the chaff, namely: they were among the first to describe or even define a nascent concept.
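The principle is easy to check against your own traffic numbers. A toy computation with made-up page-view counts, sorted from most- to least-read article:

```python
# Hypothetical page views per article, sorted descending
views = [9000, 4000, 1500, 800, 700, 400, 300, 200, 60, 40]

# Top 20% of articles (2 of 10) and their share of total traffic
top_20_pct = views[: max(1, len(views) // 5)]
share = sum(top_20_pct) / sum(views)
print(f"Top 20% of articles drive {share:.0%} of traffic")  # 76% here
```

With this particular made-up distribution the top fifth of articles accounts for roughly three quarters of the traffic, which is the shape the principle predicts.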
Read Time: 3 minutes SSE File Encryption: In this post we will discuss an error encountered while executing the COPY command. Recently we got an issue while loading data from an S3 bucket into Snowflake. In this scenario, there were two files present in the bucket, but surprisingly the COPY command was failing to process one of them, reporting an Access Denied error for that particular file.
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
As 2022 wraps up, we would like to recap our top posts of the year in Data Integrity, Data Integration, Data Quality, Data Governance, Location Intelligence, SAP Automation, and how data affects specific industries. Let’s take a look! Best of Data Integrity: Data integrity empowers your business to make fast, confident decisions based on trusted data that has maximum accuracy, consistency, and context.
I don’t see myself as a writer or blogger. In fact, the first blog post I published on Medium sat as a draft for months. ( Data downtime , anyone?) Prior to launching Monte Carlo, I interviewed hundreds of data leaders. I gained so much insight into their hopes, dreams, and fears that the impulse to share finally exceeded the anxiety of publishing. And there was no turning back.
The holidays bring joy and memories. It is always a joyful memory for me every week when I pen down (or key down 🤷🏽♂️) every edition of Data Engineering Weekly. I want to take a holiday break for this week's edition, and instead, I want to reflect on our journey in 2022. A Growth To Remember 2022 has been a remarkable year in terms of subscriber growth.
Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage
When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds) and enables non-LLM evaluation…
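The "temperature 0 and fixed seeds" idea generalizes beyond LLMs: pinning every source of randomness makes a pipeline's output reproducible, and therefore assertable in ordinary tests. A minimal stdlib sketch of the pattern (no LLM involved; `sample_variation` is a stand-in for any stochastic generation step):

```python
import random

def sample_variation(seed: int) -> list:
    """Stand-in for a stochastic generation step, fully determined by its seed."""
    rng = random.Random(seed)  # a local RNG leaves global random state untouched
    return [rng.randint(0, 9) for _ in range(5)]

# The same seed always reproduces the same "variation", so a test suite can
# assert on exact outputs instead of fuzzy-matching nondeterministic ones.
assert sample_variation(42) == sample_variation(42)
print(sample_variation(42))
```

The same discipline applied to model calls (deterministic decoding plus pinned seeds, where the provider supports them) is what makes output-level regression testing of an LLM system feasible.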