Top Data Engineering Digest Coding Skills Certification Content for December, 2023

December, 2023

How Much Data Do We Need? Balancing Machine Learning with Security Considerations

Towards Data Science

DECEMBER 15, 2023

For a data scientist, there’s no such thing as too much data. But when we take a broader look at the organizational context, we have to balance our goals with other considerations. Photo by Trnava University on Unsplash Data Science vs Security/IT: A Battle for the Ages Acquiring and keeping data is the focus of a huge amount of our mental energy as data scientists.

Machine Learning

Machine Learning Data Science Data Security Data Storage

25 Free Courses to Master Data Science, Data Engineering, Machine Learning, MLOps, and Generative AI

KDnuggets

DECEMBER 27, 2023

Discover a collection of top courses to launch your dream career or master a new skill, all for free!

Machine Learning

Machine Learning Data Science Data Engineering Data Engineer

Join 37,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Trending Sources

Streaming in Data Engineering

Towards Data Science

DECEMBER 11, 2023

Streaming data pipelines and real-time analytics Continue reading on Towards Data Science »

Data Engineer

Data Engineer Data Engineering Engineering Data Pipeline

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

A Tech Conference Listed Fake Speakers for Years: I Accidentally Noticed

The Pragmatic Engineer

DECEMBER 2, 2023

For 3 years straight, the DevTernity conference listed non-existent Coinbase employees as featured speakers. When were they added and what could have the motivation been? Three featured speakers listed at DevTernity 2021, 2022 and 2023, and JDKon 2024. These people do not exist. A year ago, I spent months doing an investigative report on how UK events tech company Pollen had its staff work for free, as it had run out of money but still kept operating.

Software Engineer

Software Engineer Software Engineering Insurance Coding

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples to debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate

Data Pipeline

Troubleshooting Kafka In Production

Data Engineering Podcast

DECEMBER 24, 2023

Summary Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems. Operating it at scale, however, is notoriously challenging. Elad Eldor has experienced these challenges first-hand, leading to his work writing the book "Kafka: : Troubleshooting in Production" In this episode he highlights the sources of complexity that contribute to Kafka's operational difficulties, and some of the main ways to identify and mitigate

Kafka

Kafka Data Lake High Quality Data SQL

Unlocking the Power of Containers: Exploring the Top 20 Docker Containers for Every Development Need

Analytics Vidhya

DECEMBER 18, 2023

Introduction Docker containers have emerged as indispensable tools in the fast-evolving landscape of software development and deployment, providing a lightweight and efficient way to package, distribute, and run applications. This article delves into the top 20 Docker containers across various categories, showcasing their features, use cases, and contributions to streamlining development workflows.

SparkSQL is Destroying your Pipelines

Confessions of a Data Guy

DECEMBER 24, 2023

It’s true, even if you don’t want it to be. SparkSQL is destroying your data pipelines and possibly wreaking havoc on your entire data team, infrastructure, and life. In your heart of hearts, you’ve probably known it for years. With great power comes great responsibility. We all know that even us Data Engineers are human […] The post SparkSQL is Destroying your Pipelines appeared first on Confessions of a Data Guy.

Data Pipeline

Data Pipeline Data Engineering Data Engineer Engineering

More Trending

SparkSQL is Destroying your Pipelines

Confessions of a Data Guy

DECEMBER 24, 2023

Data Pipeline

Data Pipeline Data Engineering Data Engineer Engineering

10 GitHub Repositories to Master Machine Learning

KDnuggets

DECEMBER 1, 2023

The blog covers machine learning courses, bootcamps, books, tools, interview questions, cheat sheets, MLOps platforms, and more to master ML and secure your dream job.

Machine Learning

Databricks Named a Leader in 2023 Gartner® Magic Quadrant™ for Cloud Database Management Systems

databricks

DECEMBER 21, 2023

We are excited to announce that Gartner has recognized Databricks as a Leader for a third consecutive year in the 2023 Gartner® Magic.

Database

Database Systems Cloud Management

Mentoring software engineers or engineering leaders

The Pragmatic Engineer

DECEMBER 18, 2023

I get asked every now and then if I offer 1:1 mentoring for either software engineers or engineering managers or leaders. While I used to do this in the past, I don't offer this any more. I collected much of the advice I have to offer for software engineers in The Software Engineer's Guidebook. I also write The Pragmatic Engineer Newsletter where I do cover topics like what it means to be a senior engineer at various companies , how to deal with a low-quality engineering culture , and

Software Engineering

Software Engineering Software Engineer Engineering Management

Building end-to-end security for Messenger

Engineering at Meta

DECEMBER 6, 2023

We are beginning to upgrade people’s personal conversations on Messenger to use end-to-end encryption (E2EE) by default Meta is publishing two technical white papers on end-to-end encryption: Our Messenger end-to-end encryption whitepaper describes the core cryptographic protocol for transmitting messages between clients. The Labyrinth encrypted storage protocol whitepaper explains our protocol for end-to-end encrypting stored messaging history between devices on a user’s account.

Building

Building Designing Consulting Accessible

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

Make this 3D printed globe please

ArcGIS

DECEMBER 4, 2023

It's that time of year to warm ourselves beside the electric hum of a plastic filament printer and fall into the joy of making.

IT Education

Unlock the New Wave of Gen AI With Snowpark Container Services GPU-Powered Compute

Snowflake

DECEMBER 20, 2023

The rise of generative AI (gen AI) is inspiring organizations to envision a future in which AI is integrated into all aspects of their operations for a more human, personalized and efficient customer experience. However, getting the required compute infrastructure into place, particularly GPUs for large language models (LLMs), is a real challenge. Accessing the necessary resources from cloud providers demands careful planning and up to month-long wait times due to the high demand for GPUs.

Scala

Scala Government Java Cloud

Building Predictive Models: Logistic Regression in Python

KDnuggets

DECEMBER 1, 2023

Image by Author When you are getting started with machine learning, logistic regression is one of the first algorithms you’ll add to your toolbox.

Python

Python Building Machine Learning Algorithm

Creating High Quality RAG Applications with Databricks

databricks

DECEMBER 6, 2023

Retrieval-Augmented-Generation (RAG) has quickly emerged as a powerful way to incorporate proprietary, real-time data into Large Language Model (LLM) applications. Today we are.

Data

Data Data Science Engineering

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

The Pragmatic Engineer Newsletter in 2023

The Pragmatic Engineer

DECEMBER 21, 2023

2023 was the second full year of The Pragmatic Engineer Newsletter , and this newsletter is now almost two and a half years old; the first issue came out on 26 August 2021. Thank you for being a reader, I greatly value your support. This year, 102 newsletter issues were published, and this is number 103. You received a deepdive issue on Tuesdays, and every Thursday it was “The Pulse” – formerly The Scoop.

Engineering

Engineering Software Engineer Software Engineering Project

How Meta built the infrastructure for Threads

Engineering at Meta

DECEMBER 19, 2023

On July 5, 2023, Meta launched Threads, the newest product in our family of apps, to an unprecedented success that saw it garner over 100 million sign ups in its first five days. A small, nimble team of engineers built Threads over the course of only five months of technical work. While the app’s production launch had been under consideration for some time, the business finally made the decision and informed the infrastructure teams to prepare for its launch with only two days’ advance notice.

Programming Language

Programming Language Engineering Database Coding

Join Enhancements in ArcGIS Pro 3.2

ArcGIS

DECEMBER 4, 2023

ArcGIS Pro 3.2 includes a number of enhancements to the Spatial Join, Add Spatial Join, Add Join, and Join Field tools.

Snowflake Announces Agreement to Acquire Samooha to Simplify Building Interoperable Data Clean Rooms in the Data Cloud

Snowflake

DECEMBER 18, 2023

When businesses share sensitive first-party data with outside partners or customers, they must do so in a way that meets strict governance requirements around security and privacy. Data clean rooms have emerged as the technology to meet this need, enabling interoperability where multiple parties can collaborate on and analyze sensitive data in a governed way without exposing direct access to the underlying data and business logic.

Cloud

Cloud Building Entertainment Government

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

25 Free Books to Master SQL, Python, Data Science, Machine Learning, and Natural Language Processing

KDnuggets

DECEMBER 28, 2023

Discover a collection of best books to start your data career or master a new skill, all for free!

Machine Learning

Machine Learning Data Science Process SQL

Improve your RAG application response quality with real-time structured data

databricks

DECEMBER 8, 2023

Retrieval Augmented Generation (RAG) is an efficient mechanism to provide relevant data as context in Gen AI applications. Most RAG applications typically use.

Structured Data

Structured Data Data Data Science

Designing Data Platforms For Fintech Companies

Data Engineering Podcast

DECEMBER 31, 2023

Summary Working with financial data requires a high degree of rigor due to the numerous regulations and the risks involved in security breaches. In this episode Andrey Korchack, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex.

Designing

Designing Data Lake High Quality Data SQL

Streamhouse, the next house to move into?

Waitingforcode

DECEMBER 26, 2023

I must admit it, if you want to catch my attention, you can use some keywords. One of them is "stream". Knowing that, the topic of my new blog post shouldn't surprise you.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Cloud

Make this AI-inspired topo landscape please

ArcGIS

DECEMBER 19, 2023

Here's how to fake an isometric 3D topo terrain in 2D! And stuff.

Uplevel your dbt workflow with these tools and techniques

Start Data Engineering

DECEMBER 13, 2023

1. Introduction 2. Setup 3. Ways to uplevel your dbt workflow 3.1. Reproducible environment 3.1.1. A virtual environment with Poetry 3.1.2. Use Docker to run your warehouse locally 3.2. Reduce feedback loop time when developing locally 3.2.1. Run only required dbt objects with selectors 3.2.2. Use prod datasets to build dev models with defer 3.2.3. Parallelize model building by increasing thread count 3.

Datasets

Datasets Building

The KDnuggets 2023 Cheat Sheet Collection

KDnuggets

DECEMBER 25, 2023

KDnuggets has brought together all of its in-house cheat sheets from 2023 in this single, convenient location. Have a look to make sure you didn't miss out on anything over the year.

IT Data Science Data

Even Santa Claus has AI fever

databricks

DECEMBER 14, 2023

As CEO of the North Pole, Santa Claus oversees one of the world’s most complicated supply chain, manufacturing and logistics operations. Every year, S.

Manufacturing

Manufacturing Data

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Business Intelligence

Adding An Easy Mode For The Modern Data Stack With 5X

Data Engineering Podcast

DECEMBER 17, 2023

Summary The "modern data stack" promised a scalable, composable data platform that gave everyone the flexibility to use the best tools for every job. The reality was that it left data teams in the position of spending all of their engineering effort on integrating systems that weren't designed with compatible user experiences. The team at 5X understand the pain involved and the barriers to productivity and set out to solve it by pre-integrating the best tools from each layer of the s

Data Lake

Data Lake High Quality Data SQL Architecture

Order is king for the performance

Waitingforcode

DECEMBER 19, 2023

Even though nowadays data processing frameworks and data stores have smart query planners, they don't take our responsibility to correctly design the job logic.

Designing

Designing Data Process Process Data

My Vim-Verse: The Backbone of My Workflow

Simon Späti

DECEMBER 10, 2023

In my journey, detailed in why Vim is more than an editor , I’ve discovered the profound impact of integrating Vim and its motions into my entire computer workflow. This evolution, from using familiar tools like Notepad++ and SQL Server Management Studio to embracing Vim, represents a significant shift in how I approach tasks in data engineering and writing.

SQL

SQL Data Engineering Data Engineer Engineering

Making Flink Serverless, With Queries for Less Than a Penny

Confluent

DECEMBER 12, 2023

Dive into the serverless architecture of Confluent Cloud for Apache Flink and explore its benefits like reduced infrastructure costs, increased reliability, & seamless adoption.

Cloud

Cloud Architecture IT

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

Data Engineering

December, 2023

How Much Data Do We Need? Balancing Machine Learning with Security Considerations

25 Free Courses to Master Data Science, Data Engineering, Machine Learning, MLOps, and Generative AI

Webinars

Trending Sources

Streaming in Data Engineering

Webinars

A Tech Conference Listed Fake Speakers for Years: I Accidentally Noticed

A Guide to Debugging Apache Airflow® DAGs

Troubleshooting Kafka In Production

Unlocking the Power of Containers: Exploring the Top 20 Docker Containers for Every Development Need

SparkSQL is Destroying your Pipelines

Sign up to get articles personalized to your interests!

More Trending

SparkSQL is Destroying your Pipelines

10 GitHub Repositories to Master Machine Learning

Databricks Named a Leader in 2023 Gartner® Magic Quadrant™ for Cloud Database Management Systems

Mentoring software engineers or engineering leaders

Building end-to-end security for Messenger

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Make this 3D printed globe please

Unlock the New Wave of Gen AI With Snowpark Container Services GPU-Powered Compute

Building Predictive Models: Logistic Regression in Python

Creating High Quality RAG Applications with Databricks

Agent Tooling: Connecting AI to Your Tools, Systems & Data

The Pragmatic Engineer Newsletter in 2023

How Meta built the infrastructure for Threads

Join Enhancements in ArcGIS Pro 3.2

Snowflake Announces Agreement to Acquire Samooha to Simplify Building Interoperable Data Clean Rooms in the Data Cloud

How to Modernize Manufacturing Without Losing Control

25 Free Books to Master SQL, Python, Data Science, Machine Learning, and Natural Language Processing

Improve your RAG application response quality with real-time structured data

Designing Data Platforms For Fintech Companies

Streamhouse, the next house to move into?

Optimizing The Modern Developer Experience with Coder

Make this AI-inspired topo landscape please

Uplevel your dbt workflow with these tools and techniques

The KDnuggets 2023 Cheat Sheet Collection

Even Santa Claus has AI fever

15 Modern Use Cases for Enterprise Business Intelligence

Adding An Easy Mode For The Modern Data Stack With 5X

Order is king for the performance

My Vim-Verse: The Backbone of My Workflow

Making Flink Serverless, With Queries for Less Than a Penny

The Ultimate Guide to Apache Airflow DAGS

Stay Connected