Top Data Engineering Digest Data Integration Computer Science Content for Week of Dec 31

Sat. Dec 31, 2022 - Fri. Jan 06, 2023

Why I'm using (Neo)vim as a Data Engineer and Writer in 2023

Simon Späti

JANUARY 3, 2023

I used VS Code, Sublime, Notepad++, TextMate, and others, but the shortcut with cmd(+shift)+end, jumping with option+arrow-keys from word to word, needed to be faster at some point. I was hitting my limits. Everything I was doing I did decently fast, but I didn’t get any faster. Vim is the only editor you get faster with time. Vim is based solely on shortcuts.

Data Engineering

Data Engineering Data Engineer Engineering Coding

Python Matplotlib Cheat Sheets

KDnuggets

JANUARY 3, 2023

Matplotlib is the most famous and commonly used plotting library in Python. It allows you to create clear and interactive visualizations that make your data easier to understand and your results more concrete.

Python

Python IT Data Data Science

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Trending Sources

Confluent + Immerok: Cloud Native Kafka Meets Cloud Native Flink

Confluent

JANUARY 6, 2023

Introducing fully managed Apache Kafka® + Flink for the most robust, cloud-native data streaming platform with stream processing, integration, and streaming analytics in one.

Kafka

Kafka Cloud Process Management

CircleCI’s unnoticed holiday security breach

The Pragmatic Engineer

JANUARY 5, 2023

Originally published on 5 January 2023. 👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of seven topics in today’s subscriber-only The Scoop issue. To get this newsletter every week, subscribe here. For most engineering teams, returning from the winter holiday usually involves gradually getting back into the swing of things.

Pipeline-centric

Pipeline-centric Database-centric Coding Accessible

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide to debugging Airflow DAGs, with best practices and examples. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate

Data Pipeline

Why Vim Is More than Just an Editor – Vim Language, Motions, and Modes Explained

Simon Späti

JANUARY 3, 2023

Throughout my time as a developer, I’ve used VS Code, Sublime, Notepad++, TextMate, and others. But shortcuts like cmd(+shift)+end and jumping with option+arrow-keys from word to word needed to be faster at some point. I was hitting my limits. Everything I was doing I did decently fast, but I didn’t get any faster. I’ve since learned that Vim is the only editor that you get faster using with time.

Coding

A Solid Plan for Learning Data Science, Machine Learning, and Deep Learning

KDnuggets

JANUARY 4, 2023

Check out this solid plan for learning Data Science, Machine Learning, and Deep Learning. The entire plan is currently available at no cost to KDnuggets readers.

Deep Learning

Deep Learning Machine Learning Data Science Data

I talked to DataGen podcast

Christophe Blefari

JANUARY 4, 2023

🎙 A few weeks ago I did my first podcast with Robin. We talked about data engineering and everything around doing a weekly curation. This is the first episode of Robin's podcast in English and you should follow him because more are coming! In the podcast we talked about 🔥 My journey before launching the newsletter 🔥 Why and how I write 🔥 My main challenges as a Data Engineer 🔥 My favorite content 🔥 What I like about data 🔥 A few tips f

Data Engineering

Data Engineering Data Engineer Engineering Data

More Trending

4 Tips for Agility and Resiliency Through Supply Chain Process Automation

Precisely

JANUARY 3, 2023

Times are changing, and at a near-constant pace. With shifting customer preferences and disruptive world events shaking up the global supply chain market, many business leaders are left wondering whether they’ll be able to stay competitive. Supply chain automation technologies can have a big role to play when it comes to providing end-to-end visibility and risk mitigation for complex, data-intensive SAP processes in supply chain.

Process

Process Data Governance Algorithm Data Integration

The Open Data Stack Distilled into Four Core Tools

Simon Späti

JANUARY 3, 2023

In this article, we are going to explore core open-source tools that are needed for any company to become data-driven. We’ll cover integration, transformation, orchestration, analytics, and ML tools as a starter guide to the latest open data stack. Let’s start with the Modern Data Stack. Have you heard of it or where the term came from?

Data

Data IT

Free Data Management with Data Science Learning with CS639

KDnuggets

JANUARY 6, 2023

Learn Data Management with Data Science for FREE with CS639.

Data Science

Data Science Data Management Management Data

Meaningful Product Experimentation: 5 Impactful Data Projects for Building Better Products

Monte Carlo

JANUARY 6, 2023

Understanding and aligning with each business domain’s unique incentives and workflows is what ultimately makes data teams not just efficient, but great. Part one of this series looked at everyone’s favorite spreadsheet power users: the finance team. This article will examine how data teams can better conduct product experimentation and better align with product teams.

Project

Project Building BI Data

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

Clustering on Normal, CLONE, COPY tables

Cloudyard

JANUARY 6, 2023

Read Time: 2 minutes, 32 seconds. In this post we will discuss multiple scenarios for clustering tables. We will analyze and implement the following scenarios. Non-clustered to clustered table: create clustering on a normal table and observe the partition pruning. CLONE of a clustered table: CLONE the clustered table above and analyze its clustering.

IT Data

Building Geospatial Data Products

databricks

JANUARY 5, 2023

Geospatial data has been driving innovation for centuries, through use of maps, cartography and more recently through digital content. For example, the oldest.

Building

Building Data Data Science Engineering

Python Lambda Functions, Explained

KDnuggets

JANUARY 6, 2023

Learn the syntax and uses of the lambda function, which is an alternative to the regular Python function.

Python
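
To make the teaser above concrete, here is a minimal sketch of a lambda next to the regular function it replaces. It is not taken from the KDnuggets article; the names `square`, `square_lambda`, `words`, `by_length`, and `evens` are illustrative assumptions.

```python
# A regular named function.
def square(x):
    return x * x

# The equivalent lambda: a single expression whose value is returned.
square_lambda = lambda x: x * x

# Lambdas are most often passed inline as short callbacks.
words = ["kafka", "sql", "airflow", "vim"]  # hypothetical sample data

# Sort by word length; sorted() is stable, so ties keep their input order.
by_length = sorted(words, key=lambda w: len(w))

# Keep only the even numbers from 0..9.
evens = list(filter(lambda n: n % 2 == 0, range(10)))

print(by_length)
print(evens)
```

A lambda is limited to one expression, so anything that needs statements or a docstring is usually better written as a regular `def`.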

What is ESG?

Precisely

JANUARY 6, 2023

Environmental, social, and governance (ESG) initiatives are topics of discussion everywhere – in the workplace, social media, news outlets, and beyond. And for good reason. Recent public advocacy efforts around climate issues, diversity and inclusion, data privacy, and more have been driving forces in pushing ESG to the forefront. While stellar products and services used to be enough for businesses to attract new customers, investors, and employees – and win their loyalty over time – that’s not

Government

Government Education Data Integration Media

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

Teradata’s Top 10 Innovations in 2022

Teradata

JANUARY 6, 2023

As we start 2023, our product marketing team has compiled a list of the top 10 features in Teradata Vantage which have immensely helped our customers and are technological breakthroughs.

Technology

What is ChatGPT?

Elder Research

JANUARY 5, 2023

The post What is ChatGPT? appeared first on Elder Research.

Natural Language Processing with spaCy

KDnuggets

JANUARY 4, 2023

Learn to build NLP projects using spaCy.

Process

Process Project Building

How Corning Built End-to-end ML on Databricks Lakehouse Platform

databricks

JANUARY 5, 2023

“This blog is authored by Denis Kamotsky, Principal Software Engineer at Corning” Corning has been one of the world’s leading innovators in materials scien.

Software Engineer

Software Engineer Software Engineering Engineering

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

Real-World Data Governance: The Role of Data Governance in a Data Strategy

Precisely

JANUARY 5, 2023

Does your company have a formal data strategy? If so, does that strategy effectively lay out a path toward better business outcomes by helping you optimize your use of data? Although it is a given that data can be one of a company’s real differentiators if used properly, many organizations still do not have a comprehensive data strategy in place.

Data Governance

Data Governance Government Data Raw Data

2023 Predictions: Data Trends That Will Dominate Business Agenda in APAC

Cloudera

JANUARY 5, 2023

In the past year, businesses who doubled down on digital transformation during the pandemic saw their efforts coming to fruition in the form of cost savings and more streamlined data management. Faced with even more pressure to remain resilient and agile amid looming global economic threats, Asia-Pacific (APAC) region businesses are looking to further mobilize emerging technologies such as artificial intelligence (AI) and machine learning that will optimize operational efficiencies and cost savi

Banking

Banking Machine Learning Insurance Data Architecture

How to Merge Pandas DataFrames

KDnuggets

JANUARY 5, 2023

Merging data is a common data processing activity. Learn how Pandas provides various ways to merge data.

Data Process

Data Process Process Data Data Science
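
As a rough illustration of the merge styles the teaser refers to, the sketch below joins two small frames with `DataFrame.merge`. It is not code from the article; the `orders` and `customers` frames and their columns are made-up sample data.

```python
import pandas as pd

# Hypothetical sample data: customer 40 has no customer record,
# and customer 30 has no order.
orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": [10, 20, 40]})
customers = pd.DataFrame(
    {"customer_id": [10, 20, 30], "name": ["Ada", "Grace", "Linus"]}
)

# Inner join: only keys present in both frames.
inner = orders.merge(customers, on="customer_id", how="inner")

# Left join: every order is kept; missing customer fields become NaN.
left = orders.merge(customers, on="customer_id", how="left")

# Outer join: all keys from both sides; indicator=True adds a "_merge"
# column recording where each row came from.
outer = orders.merge(customers, on="customer_id", how="outer", indicator=True)

print(inner)
print(left)
print(outer)
```

The `how` argument is the main design choice: `inner` silently drops unmatched rows, while `left` and `outer` keep them and make the gaps visible as missing values.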

Products We Think You Might Like: Generating Personalized Recommendations Using Matrix Factorization

databricks

JANUARY 5, 2023

Check our Solution Accelerator for Matrix Factorization for more details and to download the notebooks. Recommenders are a critical part of the modern.

Retail

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

Data Engineering

Now in Preview: Webhook Data Source | Propel Data Analytics Blog

Propel Data

JANUARY 4, 2023

Easily get your data into Propel to power a variety of customer-facing analytics use cases.

Data Analytics

Data Analytics Data

A Guide to Data Contracts

Striim

JANUARY 4, 2023

Companies need to analyze large volumes of datasets, leading to an increase in data producers and consumers within their IT infrastructures. These companies collect data from production applications and B2B SaaS tools (e.g., Mailchimp). This data makes its way into a data repository, like a data warehouse (e.g., Redshift), and is shown to users via a dashboard for decision-making.

PostgreSQL

PostgreSQL Data Warehouse Data Data Lake

SQL With CSVs

KDnuggets

JANUARY 5, 2023

Write SQL queries to analyze CSV files using a simple command-line tool.

SQL
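
The teaser does not name the command-line tool, so the following is only a sketch of the general idea using Python's standard library instead: load CSV rows into an in-memory SQLite table, then query them with SQL. The helper `query_csv`, the `data` table, and the sample CSV are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample CSV; in practice this would be read from a file.
CSV_TEXT = """city,temp_c
Zurich,3
Lisbon,14
Zurich,5
"""

def query_csv(csv_text, sql):
    """Load CSV text into an in-memory SQLite table named `data` and run `sql`."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn = sqlite3.connect(":memory:")
    # The schema is hard-coded for this sample (an assumption of the sketch).
    conn.execute("CREATE TABLE data (city TEXT, temp_c REAL)")
    # Named placeholders are filled from each row dictionary.
    conn.executemany("INSERT INTO data VALUES (:city, :temp_c)", rows)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

averages = query_csv(
    CSV_TEXT,
    "SELECT city, AVG(temp_c) FROM data GROUP BY city ORDER BY city",
)
print(averages)
```

Dedicated CSV query tools typically infer the schema from the header row; this sketch fixes it up front to stay short.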

How Collaborative Imaging Delivers Healthier Data Products with Monte Carlo

Monte Carlo

JANUARY 4, 2023

As a radiologist-owned alliance built by physicians, Collaborative Imaging knows a thing or two about what it means to be healthy. And the same goes for their data. From revenue cycle management to telehealth, Collaborative Imaging’s physician-conceived platform is solving some of the biggest technology challenges facing modern medical practices. And with hundreds of hospitals utilizing Collaborative Imaging’s data products to optimize their practices, data quality is paramount for the data tea

Hospitality

Hospitality Healthcare Medical Insurance

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Data

Selecting the Best Image for Each Merchant Using Exploration and Machine Learning

DoorDash Engineering

JANUARY 4, 2023

In order to inspire DoorDash consumers to order from the platform there are few tools more powerful than a compelling image, which raises the questions: what is the best image to show each customer, and how can we build a model to determine that programmatically using each merchant’s available images? Figure 1: Discovery surfaces with merchant images Out of all the different information presented on the home page (see Figure 1), studies with consumers have repeatedly shown that images play the m

Machine Learning

Machine Learning Food Algorithm Building

Recycling Kubernetes Nodes

Yelp Engineering

JANUARY 4, 2023

Manually managing the lifecycle of Kubernetes nodes can become difficult as the cluster scales. Especially if your clusters are multi-tenant and self-managed. You may need to replace nodes for various reasons, such as OS upgrades and security patches. One of the biggest challenges is how to terminate nodes without disturbing tenants. In this post, I’ll describe the problems we encountered administering Yelp’s clusters and the solutions we implemented.

Management

Management Building IT

Top Data Python Packages to Know in 2023

KDnuggets

JANUARY 4, 2023

These Python packages can improve your data workflow.

Python

Python Data Workflow Data Data Science

Achieve Your Goals With Databricks Certifications

databricks

JANUARY 4, 2023

Elevate your career in the New Year! Start the new year off right by taking your Databricks enablement to the next level by.

Certification

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m

Software Engineer
