Sat. Jan 07, 2023 - Fri. Jan 13, 2023

Inside Pollen's Software Engineering Salaries

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one and a half out of eight topics in today’s subscriber-only issue, Inside Pollen's Transparent Compensation Data. If you’re not yet a subscriber, you also missed this week’s deep-dive on Becoming a Fractional CTO. To get this newsletter every week, subscribe here.

Simplify Delta Lake Complexity with mack.

Confessions of a Data Guy

Anyone who’s been roaming around the forest of Data Engineering has probably run into many of the newish tools that have been growing rapidly around the concepts of Data Warehouses, Data Lakes, and Lake Houses … the merging of the old relational database functionality with TB and PB level cloud-based file storage systems. Tools like […]

Data Lake 162

Where Collaboration Fails Around Data (And 4 Tips for Fixing It)

KDnuggets

Data-driven organizations require complex collaboration between data teams and business stakeholders. Here are 4 proactive tips for reducing information asymmetries and achieving better collaboration.

IT 160

Data Pipeline Design Patterns - #2. Coding patterns in Python

Start Data Engineering

Contents: Introduction; Sample project; Code design patterns (1. Functional design, 2. Factory pattern, 3. Strategy pattern, 4. Singleton & object pool patterns); Python helpers (1. Typing, 2. Dataclass, 3. Context managers, 4. Testing with pytest, 5. Decorators); Misc; Conclusion; Further reading; References.

Introduction: Using the appropriate code design pattern can make your code easier to read, extend, modify, and debug, and can help developers onboard more quickly.

Designing 147

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This comprehensive guide covers best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

Analysis of Confluent Buying Immerok

Jesse Anderson

If you haven’t heard, Confluent announced they’re buying Immerok. This purchase represents a significant shift in strategy for Confluent. I started a Twitter thread with some of my initial thoughts, but I want to write a post giving more analysis and opinions. In short, I still echo the sentiment from my original tweet “This was always the way it should have been.

Kafka 147

Using Rust to write a Data Pipeline. Thoughts. Musings.

Confessions of a Data Guy

Rust has been on my mind a lot lately, probably because of Data Engineering boredom, watching Spark clusters chug along like some medieval farm worker endlessly trudging through the muck and mire of life. Maybe Rust has breathed some life back into my stagnant soul, reminding me there is a big world out there, […]

More Trending

Data Warehouse Consultants – What Do They Do And Why You Need One

Seattle Data Guy

A data warehouse consultant plays an important role in companies looking to become data-driven. They help companies design and deploy centralized data sets that are easy to use and reliable. But in order to understand why you need a data warehouse consultant we should take a step back. In this article we will not only…

Modern Data Stack: The Struggle of Enterprise Adoption

Simon Späti

In part I, The Open Data Stack Distilled into Four Core Tools, we discussed how to quickly set up a data stack, tackling end-to-end data analytics challenges. As a manager or developer working with data at a mid- to large-sized enterprise, you might ask why aren’t we using any of these tools. In this article, we dive into what mid-to-large-sized companies are using instead, the struggle of setting up a Modern Data Stack (MDS) for an enterprise size, and the opportunities of a free-of-charge and

Automate Your Pipeline Creation For Streaming Data Transformations With SQLake

Data Engineering Podcast

Summary Managing end-to-end data flows becomes complex and unwieldy as the scale of data and its variety of applications in an organization grows. Part of this complexity is due to the transformation and orchestration of data living in disparate systems. The team at Upsolver is taking aim at this problem with the latest iteration of their platform in the form of SQLake.

7 Best Platforms to Practice SQL

KDnuggets

Looking to level up your SQL skills? Here's a list of the best platforms to practice SQL, ace your SQL interviews, and land your dream data role.

SQL 149

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Improving Your Data Analytics Infrastructure In 2023 – Part 1

Seattle Data Guy

Data has been consistently demonstrated to be a valuable asset for businesses of all sizes. Consulting firms, like McKinsey, have found that companies using AI and analytics attribute 20% of their earnings to it. As a consultant, I have personally witnessed how data can uncover new sources of revenue and cost reduction opportunities for clients…

Data News — Week 23.01

Christophe Blefari

You and me celebrating 2023 (credits). Happy new year 🎆 For those who were already subscribed at the start of last year: I tried to set resolutions and objectives for the year, and I did not manage to follow them. The year was so different from what I expected. Maybe this is an excuse. Anyway, I did not reach my goals. What if we don't care this year?

Overcome Your Data Quality Issues with Great Expectations

KDnuggets

Bad data costs organizations money, reputation, and time. Hence it is very important to monitor and validate data quality continuously.

Data 136

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Succeeding with Change Data Capture

Confluent

CDC is a software design pattern that identifies and captures changes made to data in a database. Learn how CDC works, the best solutions, and how to get started with various implementations.

Database 124

The Impact of Data and AI on a Modern Business

databricks

It is no secret that there has been an explosion of data in the past 10 years. As per Forbes, from 2010 to.

Data 104

Product Discovery – Building the Right Things

Teradata

Product discovery is a process that cross functional product teams follow to reduce the uncertainty about a problem worth solving and a solution worth developing. Learn more.

Google Data Analytics Certification Review for 2023

KDnuggets

What is the Google Data Analytics Certification? And, more importantly, is it still worth getting it in 2023?

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

In the spotlight with Nick Cooper: ThoughtSpot’s Selfless Excellence champion

ThoughtSpot

This is part of our ongoing spotlight series which highlights ThoughtSpot’s quarterly Selfless Excellence champion. Culture and shared values are at the heart of every decision, innovation, and team member at ThoughtSpot. By creating a family-first mentality among a truly diverse and inclusive team, we’ve been able to build more authentic relationships with one another.

Databricks Power BI Connector Now Supports Native Query

databricks

This is a collaborative post from Databricks and Microsoft. We thank Mahesh Prakriya (Director in Intelligence Platform, Microsoft) and Bob Zhang (Sr. Technical.

BI 98

Top Data Integrity Trends Fueling Confident Business Decisions in 2023

Precisely

With global data creation projected to grow to more than 180 zettabytes by 2025, it’s not surprising that more organizations than ever are looking to harness their ever-growing datasets to drive more confident business decisions. In fact, a recent study from 451 Research shows that nearly 79% of businesses report data will be more important to their organization’s strategic decision-making over the next 12 months.

Topic Modeling Approaches: Top2Vec vs BERTopic

KDnuggets

This post gives an overview of the strengths and differences of these approaches in extracting topics from text.

Process 132

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them into complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

Saving Lives, Saving Costs: Predicting Heart Failure with Teradata

Teradata

A team of Teradata data scientists & industry experts worked alongside a U.S. insurance company to develop a solution that would predict the onset of heart failure 6 months in advance. Find out more.

Supercharging H3 for Geospatial Analytics

databricks

On the heels of the initial release of H3 support in Databricks Runtime (DBR), we are happy to share ground-breaking performance improvements with.

98

5 Challenges of Ethical Data Stewardship

Precisely

The pressure is mounting. Data privacy regulations are constantly evolving, and customer preferences and expectations are high and on the move. That means businesses want to provide hyper-personalized experiences, but they also need to ensure they’re using, sharing, and protecting customer data with the utmost integrity. And with the rising focus on environmental, social, and governance (ESG), businesses can no longer rely on quality products alone to win and maintain the support of customers, e

Performing a T-Test in Python

KDnuggets

An introduction to the t-test with a Python implementation.

Python 132

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

How to build a Snowflake API | Propel Data Analytics Blog

Propel Data

Create and query an API on top of your Snowflake data warehouse using Propel’s blazing-fast Serverless Analytics API Platform

Better Data for Better Decisions in the Public Sector Through Entity Resolution - Part 1

databricks

One of the domains where better decisions mean a better society is the Public Sector. Each and every one of us has a.

Data 98

How DoorDash Upgraded a Heuristic with ML to Save Thousands of Canceled Orders

DoorDash Engineering

One challenge in running our platform is being able to accurately track Merchants’ operational status and ability to receive and fulfill orders. For example, when a Merchant’s location is physically closed but marked as open on our platform, we might create a bad experience for all of our users; a Dasher cannot complete their accepted delivery, the Consumer cannot receive their ordered food, and the Merchant could see lower future revenues.

Food 62

Approaches to Data Imputation

KDnuggets

This guide will discuss what data imputation is as well as the types of approaches it supports.

Data 129

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m