Sat. Jan 07, 2023 - Fri. Jan 13, 2023

Inside Pollen's Software Engineering Salaries

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one and a half out of eight topics in today’s subscriber-only issue, Inside Pollen's Transparent Compensation Data. If you’re not yet a subscriber, you also missed this week’s deep-dive on Becoming a Fractional CTO. To get this newsletter every week, subscribe here.

Simplify Delta Lake Complexity with mack.

Confessions of a Data Guy

Anyone who’s been roaming around the forest of Data Engineering has probably run into many of the newish tools that have been growing rapidly around the concepts of Data Warehouses, Data Lakes, and Lake Houses … the merging of the old relational database functionality with TB and PB level cloud-based file storage systems. Tools like […]

Data Lake 162

Where Collaboration Fails Around Data (And 4 Tips for Fixing It)

KDnuggets

Data-driven organizations require complex collaboration between data teams and business stakeholders. Here are 4 proactive tips for reducing information asymmetries and achieving better collaboration.

IT 160

Data Pipeline Design Patterns - #2. Coding patterns in Python

Start Data Engineering

Contents: Introduction; Sample project; Code design patterns (1. Functional design, 2. Factory pattern, 3. Strategy pattern, 4. Singleton & Object pool patterns); Python helpers (1. Typing, 2. Dataclass, 3. Context Managers, 4. Testing with pytest, 5. Decorators); Misc; Conclusion; Further reading; References. Introduction: Using the appropriate code design pattern can make your code easier to read, extend, modify, and debug, and can help developers onboard more quickly.
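
As a rough illustration of two of the patterns named in that outline (factory and strategy), here is a minimal sketch. It is not taken from the Start Data Engineering article; the function names (`drop_nulls`, `lowercase_keys`, `get_transform`, `run_pipeline`) and the record shape are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Transform = Callable[[List[Record]], List[Record]]


def drop_nulls(rows: List[Record]) -> List[Record]:
    """Strategy 1: keep only rows where every value is present."""
    return [r for r in rows if all(v is not None for v in r.values())]


def lowercase_keys(rows: List[Record]) -> List[Record]:
    """Strategy 2: normalize column names to lowercase."""
    return [{k.lower(): v for k, v in r.items()} for r in rows]


def get_transform(name: str) -> Transform:
    """Factory: map a config string to a transform (hypothetical registry)."""
    registry: Dict[str, Transform] = {
        "drop_nulls": drop_nulls,
        "lowercase_keys": lowercase_keys,
    }
    if name not in registry:
        raise ValueError(f"Unknown transform: {name}")
    return registry[name]


def run_pipeline(rows: List[Record], transform: Transform) -> List[Record]:
    """The pipeline accepts any strategy with the Transform signature."""
    return transform(rows)


if __name__ == "__main__":
    data = [{"ID": 1, "Name": "a"}, {"ID": 2, "Name": None}]
    print(run_pipeline(data, get_transform("drop_nulls")))
```

The pipeline code never needs to know which transform it runs, so adding a new one only means registering another function.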

Designing 147

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Analysis of Confluent Buying Immerok

Jesse Anderson

If you haven’t heard, Confluent announced they’re buying Immerok. This purchase represents a significant shift in strategy for Confluent. I started a Twitter thread with some of my initial thoughts, but I want to write a post giving more analysis and opinions. In short, I still echo the sentiment from my original tweet “This was always the way it should have been.

Kafka 147

Using Rust to write a Data Pipeline. Thoughts. Musings.

Confessions of a Data Guy

Rust has been on my mind a lot lately, probably because of Data Engineering boredom, watching Spark clusters chug along like some medieval farm worker endlessly trudging through the muck and mire of life. Maybe Rust has breathed some life back into my stagnant soul, reminding me there is a big world out there, […]

More Trending

Data Warehouse Consultants – What Do They Do And Why You Need One

Seattle Data Guy

A data warehouse consultant plays an important role in companies looking to become data-driven. They help companies design and deploy centralized data sets that are easy to use and reliable. But to understand why you need a data warehouse consultant, we should take a step back. In this article we will not only…

Modern Data Stack: The Struggle of Enterprise Adoption

Simon Späti

In part I, The Open Data Stack Distilled into Four Core Tools, we discussed how to quickly set up a data stack, tackling end-to-end data analytics challenges. As a manager or developer working with data at a mid- to large-sized enterprise, you might ask why you aren’t using any of these tools. In this article, we dive into what mid- to large-sized companies are using instead, the struggle of setting up a Modern Data Stack (MDS) at enterprise scale, and the opportunities of a free-of-charge and

Automate Your Pipeline Creation For Streaming Data Transformations With SQLake

Data Engineering Podcast

Summary Managing end-to-end data flows becomes complex and unwieldy as the scale of data and its variety of applications in an organization grows. Part of this complexity is due to the transformation and orchestration of data living in disparate systems. The team at Upsolver is taking aim at this problem with the latest iteration of their platform in the form of SQLake.

7 Best Platforms to Practice SQL

KDnuggets

Looking to level up your SQL skills? Here's a list of the best platforms to practice SQL, ace your SQL interviews, and land your dream data role.

SQL 151

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Improving Your Data Analytics Infrastructure In 2023 – Part 1

Seattle Data Guy

Data has been consistently demonstrated to be a valuable asset for businesses of all sizes. Consulting firms, like McKinsey, have found that companies using AI and analytics attribute 20% of their earnings to it. As a consultant, I have personally witnessed how data can uncover new sources of revenue and cost reduction opportunities for clients…

Data News — Week 23.01

Christophe Blefari

Happy new year 🎆 For those who were already subscribed at the start of last year: I set resolutions and objectives for the year that I did not manage to follow. The year was so different from what I expected. Maybe this is an excuse. Anyway, I did not reach my goals. What if we don't care this year?

Overcome Your Data Quality Issues with Great Expectations

KDnuggets

Bad data costs organizations money, reputation, and time. Hence it is very important to monitor and validate data quality continuously.

Data 143

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Succeeding with Change Data Capture

Confluent

CDC is a software design pattern that identifies and captures changes made to data in a database. Learn how CDC works, the best solutions, and how to get started with various implementations.
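
To make the idea of "identifying and capturing changes" concrete, here is a minimal sketch that derives change events by comparing two snapshots of a table keyed by primary key. This is an illustration only, not Confluent's approach: production CDC tools typically read the database's transaction log instead of diffing snapshots, and the name `capture_changes` and the event tuple layout are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
Snapshot = Dict[int, Row]  # primary key -> row
Change = Tuple[str, int, Optional[Row]]  # (operation, key, new row or None)


def capture_changes(before: Snapshot, after: Snapshot) -> List[Change]:
    """Compare two snapshots and emit insert, update, and delete events."""
    changes: List[Change] = []
    # Keys present only in the new snapshot are inserts.
    for key in sorted(after.keys() - before.keys()):
        changes.append(("insert", key, after[key]))
    # Keys present in both snapshots with different rows are updates.
    for key in sorted(before.keys() & after.keys()):
        if before[key] != after[key]:
            changes.append(("update", key, after[key]))
    # Keys present only in the old snapshot are deletes.
    for key in sorted(before.keys() - after.keys()):
        changes.append(("delete", key, None))
    return changes


if __name__ == "__main__":
    old = {1: {"name": "a"}, 2: {"name": "b"}}
    new = {1: {"name": "a2"}, 3: {"name": "c"}}
    for event in capture_changes(old, new):
        print(event)
```

Snapshot comparison misses intermediate changes between snapshots, which is one reason log-based capture is usually preferred.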

Data 124

The Impact of Data and AI on a Modern Business

databricks

It is no secret that there has been an explosion of data in the past 10 years. As per Forbes, from 2010 to.

Data 104

Product Discovery – Building the Right Things

Teradata

Product discovery is a process that cross functional product teams follow to reduce the uncertainty about a problem worth solving and a solution worth developing. Learn more.

Google Data Analytics Certification Review for 2023

KDnuggets

What is the Google Data Analytics Certification? And, more importantly, is it still worth getting it in 2023?

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

In the spotlight with Nick Cooper: ThoughtSpot’s Selfless Excellence champion

ThoughtSpot

This is part of our ongoing spotlight series which highlights ThoughtSpot’s quarterly Selfless Excellence champion. Culture and shared values are at the heart of every decision, innovation, and team member at ThoughtSpot. By creating a family-first mentality among a truly diverse and inclusive team, we’ve been able to build more authentic relationships with one another.

Databricks Power BI Connector Now Supports Native Query

databricks

This is a collaborative post from Databricks and Microsoft. We thank Mahesh Prakriya (Director in Intelligence Platform, Microsoft) and Bob Zhang (Sr. Technical.

BI 98

Top Data Integrity Trends Fueling Confident Business Decisions in 2023

Precisely

With global data creation projected to grow to more than 180 zettabytes by 2025, it’s not surprising that more organizations than ever are looking to harness their ever-growing datasets to drive more confident business decisions. In fact, a recent study from 451 Research shows that nearly 79% of businesses report data will be more important to their organization’s strategic decision-making over the next 12 months.

Topic Modeling Approaches: Top2Vec vs BERTopic

KDnuggets

This post gives an overview of the strengths and differences of these approaches in extracting topics from text.

Process 140

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Saving Lives, Saving Costs: Predicting Heart Failure with Teradata

Teradata

A team of Teradata data scientists & industry experts worked alongside a U.S. insurance company to develop a solution that would predict the onset of heart failure 6 months in advance. Find out more.

Supercharging H3 for Geospatial Analytics

databricks

On the heels of the initial release of H3 support in Databricks Runtime (DBR), we are happy to share ground-breaking performance improvements with.

5 Challenges of Ethical Data Stewardship

Precisely

The pressure is mounting. Data privacy regulations are constantly evolving, and customer preferences and expectations are high and on the move. That means businesses want to provide hyper-personalized experiences, but they also need to ensure they’re using, sharing, and protecting customer data with the utmost integrity. And with the rising focus on environmental, social, and governance (ESG), businesses can no longer rely on quality products alone to win and maintain the support of customers, e

Performing a T-Test in Python

KDnuggets

An introduction to the t-test, with a Python implementation.
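
For readers who want a feel for the calculation, here is a minimal sketch of a two-sample t statistic using only the standard library. It is not the KDnuggets article's code. It assumes Welch's unequal-variance form, and it stops at the statistic because the standard library has no t-distribution CDF for a p-value. The name `welch_t_statistic` and the sample data are hypothetical.

```python
import math
import statistics
from typing import Sequence


def welch_t_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Return Welch's t statistic for two independent samples."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    # Standard error of the difference between the two sample means.
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


if __name__ == "__main__":
    control = [1, 2, 3, 4, 5]
    treatment = [2, 3, 4, 5, 6]
    print(welch_t_statistic(control, treatment))
```

A statistic far from zero suggests the sample means differ by more than their sampling noise; a statistics library would turn it into a p-value.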

Python 139

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

How to build a Snowflake API | Propel Data Analytics Blog

Propel Data

Create and query an API on top of your Snowflake data warehouse using Propel’s blazing-fast Serverless Analytics API Platform

Better Data for Better Decisions in the Public Sector Through Entity Resolution - Part 1

databricks

One of the domains where better decisions mean a better society is the Public Sector. Each and every one of us has a.

Data 98

How DoorDash Upgraded a Heuristic with ML to Save Thousands of Canceled Orders

DoorDash Engineering

One challenge in running our platform is being able to accurately track Merchants’ operational status and ability to receive and fulfill orders. For example, when a Merchant’s location is physically closed but marked as open on our platform, we might create a bad experience for all of our users; a Dasher cannot complete their accepted delivery, the Consumer cannot receive their ordered food, and the Merchant could see lower future revenues.

Food 62

Approaches to Data Imputation

KDnuggets

This guide will discuss what data imputation is as well as the types of approaches it supports.

Data 137

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.