Sat. Jan 14, 2023 - Fri. Jan 20, 2023

Replacing Pandas with Polars. A Practical Guide.

Confessions of a Data Guy

I remember those days, oh so long ago, it seems like another lifetime. I haven’t used Pandas in many a year, decades, or whatever. We’ve all been there, done that. Pandas I mean. I would dare say it’s a rite of passage for most data folk. For those using Python, it’s probably one of the […]

Python 361

How To Hire Junior Data Engineers

Seattle Data Guy

With all the recent data events I have put together, I inevitably run into new data engineers who are either finishing up college or looking to transition into a data engineer or data scientist position. In fact, I have talked to several newly graduated engineers who are struggling to find work. A few told me…

ChatGPT as a Python Programming Assistant

KDnuggets

Is ChatGPT useful for Python programmers, specifically those of us who use Python for data processing, data cleaning, and building machine learning models? Let's give it a try and find out.

Python 160

What Big Tech layoffs suggest for the industry

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics in today’s subscriber-only The Scoop issue. To get the full issues, twice a week: subscribe here. Update on 20 January: less than a day after publishing this article, Google announced historic layoffs that will impact ~12,000 positions.

Banking 155

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate

Data News — Week 23.03

Christophe Blefari

Hey, new Friday, new Data News edition. I'm so happy to see new people coming every week. Thank you for every recommendation you make about the blog or the Data News. This kindness for my content gives me wings. This week I don't want to be late, so let's start the weekly wrap-up. I was less inspired this week, which means a shorter edition.

What Is The State Of Data Engineering And Infrastructure In 2023

Seattle Data Guy

2022 is coming to an end. What is the state of data infra? Are Snowflake and Databricks still fighting over total cost of ownership? Is everyone switching to DuckDB? Are data engineers all learning Rust? Let’s try to answer these questions. Our team is putting together an all-day event focused on helping answer some…

More Trending

Building Applications With Data As Code On The DataOS

Data Engineering Podcast

Summary: The modern data stack has made it more economical to use enterprise grade technologies to power analytics at organizations of every scale. Unfortunately it has also introduced new overhead to manage the full experience as a single workflow. At the Modern Data Company they created the DataOS platform as a means of driving your full analytics lifecycle through code, while providing automatic knowledge graphs and data discovery.

Coding 130

Data News — Week 23.02

Christophe Blefari

Hey. I've had busy weeks, and I'm sorry Data News is coming on Saturday again. It is a bit hard to travel by train, work, and write at the same time. Plus I'm a fast context switcher, so it piles up. Also, a few of you have sent me messages recently and I haven't answered yet; I see you and I have not forgotten you.

Python 130

Why You Should Simplify Your Data Infrastructure

Seattle Data Guy

“Good Design Is Easier to Change Than Bad Design” (The Pragmatic Programmer). Programming is just one aspect of the difficulties of tech work for data engineers. Creating simple yet robust systems that help manage your data infrastructure is equally important. This challenge of building a simple yet robust data infrastructure remains even with no-code/low-code solutions.

Data 130

How to Use Python and Machine Learning to Predict Football Match Winners

KDnuggets

We will be learning web scraping and training supervised machine-learning algorithms to predict winning teams.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Driving Data, Delivering Value: Data Leaders to Watch in 2023

Snowflake

The Chief Data Officer is arguably one of the most important roles at a company, particularly those that aspire to be data-driven. CDO appointments and the elevation of data leaders have accelerated in recent years, and the role has morphed as perceptions of data have evolved. Responsibilities span strategy and execution, people and processes, and the technology needed to deliver on the promise of data.

Data 111

Devpod: Improving Developer Productivity at Uber with Remote Development

Uber Engineering

In this blog, we share how we improved the daily edit-build-run developer experience using DevPods, Uber’s remote development environment. We cover the challenges, pain points, our architecture, and lastly the future of remote development at Uber.

Functional Python, Part II: Dial M for Monoid

Tweag

Tweagers have an engineering mantra — Functional. Typed. Immutable. — that begets composable software which can be reasoned about and avails itself to static analysis. These are all “good things” for building robust software, which inevitably lead us to using languages such as Haskell, OCaml and Rust. However, it would be remiss of us to snub languages that don’t enforce the same disciplines, but are nonetheless popular choices in industry.

Python 102

ChatGPT: Everything You Need to Know

KDnuggets

All you need to know about ChatGPT: what it can do, how it works, and its limitations.

IT 144

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

New Built-in Functions for Databricks SQL

databricks

Built-in functions extend the power of SQL with specific transformations of values for common needs and use cases. For example, the LOG10 function.

SQL 98

Reducing Logging Cost by Two Orders of Magnitude using CLP

Uber Engineering

Uber’s Data team discusses how they used CLP to scale log ingestion, retention, and analytics for Petabytes of Spark logs, reducing log storage and management costs by 169x.

The Insurance Industry is Ready for a lot More Change

Teradata

The dwindling personal auto insurance market is a harbinger of a lot more change to come. Find out more.

SQL and Data Integration: ETL and ELT

KDnuggets

In this article, we will discuss use cases and methods for using ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) processes along with SQL to integrate data from various sources.
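
The difference between the two orderings can be illustrated with a small sketch. It is not taken from the article: the sample rows, table names, and helper functions below are hypothetical, and SQLite (via Python's standard `sqlite3` module) stands in for a real warehouse. In the ETL path the rows are cleaned in application code before loading; in the ELT path the raw strings are loaded first and cleaned with SQL inside the database.

```python
import sqlite3

# Hypothetical raw records standing in for an external source (all strings).
RAW_ORDERS = [
    ("1", " Alice ", "19.99"),
    ("2", "bob", "5.00"),
    ("3", "ALICE", "7.50"),
]


def run_etl(conn):
    """ETL: transform in Python first, then load only the cleaned rows."""
    cleaned = [
        (int(order_id), name.strip().lower(), float(amount))
        for order_id, name, amount in RAW_ORDERS
    ]
    conn.execute("CREATE TABLE orders_etl (id INTEGER, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def run_elt(conn):
    """ELT: load the raw strings untouched, then transform with SQL."""
    conn.execute("CREATE TABLE orders_raw (id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", RAW_ORDERS)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT CAST(id AS INTEGER)   AS id,
               LOWER(TRIM(customer)) AS customer,
               CAST(amount AS REAL)  AS amount
        FROM orders_raw
        """
    )


def totals(conn, table):
    """Sum order amounts per customer for one of the tables created above."""
    query = (
        f"SELECT customer, ROUND(SUM(amount), 2) FROM {table} "
        "GROUP BY customer ORDER BY customer"
    )
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_totals = totals(conn, "orders_etl")
elt_totals = totals(conn, "orders_elt")
print(etl_totals)
print(elt_totals)
```

Both paths end with the same per-customer totals; what changes is where the transformation runs and whether the untransformed data remains available in the database afterwards.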

SQL 129

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Easy Ingestion to Lakehouse With COPY INTO

databricks

A new data management architecture known as the data lakehouse emerged independently across many organizations and use cases to support AI and BI.

BI 98

Uber’s Next Gen Push Platform on gRPC

Uber Engineering

Uber’s API platform team talks about how they built their Next Generation Push Platform on gRPC which helped improve the reliability and latency of messages significantly.

Leveraging Snowflake to Enable Genomic Analytics at Scale

Snowflake

Genomic data, which is the DNA data of organisms, is essential to life sciences companies. For population studies, anonymized data sets can link long-term health histories with treatment patterns and genomic variations, making it possible to analyze effective approaches for subpopulations. In clinical trials and drug discovery, pharmaceutical research that combines patient health data, drug effectiveness, and genomic variations can improve outcomes and speed time to market.

Fast-track your next move with in-demand data skills

KDnuggets

DataCamp offers over 400 interactive courses, projects, and career tracks in the most popular data technologies such as Python, SQL, R, Power BI, and Tableau. Start today and save up to 67% on career-advancing learning.

BI 129

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

What’s New With SQL User-Defined Functions

databricks

Since their initial release, SQL user-defined functions have become hugely popular among both Databricks Runtime and Databricks SQL customers. This simple yet powerful.

SQL 98

How Uber Optimizes the Timing of Push Notifications using ML and Linear Programming

Uber Engineering

The Uber Eats team shares how they built a novel system with machine learning and linear programming to send the right message at the right time to its users.

Data Integrity Trends for 2023

Precisely

For most enterprises, 2022 was a year of transition, as companies struggled to figure out how to accomplish more with fewer resources. Technology helped to bridge the gap, as AI, machine learning, and data analytics drove smarter decisions, and automation paved the way for greater efficiency. Among data integrity trends for 2023, agility tops the list of success factors for most firms, as business leaders focus on rapid time to value and an emphasis on responding quickly to emerging opportunitie

Scaling Data Management Through Apache Gobblin

KDnuggets

Software companies can manage big data at a hyper-scale on different infrastructure stacks using Apache Gobblin.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

How to make this 3D diorama of the Straits of Mackinac

ArcGIS

Here's one way to make these fun and intriguing micro-world cutaway sorts of things!

Deduping and Storing Images at Uber Eats

Uber Engineering

Our engineers discuss how we dedupe and store millions of product images at Uber Eats using a content-addressable caching layer, which saves millions of image downloads every hour and ensures that every image is only stored once.
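
As a rough illustration of the general idea of content addressing (not Uber's actual design, which the blurb does not describe), the sketch below keys each image by a SHA-256 digest of its bytes. The class name, method names, and URLs are hypothetical. Identical bytes arriving under different URLs are stored once, and a URL that has already been seen does not need to be downloaded again.

```python
import hashlib


class ContentAddressableStore:
    """Hypothetical in-memory store keyed by the SHA-256 digest of the content."""

    def __init__(self):
        self._blobs = {}          # digest -> image bytes (each image stored once)
        self._url_to_digest = {}  # source URL -> digest (lets us skip re-downloads)

    def has_url(self, url):
        """True if this URL was already ingested, so no download is needed."""
        return url in self._url_to_digest

    def put(self, url, data):
        """Record the URL and store the bytes only if they are new."""
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)
        self._url_to_digest[url] = digest
        return digest

    def get(self, url):
        """Return the bytes previously stored for this URL."""
        return self._blobs[self._url_to_digest[url]]

    def blob_count(self):
        """Number of distinct images actually stored."""
        return len(self._blobs)


store = ContentAddressableStore()
digest_a = store.put("https://example.com/menu/burger.jpg", b"fake-image-bytes")
digest_b = store.put("https://example.com/promo/burger-copy.jpg", b"fake-image-bytes")
digest_c = store.put("https://example.com/menu/salad.jpg", b"other-image-bytes")
print(store.blob_count())
```

Because the key is derived from the content rather than the name, the two burger URLs resolve to the same stored blob.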

New! Diversity, equity, and inclusion analysis SpotApp helps businesses improve employee diversity

ThoughtSpot

Tech has a diversity problem. As a veteran People leader, I see and hear about it all the time: in media, in the boardroom, and in my daily work. And yet, as much as our industry is known for solving large-scale problems and disrupting the status quo, improvement in this area doesn’t seem to be happening fast enough. Why not? When I look at companies leading our industry in DEI, there’s one thing that stands out: data.

Encoding Categorical Features with MultiLabelBinarizer

KDnuggets

Transform multi-label format into a binary matrix for multi-label classification.
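
To make the transformation concrete, here is a dependency-free sketch of the kind of mapping that scikit-learn's `MultiLabelBinarizer` performs. It is a simplified stand-in rather than the library's implementation: the function name `binarize_labels` and the sample genres are hypothetical. Each sample's set of labels becomes one row of 0s and 1s, with one column per distinct label in sorted order.

```python
def binarize_labels(samples):
    """Turn a list of label collections into (sorted classes, binary matrix)."""
    # Collect every distinct label and fix a column order.
    classes = sorted({label for labels in samples for label in labels})
    column = {label: i for i, label in enumerate(classes)}

    matrix = []
    for labels in samples:
        row = [0] * len(classes)
        for label in labels:
            row[column[label]] = 1  # mark each label present in this sample
        matrix.append(row)
    return classes, matrix


# Hypothetical multi-label data: each film can have several genres, or none.
genres = [["comedy", "drama"], ["action"], []]
classes, matrix = binarize_labels(genres)
print(classes)
for row in matrix:
    print(row)
```

A sample with no labels yields an all-zero row, and a sample with several labels yields a row with several 1s, which is the shape multi-label classifiers typically expect.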

Process 116

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m