July, 2024

My Obsidian Note-Taking Workflow

Simon Späti

A Vim-Inspired Approach to Efficient Note Management with Obsidian and Markdown

The software engineering industry in 2024: what changed, why, and what is next

The Pragmatic Engineer

The past 18 months have seen major change reshape the tech industry. What does it all mean for businesses and dev teams – and what will pragmatic software engineering approaches look like in the future? I tackled these burning questions in my conference talk, “What’s Old is New Again,” which was the keynote of the Craft Conference in May 2024.

What are the types of data quality checks?

Start Data Engineering

1. Introduction
2. Data Quality (DQ) checks are run as part of your pipeline
2.1. Ensure your consumers don’t get incorrect data with output DQ checks
2.2. Catch upstream issues quickly with input DQ checks
2.3. Waiting a long time to run output DQ checks? Save time & money with mid-pipeline DQ checks.
2.4. Track incoming and outgoing row counts with Audit logs
3.

Data 214

PyArrow vs Polars (vs DuckDB) for Data Pipelines.

Confessions of a Data Guy

I’ve had something rattling around in the old noggin for a while; it’s just another strange idea that I can’t quite shake out. We all keep hearing about Arrow this and Arrow that … it seems every new tool built today for Data Engineering is at least partly based on Arrow’s in-memory format. So, […]

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

5 Tools Every Data Scientist Needs in Their Toolbox in 2024

KDnuggets

From the soft tools to the hard tools, these are what make a data scientist successful.

Data 153

A New Standard in Open Source AI: Meta Llama 3.1 on Databricks

databricks

We are excited to partner with Meta to release the Llama 3.1 series of models on Databricks, further advancing the standard of powerful.

More Trending

Introducing Apache Kafka® 3.8

Confluent

Apache Kafka 3.8 adds 17 new KIPs (13 for Core, 3 for Streams & 1 for Connect). Highlights include 2 new Docker images, the ability to set task assignors, and more!

Kafka 136

How to implement data quality checks with greatexpectations

Start Data Engineering

1. Introduction
2. Project overview
3. Check your data before making it available to end-users; Write-Audit-Publish (WAP) pattern
4. TL;DR: How the greatexpectations library works
4.1. greatexpectations quick setup
5. From an implementation perspective, there are four types of tests
5.1. Running checks on one dataset
5.2. Checks involving the current dataset and its historical data
5.3.

Datasets 208

Robinhood Acquires Pluto, AI Investment Research Platform

Robinhood

Robinhood Markets, Inc. is excited to announce the acquisition of Pluto Capital Inc., an artificial intelligence (AI) powered investment research platform that delivers highly-customized investment strategies based on customer needs and financial goals. With this strategic acquisition, investors can look forward to a new era of intelligent, data-driven investing at Robinhood.

Portfolio 135

How ChatGPT is Changing the Face of Programming

KDnuggets

Empowering Developers and Transforming Programming Practices

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Enhancing LLM-as-a-Judge with Grading Notes

databricks

Evaluating long-form LLM outputs quickly and accurately is critical for rapid AI development. As a result, many developers wish to deploy LLM-as-judge methods.

Snowflake Cortex Search: State-of-the-Art Hybrid Search for RAG Applications

Snowflake

Snowflake Cortex Search, a fully managed search service for documents and other unstructured data, is now in public preview. With Cortex Search, organizations can effortlessly deploy retrieval-augmented generation (RAG) applications with Snowflake, powering use cases like customer service, financial research and sales chatbots. Cortex Search offers state-of-the-art semantic and lexical search over your text data in Snowflake behind an intuitive user interface, and it comes with the robust securi

Data News — Week 24.30

Christophe Blefari

Dear members, it's Summer Data News, the only news you can consume by the pool, at the beach or at the office, if you're not lucky. This week, I'm writing from the Baltics, nomading a bit in Eastern and Northern Europe. I'm pleased to announce that we have successfully closed the CfP for Forward Data Conf: we received nearly 100 submissions, and the program committee is currently reviewing them.

MySQL 130

DAIS 2024: Testing framework from the Dataflow model for Apache Spark Structured Streaming

Waitingforcode

With this blog post I'm starting a follow-up series for my Data+AI Summit 2024 talk. I've missed writing this kind of series, since my previous DAIS as a speaker was 4 years ago! As before, I'll be writing several blog posts that should help you remember the talk and also cover some of the topics left aside because of time constraints.

Data 130

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

9 Habits Of Effective Data Managers – Running A Data Team

Seattle Data Guy

Running a successful data team is hard. Data teams are expected to juggle a combination of ad-hoc requests, big bet projects, migrations, etc. All while keeping up with the latest changes in technology. In the past few years I have gotten to work with dozens of teams and see how various directors and managers deal…

Landing a Data Engineer Role: Free Courses and Certifications

KDnuggets

Is it possible to learn data engineering for free? I claim it is and present the evidence for that in the form of 10 free data engineering courses.

Announcing Mosaic AI Agent Framework and Agent Evaluation

databricks

Databricks announced the public preview of Mosaic AI Agent Framework & Agent Evaluation alongside our Generative AI Cookbook at the Data + AI.

Data 142

SQL or Python for Data Transformations?

Start Data Engineering

1. Introduction
2. Code is an interface to the execution engine
3. How to choose the execution engine and the coding interface
3.1. Choose execution engine based on your workload
3.1.1. Types of execution engine
3.1.2. Criteria to choose your execution engine
3.2. Choose coding interface for people who will maintain the pipeline
3.2.1. Types of coding interfaces
3.2.2.

SQL 130

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Data News — Week 24.28

Christophe Blefari

Dear members, it's been a few weeks since I last caught up with you in a proper Data News with a collection of links. Here we are. This week, I attended EuroPython in Prague. Since I spent most of my time at the dltHub booth in the sponsors hall, I didn't attend many talks. However, I did give a few presentations on my SQL orchestration library, yato, which pairs well with dlt.

Kafka 130

Data+AI Summit 2024 - Retrospective - Streaming

Waitingforcode

Welcome to the first Data+AI Summit 2024 retrospective blog post. I'm opening the series with a topic close to my heart at the moment: stream processing!

Data 130

How to make a “peeled edge” area of interest effect in ArcGIS Pro

ArcGIS

Catch eyes and imaginations with this fun technique that draws attention to your area of interest with a bit of style!

Tools Every Data Scientist Should Know: A Practical Guide

KDnuggets

Discover the essential tools every data scientist should know to elevate their data science game, from Python and R to SQL and advanced visualization tools.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Introducing Mosaic AI Model Training for Fine-Tuning GenAI Models

databricks

Today, we're thrilled to announce that Mosaic AI Model Training's support for fine-tuning GenAI models is now available in Public Preview. At Databricks.

New with Confluent Platform: Enhanced security with OAuth Support, Confluent Platform for Apache Flink® (LA), a new Connector, and More

Confluent

Confluent Platform 7.

AI Lab: The secrets to keeping machine learning engineers moving fast

Engineering at Meta

The key to developer velocity across AI lies in minimizing time to first batch (TTFB) for machine learning (ML) engineers. AI Lab is a pre-production framework used internally at Meta. It allows us to continuously A/B test common ML workflows – enabling proactive improvements and automatically preventing regressions on TTFB. AI Lab prevents TTFB regressions whilst enabling experimentation to develop improvements.

Getting the Most From Your Modern Data Platform: A Three-Phase Approach

Snowflake

A robust, modern data platform is the starting point for your organization’s data and analytics vision. At first, you may use your modern data platform as a single source of truth to realize operational gains — but you can realize far greater benefits by adding additional use cases. In this blog, we offer guidance for leveraging Snowflake’s capabilities around data and AI to build apps and unlock innovation.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Create a Digital Twin in Seven Days with ArcGIS

ArcGIS

Creation of a Digital Twin in Seven Days with ArcGIS in Zurich

Building Data Science Pipelines Using Pandas

KDnuggets

Learn to build end-to-end data science pipelines, from data ingestion to data visualization, using the Pandas pipe method.

Databricks on Databricks: Kicking off the Journey to Governance with Unity Catalog

databricks

In this blog, we are excited to share Databricks's journey in migrating to Unity Catalog for enhanced data governance. We'll discuss our high-level strategy and the tools we developed to facilitate the migration. Our goal is to highlight the benefits of Unity Catalog and make you feel confident about transitioning to it.

Data Engineering Weekly #181

Data Engineering Weekly

Editor’s Note: A New Series on Data Engineering Tools Evaluation. There are plenty of data tools and vendors in the industry. But how can we choose a tool for a specific need? The traditional evaluation of running a PoC on all the selected vendor tools is time-consuming and practically unviable for growth-driven companies. Data Engineering Weekly is launching a new series on software evaluation focused on data engineering to better guide data engineering leaders in evaluating data tools.

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you