Sat. Jul 13, 2024 - Fri. Jul 19, 2024

The software engineering industry in 2024: what changed, why, and what is next

The Pragmatic Engineer

The past 18 months have seen major change reshape the tech industry. What does it all mean for businesses and dev teams – and what will pragmatic software engineering approaches look like in the future? I tackled these burning questions in my conference talk, “What’s Old is New Again,” which was the keynote of the Craft Conference in May 2024.

What are the types of data quality checks?

Start Data Engineering

1. Introduction
2. Data Quality (DQ) checks are run as part of your pipeline
2.1. Ensure your consumers don’t get incorrect data with output DQ checks
2.2. Catch upstream issues quickly with input DQ checks
2.3. Waiting a long time to run output DQ checks? Save time & money with mid-pipeline DQ checks.
2.4. Track incoming and outgoing row counts with Audit logs
3.
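As a rough illustration of the outline above, the sketch below wires an input check, an output check, and a row-count audit log around one transformation. It is not taken from the article: the column names (`order_id`, `amount`), the `transform` step, and the `AuditLog` type are all hypothetical, chosen only to show where each kind of check might sit in a pipeline.

```python
from dataclasses import dataclass


@dataclass
class AuditLog:
    """Row counts recorded for one pipeline run (hypothetical structure)."""
    rows_in: int
    rows_out: int


def check_input(rows):
    """Input DQ check: fail fast when upstream rows lack required fields."""
    bad = [r for r in rows if r.get("order_id") is None or r.get("amount") is None]
    if bad:
        raise ValueError(f"{len(bad)} input row(s) missing order_id or amount")


def transform(rows):
    """Hypothetical transformation: keep only rows with a positive amount."""
    return [r for r in rows if r["amount"] > 0]


def check_output(rows):
    """Output DQ check: consumers should never see duplicate order ids."""
    ids = [r["order_id"] for r in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate order_id in output")


def run_pipeline(rows):
    """Run checks around the transformation and return the data plus an audit log."""
    check_input(rows)
    out = transform(rows)
    check_output(out)
    return out, AuditLog(rows_in=len(rows), rows_out=len(out))
```

A mid-pipeline check would follow the same pattern: a small function called between two transformation steps, so that a bad intermediate result stops the run before the expensive later stages.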

How ChatGPT is Changing the Face of Programming

KDnuggets

Empowering Developers and Transforming Programming Practices

How Long Should You Train Your Language Model?

databricks

How long should you train your language model? How large should your model be? In today's generative AI landscape, these are multi-million dollar.

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

DAIS 2024: Testing framework from the Dataflow model for Apache Spark Structured Streaming

Waitingforcode

With this post I'm starting a follow-up series for my Data+AI Summit 2024 talk. I've missed writing this kind of series a lot, as my previous DAIS as a speaker was 4 years ago! As before, I'll be writing several blog posts that should help you remember the talk and also cover some of the topics left aside because of time constraints.

Data News — Week 24.28

Christophe Blefari

Dear members, it's been a few weeks since I last caught up with you in a proper Data News with a collection of links. Here we are. This week, I attended EuroPython in Prague. Because I spent most of my time at the dltHub booth in the sponsors hall, I didn't attend many talks. However, I did give a few presentations on my SQL orchestration library, yato, which pairs well with dlt.

More Trending

Ensuring Quality Forecasts with Databricks Lakehouse Monitoring

databricks

Discover how Databricks Lakehouse Monitoring empowers you to ensure reliable, accurate forecasts by proactively detecting data drift, model degradation, and more.

AI Lab: The secrets to keeping machine learning engineers moving fast

Engineering at Meta

The key to developer velocity across AI lies in minimizing time to first batch (TTFB) for machine learning (ML) engineers. AI Lab is a pre-production framework used internally at Meta. It allows us to continuously A/B test common ML workflows – enabling proactive improvements and automatically preventing regressions on TTFB. AI Lab prevents TTFB regressions whilst enabling experimentation to develop improvements.

Snowflake’s Summer of Sports and AI

Snowflake

All eyes are on sports this summer, with blockbuster events happening in everything from soccer and cycling to cricket and car racing. Snowflake is excited to join the action with a virtual “relay race,” where Snowflake sports and data experts, customers and partners will demonstrate how the sports industry can win big with data and AI. Industry leaders already know that sports runs on data analytics: from individual athlete performance and team statistics, to marketing and fan engagement, to ti

7 Ways to Improve Your Machine Learning Models

KDnuggets

Tips and tricks on improving machine learning model performance on diverse and unseen datasets.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Generative AI in Urban Planning

ArcGIS

Planning a city block, a neighborhood, or maybe a whole new city is a multifaceted task with no universal recipe to use. How can Generative AI help Urban Planners?

Announcing the General Availability of Serverless Compute for Notebooks, Workflows and Delta Live Tables

databricks

We are excited to announce the General Availability of serverless compute for notebooks, jobs and Delta Live Tables (DLT) on AWS and Azure.

From Potential Disaster To Driver of Change… Data Execs Share Their Journeys To Effective AI

Snowflake

A potential recipe for disaster proved to be the focus of every data executive’s agenda over the last year. A year ago many data leaders were caught off-guard. Employees embraced new gen AI tools with fervor, driving interest in all AI initiatives. Generative AI had penetrated the enterprise, with gen AI positioned in the Peak of Inflated Expectations segment on the Gartner® Hype Cycle for Artificial Intelligence, 2023 [1].

Machine Learning Made Simple for Data Analysts with BigQuery ML

KDnuggets

Thanks to tools like BigQuery ML, you can harness the power of ML without needing a computer science degree. Let's explore how to get started.

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Explore BlueBikes ride data with ArcGIS Pro Charts

ArcGIS

In this blog article, we'll explore data from BlueBikes, a bike share service in bustling Boston, and uncover hidden insights through the power of visualization.

Unlocking True Water Risk Assessment Worldwide

databricks

Unlocking True Water Risk Assessment Across Insurance, Finance, Public Safety, and Beyond. Check out the solution accelerator to download the notebooks referred to.

Will Generative AI Implode and Can it Become More Sustainable? by Oliver Cronk

Scott Logic

Generative AI has a Sustainability problem. Generative AI, including large language models (LLMs), has taken the world by storm. Inspired by ChatGPT, many companies are racing to implement GenAI in their projects, lured by its hyped potential to revolutionise industries. However, based on my experience of applying GenAI to enterprise implementations, I am seeing first-hand the sustainability challenges threatening to implode the first generation of this technology.

The Role of AI in Digital Marketing

KDnuggets

Artificial intelligence (AI) has revolutionized numerous sectors, including digital marketing. This field leverages online platforms to promote products and services.

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Data Engineering Weekly #180

Data Engineering Weekly

Canva: How Canva collects 25 billion events per day. Canva writes about its event collection infrastructure capabilities, handling 25 billion events per day (800 billion events per month) with 99.999% uptime. At our team’s inception, a key decision we made, one we still believe to be a big part of our success, was that every collected event must have a machine-readable, well-documented schema.

Generative AI Use Case: Using LLMs to Score Customer Conversations

Monte Carlo

Despite all the talk about AI replacing humans, Skynet blowing up the sun, and deep-fake celebrities parenting our children, it’s difficult to point to a generative AI use case that’s demonstrably more interesting than your average run-of-the-mill chatbot. But what if instead of replacing customer support teams with chatbots, we could leverage AI to improve the performance of real human CS teams?

Navigating the LLM Landscape: Uber’s Innovation with GenAI Gateway

Uber Engineering

Uber elevates tech with the GenAI Gateway, integrating Large Language Models (LLMs) for 60+ use cases, from automation to customer support. This unified platform offers easy access to models from OpenAI, Vertex AI, and Uber’s own, ensuring efficiency and security.

Describing Data: A Statology Primer

KDnuggets

This collection of tutorials on describing data comes from our sister site Statology.

The Ultimate Guide to Apache Airflow DAGs

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

Meet Caddy – Meta’s next-gen mixed reality CAD software

Engineering at Meta

What happens when a team of mechanical engineers gets tired of looking at flat images of 3D models over Zoom? Meet the team behind Caddy, a new CAD app for mixed reality. They join Pascal Hartig (@passy) on the Meta Tech Podcast to talk about teaching themselves to code, disrupting the CAD software space, how they integrated Caddy with Llama 3, and so much more!

Celebrating Our Canada Office Opening

Robinhood

Robinhood was founded on a simple idea: that our financial markets should be accessible to all. With customers at the heart of our decisions, Robinhood is lowering barriers and providing greater access to financial information and investing. Together, we are building products and services that help create a financial system everyone can participate in. … Several members of our engineering, security, corporate engineering, and recruiting teams were recently in Toronto for our office opening in ea

Data Modeling Techniques for the Post-Modern Data Stack

Towards Data Science

A set of generic techniques and principles to design a robust, cost-efficient, and scalable data model for your post-modern data stack.

Convert Bytes to String in Python: A Tutorial for Beginners

KDnuggets

Strings are common built-in data types in Python. But sometimes, you may need to work with bytes instead. Let’s learn how to convert bytes to string in Python.
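For a quick sense of what the conversion looks like, here is a minimal sketch using only the standard `bytes.decode` method and the `str` constructor. The helper name `bytes_to_text` and the sample byte values are illustrative choices, not taken from the tutorial.

```python
def bytes_to_text(data: bytes, encoding: str = "utf-8", errors: str = "strict") -> str:
    """Decode bytes into a str; `errors` controls what happens on invalid input."""
    return data.decode(encoding, errors)


raw = b"caf\xc3\xa9"                 # UTF-8 encoding of "café"
text = bytes_to_text(raw)            # strict decoding of valid UTF-8
same = str(raw, encoding="utf-8")    # equivalent form using the str constructor

# Invalid UTF-8 raises UnicodeDecodeError under errors="strict";
# errors="replace" substitutes U+FFFD for each undecodable byte instead.
lossy = bytes_to_text(b"\xff\xfeabc", errors="replace")
```

Choosing the right encoding matters more than the call itself: decoding with the wrong codec can either raise an error or silently produce the wrong characters.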

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

End-to-end test probes with Playwright

Zalando Engineering

Why automated end-to-end tests? What are automated end-to-end tests? Do you need them at all? In this blog post we dive into the ugly side of automated end-to-end testing, what we struggled with at Zalando, what worked well for us and our latest solution with end-to-end test probes. Automated end-to-end tests continue to polarise the industry, with some leaders advocating for them and others rightfully questioning their return on investment and recommending investing in monitoring and alerting s

Safeguarding App Health and Consumer Experience with Metric-Aware Rollouts

DoorDash Engineering

As part of our ongoing efforts to enhance product development while safeguarding app health and the consumer experience, we are introducing metric-aware rollouts for experiments. Metric-aware rollouts refer to established decision rules to flag issues with automated checks on standardized app quality metrics during the new feature rollout process. Every action DoorDash takes focuses on enhancing the consumer experience.
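To make the idea of a decision rule concrete, the following is a purely hypothetical sketch and not DoorDash's implementation: it compares treatment metrics against control and flags any metric whose relative regression exceeds a threshold. The metric names, the 5% threshold, and the assumption that higher values are worse are all illustrative.

```python
def rollout_decision(control, treatment, max_relative_regression=0.05):
    """Return ("halt" | "continue", flagged_metrics).

    Assumes every metric is "higher is worse" (e.g. crash rate, latency)
    and that both dicts share the same keys. Hypothetical rule, for illustration.
    """
    flagged = []
    for name, base in control.items():
        if base <= 0:
            continue  # skip metrics without a usable baseline
        change = (treatment[name] - base) / base
        if change > max_relative_regression:
            flagged.append(name)
    return ("halt" if flagged else "continue", flagged)


control = {"crash_rate": 0.010, "p95_latency_ms": 800.0}
treatment = {"crash_rate": 0.012, "p95_latency_ms": 805.0}
decision, flagged = rollout_decision(control, treatment)
```

A production rule would normally also account for statistical noise, for example by testing significance before flagging, which this sketch deliberately leaves out.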

PySpark Explained: User-Defined Functions

Towards Data Science

What are they, and how do you use them?

A Beginner’s Guide to PyTorch

KDnuggets

Learn one of the most important Python packages to improve your career.

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m