Sat.Jul 13, 2024 - Fri.Jul 19, 2024

article thumbnail

What are the types of data quality checks?

Start Data Engineering

1. Introduction 2. Data Quality(DQ) checks are run as part of your pipeline 2.1. Ensure your consumers don’t get incorrect data with output DQ checks 2.2. Catch upstream issues quickly with input DQ checks 2.3. Waiting a long time to run output DQ checks? Save time & money with mid-pipeline DQ checks. 2.4. Track incoming and outgoing row counts with Audit logs 3.

Data 214
article thumbnail

The software engineering industry in 2024: what changed, why, and what is next

The Pragmatic Engineer

The past 18 months have seen major change reshape the tech industry. What does it all mean for businesses and dev teams – and what will pragmatic software engineering approaches look like in the future? I tackled these burning questions in my conference talk, “What’s Old is New Again,” which was the keynote of the Craft Conference in May 2024.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

DAIS 2024: Testing framework from the Dataflow model for Apache Spark Structured Streaming

Waitingforcode

With this blog I'm starting a follow-up series for my Data+AI Summit 2024 talk. I missed this family of blog posts a lot as the previous DAIS with me as speaker was 4 years ago! As previously, this time too I'll be writing several blog posts that should help you remember the talk and also cover some of the topics left aside because of the time constraints.

Data 130
article thumbnail

Data News — Week 24.28

Christophe Blefari

EuroSeagull ( credits ) Dear members, it's been a few weeks since I did not catch you on a proper Data News with a collection of links. Here we are. This week, I attended EuroPython in Prague. While I spent most of my time at the dltHub booth in the sponsors hall, I didn't attend many talks. However, I did give a few presentations on my SQL orchestration library, yato , which pairs well with dlt.

Kafka 130
article thumbnail

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

article thumbnail

Landing a Data Engineer Role: Free Courses and Certifications

KDnuggets

Is it possible to learn data engineering for free? I claim it is and present the evidence for that in the form of 10 free data engineering courses.

article thumbnail

AI Lab: The secrets to keeping machine learning engineers moving fast

Engineering at Meta

The key to developer velocity across AI lies in minimizing time to first batch (TTFB) for machine learning (ML) engineers. AI Lab is a pre-production framework used internally at Meta. It allows us to continuously A/B test common ML workflows – enabling proactive improvements and automatically preventing regressions on TTFB. AI Lab prevents TTFB regressions whilst enabling experimentation to develop improvements.

More Trending

article thumbnail

Data Engineering Weekly #180

Data Engineering Weekly

Canva: How Canva collects 25 billion events per day Canva writes about its event collection infrastructure capabilities, handling 25 billion events per day (800 billion events per month) with 99.999% uptime. At our team’s inception, a key decision we made, one we still believe to be a big part of our success, was that every collected event must have a machine-readable, well-documented schema.

article thumbnail

The Role of AI in Digital Marketing

KDnuggets

Artificial intelligence (AI) has revolutionized numerous sectors, including digital marketing. This field leverages online platforms to promote products and services.

120
120
article thumbnail

Explore BlueBikes ride data with ArcGIS Pro Charts

ArcGIS

In this blog article, we'll explore BlueBikes data, a bike share service in bustling Boston, and uncover hidden insights through the power of visualization

Data 103
article thumbnail

Ensuring Quality Forecasts with Databricks Lakehouse Monitoring

databricks

Discover how Databricks Lakehouse Monitoring empowers you to ensure reliable, accurate forecasts by proactively detecting data drift, model degradation, and more.

Data 100
article thumbnail

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

article thumbnail

Snowflake’s Summer of Sports and AI

Snowflake

All eyes are on sports this summer, with blockbuster events happening in everything from soccer and cycling to cricket and car racing. Snowflake is excited to join the action with a virtual “relay race,” where Snowflake sports and data experts, customers and partners will demonstrate how the sports industry can win big with data and AI. Industry leaders already know that sports runs on data analytics: from individual athlete performance and team statistics, to marketing and fan engagement, to ti

article thumbnail

Machine Learning Made Simple for Data Analysts with BigQuery ML

KDnuggets

Thanks to tools like BigQuery ML, you can harness the power of ML without needing a computer science degree. Let's explore how to get started.

article thumbnail

Generative AI in Urban Planning

ArcGIS

Planning a city block, a neighborhood, or maybe a whole new city is a multifaceted task with no universal recipe to use. How can Generative AI help Urban Planners?

article thumbnail

Unlocking True Water Risk Assessment Worldwide

databricks

Unlocking True Water Risk Assessment Across Insurance, Finance, Public Safety, and Beyond Check out the solution accelerator to download the notebooks referred to.

article thumbnail

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

article thumbnail

Meet Caddy – Meta’s next-gen mixed reality CAD software

Engineering at Meta

What happens when a team of mechanical engineers get tired of looking at flat images of 3D models over Zoom? Meet the team behind Caddy, a new CAD app for mixed reality. They join Pascal Hartig ( @passy ) on the Meta Tech Podcast to talk about teaching themselves to code, disrupting the CAD software space, and how they integrated Caddy with Llama 3, and so much more!

article thumbnail

Convert Bytes to String in Python: A Tutorial for Beginners

KDnuggets

Strings are common built-in data types in Python. But sometimes, you may need to work with bytes instead. Let’s learn how to convert bytes to string in Python.

Bytes 113
article thumbnail

Navigating the LLM Landscape: Uber’s Innovation with GenAI Gateway

Uber Engineering

Uber elevates tech with the GenAI Gateway, integrating Large Language Models (LLMs) for 60+ use cases, from automation to customer support. This unified platform offers easy access to models from OpenAI, Vertex AI, and Uber’s own, ensuring efficiency and security.

article thumbnail

Will Generative AI Implode and Can it Become More Sustainable? by Oliver Cronk

Scott Logic

Generative AI has a Sustainability problem Generative AI , including large language models (LLMs), has taken the world by storm. Inspired by ChatGPT, many companies are racing to implement GenAI in their projects, lured by its hyped potential to revolutionise industries. However, based on my experience of applying GenAI to enterprise implementations, I am seeing first-hand the sustainability challenges threatening to implode the first generation of this technology.

IT 80
article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

End-to-end test probes with Playwright

Zalando Engineering

Why automated end-to-end tests? What are automated end-to-end tests? Do you need them at all? In this blog post we dive into the ugly behind automated end-to-end testing, what we struggled with at Zalando, what worked well for us and our latest solution with end-to-end test probes. Automated end-to-end tests continue to polarise the industry, with some leaders advocating for them and others rightfully questioning their return on investments and recommending to invest in monitoring and alerting s

Coding 73
article thumbnail

Testing Like a Pro: A Step-by-Step Guide to Python’s Mock Library

KDnuggets

Explore Python's mock library for seamless testing—replace real objects with mocks, perfect for isolating and verifying your code's behavior.

Python 108
article thumbnail

Announcing the General Availability of Serverless Compute for Notebooks, Workflows and Delta Live Tables

databricks

We are excited to announce the General Availability of serverless compute for notebooks, jobs and Delta Live Tables (DLT) on AWS and Azure.

AWS 89
article thumbnail

City of Hope Redefines Predictive Sepsis Detection Using Kafka

Confluent

City of Hope’s AI models for predicting and preventing sepsis in bone-marrow transplant patients rely on real-time data, enabled by Kafka on Confluent Cloud.

Kafka 69
article thumbnail

Prepare Now: 2025s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

article thumbnail

Understanding Develocity Build Data with Honeycomb

Pinterest Engineering

David Chang; Staff Software Engineer | Develocity, formerly known as Gradle Enterprise, is a powerful tool that speeds up local and CI build time, helps troubleshoot your builds, and analyzes your data. At Pinterest, we have a dedicated team, Mobile Builds, and we ensure that developers can build fast and often. This enables developers to be more productive by getting faster feedback on their code.

article thumbnail

7 Ways to Improve Your Machine Learning Models

KDnuggets

Tips and tricks on improving machine learning model performance on diverse and unseen datasets.

article thumbnail

Nautical chart creation is versatile with ArcGIS Maritime

ArcGIS

Unlock the potential of ArcGIS Maritime for efficient maritime operations. Optimize navigation, ensure accuracy, and make data-driven decisions.

Data 61
article thumbnail

PMP Certification Requirements: Boost Your Management Career with PMP

Edureka

Ready to supercharge your project management career? This article will highlight all the essential PMP certification requirements. Expect clear and actionable insights on what it takes to become a certified Project Management Professional. Whether you’re a seasoned manager or an aspiring leader, discover how PMP can open doors to new opportunities.

article thumbnail

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

article thumbnail

Modern Data Management Essentials: Exploring Data Fabric

Precisely

Key Takeaways Data Fabric is a modern data architecture that facilitates seamless data access, sharing, and management across an organization. Data management recommendations and data products emerge dynamically from the fabric through automation, activation, and AI/ML analysis of metadata. While data fabric is not a standalone solution, critical capabilities that you can address today to prepare for a data fabric include automated data integration, metadata management, centralized data governan

article thumbnail

How ChatGPT is Changing the Face of Programming

KDnuggets

Empowering Developers and Transforming Programming Practices

article thumbnail

Will Generative AI Implode and Can it Become More Sustainable? by Oliver Cronk

Scott Logic

Generative AI has a Sustainability problem Generative AI , including large language models (LLMs), has taken the world by storm. Inspired by ChatGPT many companies are racing to implement GenAI in their projects, lured by its hyped potential to revolutionise industries. However from experience of applying GenAI to enterprise implementations, I am seeing firsthand the sustainability challenges threatening to implode the first generation of this technology.

IT 52
article thumbnail

A Deep Dive into Data Lakehouses

Hevo

The term “Data Lakehouse” is quite common nowadays. The new concept promises to address the failures of data warehouses and data lakes and help support the workloads of both business intelligence and machine learning. But what exactly is a data lakehouse, and how is it different from a data warehouse or a data lake?

article thumbnail

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.