Sat, Jul 13, 2024 - Fri, Jul 19, 2024

What are the types of data quality checks?

Start Data Engineering

1. Introduction
2. Data Quality (DQ) checks are run as part of your pipeline
   2.1. Ensure your consumers don't get incorrect data with output DQ checks
   2.2. Catch upstream issues quickly with input DQ checks
   2.3. Waiting a long time to run output DQ checks? Save time & money with mid-pipeline DQ checks
   2.4. Track incoming and outgoing row counts with audit logs
3. …
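
The outline separates input and output DQ checks from audit logs of row counts. As a rough illustration only (not the article's code), here is a minimal plain-Python sketch that assumes rows are dicts; `check_not_null`, `check_unique`, `AuditLog`, `run_pipeline` and the positive-amount filter are all hypothetical names and rules.

```python
from dataclasses import dataclass, field

Row = dict  # hypothetical row shape, e.g. {"id": 1, "amount": 9.5}


class DataQualityError(Exception):
    """Raised when a DQ check fails, so the pipeline stops before bad data spreads."""


def check_not_null(rows: list[Row], column: str) -> None:
    # Every row must carry a non-null value for `column`.
    bad = [r for r in rows if r.get(column) is None]
    if bad:
        raise DataQualityError(f"{len(bad)} row(s) with null {column!r}")


def check_unique(rows: list[Row], column: str) -> None:
    # `column` must behave like a primary key in the output.
    values = [r[column] for r in rows]
    if len(values) != len(set(values)):
        raise DataQualityError(f"duplicate values in {column!r}")


@dataclass
class AuditLog:
    # One (step, rows_in, rows_out) tuple per transformation step.
    entries: list[tuple[str, int, int]] = field(default_factory=list)

    def record(self, step: str, rows_in: int, rows_out: int) -> None:
        self.entries.append((step, rows_in, rows_out))


def run_pipeline(raw: list[Row], audit: AuditLog) -> list[Row]:
    check_not_null(raw, "id")  # input DQ check: catch upstream issues early
    cleaned = [r for r in raw if (r.get("amount") or 0) > 0]  # hypothetical business rule
    audit.record("filter_positive_amount", len(raw), len(cleaned))  # audit log of row counts
    check_unique(cleaned, "id")  # output DQ check: runs before consumers see the data
    return cleaned
```

Calling `run_pipeline([{"id": 1, "amount": 5.0}, {"id": 2, "amount": -1.0}], audit)` keeps only the first row and records `("filter_positive_amount", 2, 1)` in `audit.entries`. A production pipeline would more likely run such checks through a dedicated DQ framework than through hand-rolled functions.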

The software engineering industry in 2024: what changed, why, and what is next

The Pragmatic Engineer

The past 18 months have seen major change reshape the tech industry. What does it all mean for businesses and dev teams – and what will pragmatic software engineering approaches look like in the future? I tackled these burning questions in my conference talk, “What’s Old is New Again,” which was the keynote of the Craft Conference in May 2024.

DAIS 2024: Testing framework from the Dataflow model for Apache Spark Structured Streaming

Waitingforcode

With this blog post I'm starting a follow-up series for my Data+AI Summit 2024 talk. I've missed this kind of series a lot, as my previous DAIS talk was 4 years ago! As before, I'll write several blog posts that should help you remember the talk and also cover some of the topics I had to leave out because of time constraints.

Data News — Week 24.28

Christophe Blefari

Dear members, it's been a few weeks since I last reached you with a proper Data News full of links. Here we are. This week, I attended EuroPython in Prague. I spent most of my time at the dltHub booth in the sponsors hall, so I didn't attend many talks. However, I did give a few presentations on my SQL orchestration library, yato, which pairs well with dlt.

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.

The Role of AI in Digital Marketing

KDnuggets

Artificial intelligence (AI) has revolutionized numerous sectors, including digital marketing. This field leverages online platforms to promote products and services.

How Long Should You Train Your Language Model?

databricks

How long should you train your language model? How large should your model be? In today's generative AI landscape, these are multi-million dollar questions.

More Trending

AI Lab: The secrets to keeping machine learning engineers moving fast

Engineering at Meta

The key to developer velocity across AI lies in minimizing time to first batch (TTFB) for machine learning (ML) engineers. AI Lab is a pre-production framework used internally at Meta. It allows us to continuously A/B test common ML workflows – enabling proactive improvements and automatically preventing regressions on TTFB. AI Lab prevents TTFB regressions whilst enabling experimentation to develop improvements.

Landing a Data Engineer Role: Free Courses and Certifications

KDnuggets

Is it possible to learn data engineering for free? I claim it is and present the evidence for that in the form of 10 free data engineering courses.

Ensuring Quality Forecasts with Databricks Lakehouse Monitoring

databricks

Discover how Databricks Lakehouse Monitoring empowers you to ensure reliable, accurate forecasts by proactively detecting data drift, model degradation, and more.

From Potential Disaster To Driver of Change… Data Execs Share Their Journeys To Effective AI

Snowflake

A potential recipe for disaster proved to be the focus of every data executive's agenda over the last year. A year ago, many data leaders were caught off guard. Employees embraced new gen AI tools with fervor, driving interest in all AI initiatives. Generative AI had penetrated the enterprise, with gen AI positioned in the Peak of Inflated Expectations segment of the Gartner® Hype Cycle for Artificial Intelligence, 2023 [1].

Launching LLM-Based Products: From Concept to Cash in 90 Days

Speaker: Christophe Louvion, Chief Product & Technology Officer of NRC Health and Tony Karrer, CTO at Aggregage

Christophe Louvion, Chief Product & Technology Officer of NRC Health, will walk us through how he guided his company from concept to launch and sales of products within 90 days. In this exclusive webinar, Christophe will cover key aspects of his journey, including LLM Development & Quick Wins 🤖: understanding how LLMs differ from traditional software and identifying opportunities for rapid development and deployment.

Data Engineering Weekly #180

Data Engineering Weekly

Canva: How Canva collects 25 billion events per day

Canva writes about its event collection infrastructure, which handles 25 billion events per day (800 billion events per month) with 99.999% uptime: "At our team's inception, a key decision we made, one we still believe to be a big part of our success, was that every collected event must have a machine-readable, well-documented schema."

Machine Learning Made Simple for Data Analysts with BigQuery ML

KDnuggets

Thanks to tools like BigQuery ML, you can harness the power of ML without needing a computer science degree. Let's explore how to get started.

Unlocking True Water Risk Assessment Worldwide

databricks

Unlocking True Water Risk Assessment Across Insurance, Finance, Public Safety, and Beyond. Check out the solution accelerator to download the notebooks referenced here.

Navigating the LLM Landscape: Uber’s Innovation with GenAI Gateway

Uber Engineering

Uber elevates its tech with the GenAI Gateway, integrating Large Language Models (LLMs) into 60+ use cases, from automation to customer support. This unified platform offers easy access to models from OpenAI and Vertex AI as well as Uber's own, ensuring efficiency and security.

How To Speak The Language Of Financial Success In Product Management

Speaker: Jamie Bernard

Success in product management goes beyond delivering great features - it’s about achieving measurable financial outcomes that resonate across the organization. By connecting your product’s journey with the company’s financial success, you’ll ensure that every feature, release, and innovation contributes to the bottom line, driving both customer satisfaction and business growth.

Explore BlueBikes ride data with ArcGIS Pro Charts

ArcGIS

In this blog article, we'll explore data from BlueBikes, a bike share service in bustling Boston, and uncover hidden insights through the power of visualization.

Convert Bytes to String in Python: A Tutorial for Beginners

KDnuggets

Strings are common built-in data types in Python. But sometimes, you may need to work with bytes instead. Let’s learn how to convert bytes to string in Python.
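
A minimal sketch of the core idea using only the built-in `bytes.decode` method; the byte values are illustrative, and UTF-8 is assumed as the source encoding unless stated otherwise.

```python
# bytes.decode() converts bytes to str using a named text encoding.
raw = b"caf\xc3\xa9"  # the UTF-8 encoding of "café"
text = raw.decode("utf-8")
print(text)  # café

# str() on bytes does NOT decode: it returns the bytes literal's representation.
print(str(raw))  # b'caf\xc3\xa9'

# Bytes that are not valid in the chosen encoding raise UnicodeDecodeError...
broken = b"caf\xe9"  # "café" encoded as Latin-1, which is not valid UTF-8
try:
    broken.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

# ...unless you choose an error handler or the correct encoding.
print(broken.decode("utf-8", errors="replace"))  # "caf" followed by U+FFFD, the replacement character
print(broken.decode("latin-1"))  # café
```

The encoding must match the one used to produce the bytes; decoding with the wrong codec either raises `UnicodeDecodeError` or silently yields the wrong characters.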

Meet Caddy – Meta’s next-gen mixed reality CAD software

Engineering at Meta

What happens when a team of mechanical engineers gets tired of looking at flat images of 3D models over Zoom? Meet the team behind Caddy, a new CAD app for mixed reality. They join Pascal Hartig (@passy) on the Meta Tech Podcast to talk about teaching themselves to code, disrupting the CAD software space, integrating Caddy with Llama 3, and much more!

Will Generative AI Implode and Can it Become More Sustainable? by Oliver Cronk

Scott Logic

Generative AI has a sustainability problem. Generative AI, including large language models (LLMs), has taken the world by storm. Inspired by ChatGPT, many companies are racing to implement GenAI in their projects, lured by its hyped potential to revolutionise industries. However, based on my experience of applying GenAI to enterprise implementations, I am seeing first-hand the sustainability challenges threatening to implode the first generation of this technology.

What Is Entity Resolution? How It Works & Why It Matters

Entity resolution, sometimes referred to as data matching or fuzzy matching, is critical for data quality, analytics, graph visualization and AI. Learn what entity resolution is, why it matters, how it works and what its benefits are. Advanced entity resolution using AI is crucial because it efficiently and easily solves many of today's data quality and analytics problems.
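
To make "fuzzy matching" concrete, here is a deliberately simple sketch that clusters company names by normalized string similarity using Python's standard `difflib`. It is an illustrative toy, not the vendor's method: the suffix list, the 0.85 threshold and the greedy clustering are all assumptions, and production entity resolution usually compares several attributes and uses blocking to avoid comparing every pair of records.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def normalize(name: str) -> str:
    # Lowercase, replace punctuation with spaces, and drop legal suffixes.
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(t for t in cleaned.split() if t not in SUFFIXES)


def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means the normalized strings are identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def resolve(records: list[str], threshold: float = 0.85) -> list[list[str]]:
    # Greedy clustering: each record joins the first cluster whose
    # representative (first member) is similar enough, else starts a new one.
    clusters: list[list[str]] = []
    for rec in records:
        for cluster in clusters:
            if similarity(rec, cluster[0]) >= threshold:
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters
```

For example, `resolve(["Acme Corp.", "ACME Corp", "Acme, Inc.", "Globex LLC"])` groups the three Acme variants together and leaves Globex on its own, while `similarity("Jon Smith", "John Smith")` tolerates the one-letter typo.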

Generative AI in Urban Planning

ArcGIS

Planning a city block, a neighborhood, or maybe a whole new city is a multifaceted task with no universal recipe to use. How can Generative AI help Urban Planners?

Testing Like a Pro: A Step-by-Step Guide to Python’s Mock Library

KDnuggets

Explore Python's mock library for seamless testing: replace real objects with mocks, perfect for isolating and verifying your code's behavior.
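
A small sketch of the pattern described above using the standard library's `unittest.mock`; `PaymentGateway` and `checkout` are hypothetical stand-ins for a real external dependency and the code under test.

```python
from unittest import mock


class PaymentGateway:
    """Hypothetical external service that tests must never call for real."""

    def charge(self, amount_cents: int) -> bool:
        raise RuntimeError("real network call attempted in a test")


def checkout(gateway: PaymentGateway, amount_cents: int) -> str:
    """Code under test: depends on the gateway but has its own logic."""
    if amount_cents <= 0:
        return "invalid"
    return "paid" if gateway.charge(amount_cents) else "declined"


# Option 1: inject an autospec'd mock that mirrors PaymentGateway's interface.
gateway = mock.create_autospec(PaymentGateway, instance=True)
gateway.charge.return_value = True
assert checkout(gateway, 500) == "paid"
gateway.charge.assert_called_once_with(500)  # verify the interaction, not just the result

gateway.charge.return_value = False
assert checkout(gateway, 500) == "declined"

# Option 2: patch the method on the class only for the duration of the block.
with mock.patch.object(PaymentGateway, "charge", return_value=False) as fake_charge:
    assert checkout(PaymentGateway(), 100) == "declined"
    fake_charge.assert_called_once_with(100)
```

`create_autospec` is used instead of a bare `Mock` so that calls with the wrong signature fail loudly, while `patch.object` swaps the real method out only inside the `with` block and restores it afterwards.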

Announcing the General Availability of Serverless Compute for Notebooks, Workflows and Delta Live Tables

databricks

We are excited to announce the General Availability of serverless compute for notebooks, jobs and Delta Live Tables (DLT) on AWS and Azure.

End-to-end test probes with Playwright

Zalando Engineering

Why automated end-to-end tests? What are automated end-to-end tests? Do you need them at all? In this blog post we dive into the ugly side of automated end-to-end testing, what we struggled with at Zalando, what worked well for us, and our latest solution: end-to-end test probes. Automated end-to-end tests continue to polarise the industry, with some leaders advocating for them and others rightfully questioning their return on investment and recommending investing in monitoring and alerting s…

Provide Real Value in Your Applications with Data and Analytics

The complexity of financial data, the need for real-time insight, and the demand for user-friendly visualizations can seem daunting when it comes to analytics - but there is an easier way. With Logi Symphony, we aim to turn these challenges into opportunities. Our platform empowers you to seamlessly integrate advanced data analytics, generative AI, data visualization, and pixel-perfect reporting into your applications, transforming raw data into actionable insights.

City of Hope Redefines Predictive Sepsis Detection Using Kafka

Confluent

City of Hope’s AI models for predicting and preventing sepsis in bone-marrow transplant patients rely on real-time data, enabled by Kafka on Confluent Cloud.

7 Ways to Improve Your Machine Learning Models

KDnuggets

Tips and tricks on improving machine learning model performance on diverse and unseen datasets.

Understanding Develocity Build Data with Honeycomb

Pinterest Engineering

David Chang | Staff Software Engineer

Develocity, formerly known as Gradle Enterprise, is a powerful tool that speeds up local and CI build times, helps troubleshoot your builds, and analyzes your data. At Pinterest, we have a dedicated team, Mobile Builds, that ensures developers can build fast and often. This enables developers to be more productive by getting faster feedback on their code.

Nautical chart creation is versatile with ArcGIS Maritime

ArcGIS

Unlock the potential of ArcGIS Maritime for efficient maritime operations. Optimize navigation, ensure accuracy, and make data-driven decisions.

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Speaker: Maher Hanafi, VP of Engineering at Betterworks & Tony Karrer, CTO at Aggregage

Executive leaders and board members are pushing their teams to adopt Generative AI to gain a competitive edge, save money, and otherwise take advantage of the promise of this new era of artificial intelligence. There's no question that it is challenging to figure out where to focus and how to advance in a new field that is evolving every day. 💡 This new webinar featuring Maher Hanafi, VP of Engineering at Betterworks, will explore a practical framework to transform Generative AI pr…

PMP Certification Requirements: Boost Your Management Career with PMP

Edureka

Ready to supercharge your project management career? This article will highlight all the essential PMP certification requirements. Expect clear and actionable insights on what it takes to become a certified Project Management Professional. Whether you’re a seasoned manager or an aspiring leader, discover how PMP can open doors to new opportunities.

Describing Data: A Statology Primer

KDnuggets

This collection of tutorials on describing data comes from our sister site Statology.

Modern Data Management Essentials: Exploring Data Fabric

Precisely

Key takeaways: Data fabric is a modern data architecture that facilitates seamless data access, sharing, and management across an organization. Data management recommendations and data products emerge dynamically from the fabric through automation, activation, and AI/ML analysis of metadata. While data fabric is not a standalone solution, critical capabilities that you can address today to prepare for a data fabric include automated data integration, metadata management, centralized data governan…

The AI Superhero Approach to Product Management

Speaker: Conrado Morlan

In this engaging and witty talk, industry expert Conrado Morlan will explore how artificial intelligence can transform the daily tasks of product managers into streamlined, efficient processes. Using the lens of a superhero narrative, he’ll uncover how AI can be the ultimate sidekick, aiding in data management and reporting, enhancing productivity, and boosting innovation.