Sat. Jul 20, 2024 - Fri. Jul 26, 2024

How to implement data quality checks with greatexpectations

Start Data Engineering

1. Introduction 2. Project overview 3. Check your data before making it available to end-users; Write-Audit-Publish(WAP) pattern 4. TL;DR: How the greatexpectations library works 4.1. greatexpectations quick setup 5. From an implementation perspective, there are four types of tests 5.1. Running checks on one dataset 5.2. Checks involving the current dataset and its historical data 5.3.

PyArrow vs Polars (vs DuckDB) for Data Pipelines.

Confessions of a Data Guy

I’ve had something rattling around in the old noggin for a while; it’s just another strange idea that I can’t quite shake out. We all keep hearing about Arrow this and Arrow that … it seems every new tool built for Data Engineering today is at least partly based on Arrow’s in-memory format. So, […]

Data News — Week 24.30

Christophe Blefari

Dear members, it's Summer Data News, the only news you can consume by the pool, the beach or at the office, if you're not lucky. This week, I'm writing from the Baltics, nomading a bit in Eastern and Northern Europe. I'm pleased to announce that we have successfully closed the CfP for Forward Data Conf. We received nearly 100 submissions, and the program committee is currently reviewing them.

Databricks on Databricks: Kicking off the Journey to Governance with Unity Catalog

databricks

In this blog, we are excited to share Databricks's journey in migrating to Unity Catalog for enhanced data governance. We'll discuss our high-level strategy and the tools we developed to facilitate the migration. Our goal is to highlight the benefits of Unity Catalog and make you feel confident about transitioning to it.

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.

Bayesian Thinking in Modern Data Science

KDnuggets

Discover how Bayesian thinking transforms decision-making with its unique approach to updating initial beliefs with new evidence.

Data Engineering Weekly #181

Data Engineering Weekly

Editor’s Note: A New Series on Data Engineering Tools Evaluation. There are plenty of data tools and vendors in the industry, but how can we choose a tool for a specific need? The traditional approach of running a PoC on every shortlisted vendor tool is time-consuming and practically unviable for growth-driven companies. Data Engineering Weekly is launching a new series on software evaluation focused on data engineering to better guide data engineering leaders in evaluating data tools.

More Trending

A New Standard in Open Source AI: Meta Llama 3.1 on Databricks

databricks

We are excited to partner with Meta to release the Llama 3.1 series of models on Databricks, further advancing the standard of powerful.

Learn Data Analysis with Julia

KDnuggets

Set up the environment, load the data, perform data analysis and visualization, and create the data pipeline, all using the Julia programming language.

Introducing Joint Investing Accounts at Robinhood

Robinhood

Today, we are excited to launch joint investing accounts, which allow customers to seamlessly manage investments with their partner while keeping their shared assets in one place. Joint accounts make investing more collaborative for families and loved ones, providing shared access for account holders that allows them to combine funds and increase their investment power as they work towards their financial goals.

Zero Downtime Upgrades – Redefining Your Platform Upgrade Experience

Cloudera

Cloudera recently unveiled the latest version of Cloudera Private Cloud Base with the Zero Downtime Upgrade (ZDU) feature to enhance your user experience. The goal of ZDU is to make upgrades simpler for you and your stakeholders by increasing the availability of Cloudera’s services. How Do You Keep IT Infrastructure (and Buses) Running and Avoid Downtime?

Launching LLM-Based Products: From Concept to Cash in 90 Days

Speaker: Christophe Louvion, Chief Product & Technology Officer of NRC Health and Tony Karrer, CTO at Aggregage

Christophe Louvion, Chief Product & Technology Officer of NRC Health, is here to take us through how he guided his company's recent experience of getting from concept to launch and sales of products within 90 days. In this exclusive webinar, Christophe will cover key aspects of his journey, including: LLM Development & Quick Wins 🤖 Understand how LLMs differ from traditional software, identifying opportunities for rapid development and deployment.

Enhancing LLM-as-a-Judge with Grading Notes

databricks

Evaluating long-form LLM outputs quickly and accurately is critical for rapid AI development. As a result, many developers wish to deploy LLM-as-judge methods.

Visualizing Data: A Statology Primer

KDnuggets

This collection of tutorials from our sister site Statology centers on data visualization. Learn more about visualizing your data right here.

Accelerate your data streaming journey with the latest in Confluent Cloud

Confluent

CC 2024 Q2 adds Flink Private Networking (AWS), Flink SQL Interactive Tables; Enterprise:Connect w/Confluent, Connector Custom Offsets; SI: Build w/Confluent, etc.

Pickup in 3 minutes: Uber’s implementation of Live Activity on iOS

Uber Engineering

From WWDC reveal to delivery, discover how we tackled new tech, design challenges, and tight timelines to enhance rider & driver experiences with Live Activity® from Apple.

How To Speak The Language Of Financial Success In Product Management

Speaker: Jamie Bernard

Success in product management goes beyond delivering great features - it’s about achieving measurable financial outcomes that resonate across the organization. By connecting your product’s journey with the company’s financial success, you’ll ensure that every feature, release, and innovation contributes to the bottom line, driving both customer satisfaction and business growth.

Introducing Mosaic AI Model Training for Fine-Tuning GenAI Models

databricks

Today, we're thrilled to announce that Mosaic AI Model Training's support for fine-tuning GenAI models is now available in Public Preview. At Databricks.

5 Tools Every Data Scientist Needs in Their Toolbox in 2024

KDnuggets

From the soft tools to the hard tools, these are what make a data scientist successful.

Mainframe History: How Mainframe Computers Have Changed Over the Years

Precisely

Mainframes have one of the longest histories of any kind of computing technology that is still used today. In the fast-evolving landscape of technology, few innovations have left as profound a mark as mainframe computers. From their inception to the sophisticated systems of today, mainframes have continuously adapted to meet the ever-growing demands of business operations.

How are Apache Iceberg Tables Optimizing Data Lake Management?

Hevo

A data lake is a central storage place for an organization’s data in its original format. Unlike data warehouses, data lakes can handle all kinds of data, including unstructured and semi-structured data like images, video, audio, and documents.

What Is Entity Resolution? How It Works & Why It Matters

Sometimes referred to as data matching or fuzzy matching, entity resolution is critical for data quality, analytics, graph visualization and AI. Learn what entity resolution is, why it matters, how it works and its benefits. Advanced entity resolution using AI is crucial because it efficiently and easily solves many of today’s data quality and analytics problems.

A Framework for Multi-Model Forecasting on Databricks

databricks

Time series forecasting serves as the foundation for inventory and demand management in most enterprises. Using data from past periods along with.

How to Use Conditional Formatting in Pandas to Enhance Data Visualization

KDnuggets

Tired of staring at bland dataframes? Discover how conditional formatting in Pandas can transform your data visualization experience!

Oracle Exits AdTech: What It Means for Your Marketing Strategy

Precisely

Oracle’s recent decision to shut down its advertising business marks a significant shift in the ad tech landscape. This move, driven by declining revenues and increasing regulatory pressures, leaves many advertisers seeking alternative solutions to maintain their marketing momentum while leveraging data that adheres to privacy requirements and aligns with consumers’ expectations.

Iceberg Architecture Examples: How Iceberg powers data and ML applications

Hevo

In recent years, Apache Iceberg has seen considerable advancements that highlight its growing importance. Major tech companies like Google, Snowflake, and Databricks have increasingly embraced this table format. This trend highlights a transformative shift in the data warehousing landscape as Iceberg gains traction.

Provide Real Value in Your Applications with Data and Analytics

The complexity of financial data, the need for real-time insight, and the demand for user-friendly visualizations can seem daunting when it comes to analytics - but there is an easier way. With Logi Symphony, we aim to turn these challenges into opportunities. Our platform empowers you to seamlessly integrate advanced data analytics, generative AI, data visualization, and pixel-perfect reporting into your applications, transforming raw data into actionable insights.

Primary Key and Foreign Key constraints are GA and now enable faster queries

databricks

Databricks is thrilled to announce the General Availability (GA) of Primary Key (PK) and Foreign Key (FK) constraints, starting in Databricks Runtime 15.2.

How to Use the pivot_table Function for Advanced Data Summarization in Pandas

KDnuggets

Let's learn to use Pandas pivot_table in Python to perform advanced data summarization.

Marketing Questions phData Can Answer with Data

phData: Data Engineering

Effective marketing is crucial for business growth, yet achieving cost-effective and impactful results from marketing can be challenging for companies of all sizes. Marketing leaders are tasked with driving results and determining the best course of action for their team by asking questions like: How much should we spend on this new campaign? Should we focus on retaining our customers or trying to find new ones?

Snowflake Universal Search: A Game-Changer for Data Discovery

Hevo

Searching for data manually in Snowflake can be challenging, time-consuming, and sometimes frustrating. Snowflake has identified these problems and developed Universal Search to change the way we search for data. Universal Search, built on Snowflake Cortex, is designed to make finding data straightforward.

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Speaker: Maher Hanafi, VP of Engineering at Betterworks & Tony Karrer, CTO at Aggregage

Executive leaders and board members are pushing their teams to adopt Generative AI to gain a competitive edge, save money, and otherwise take advantage of the promise of this new era of artificial intelligence. There's no question that it is challenging to figure out where to focus and how to advance when it’s a new field that is evolving every day. 💡 This new webinar featuring Maher Hanafi, VP of Engineering at Betterworks, will explore a practical framework to transform Generative AI pr

Building Industry IoT and M2M Solutions With Databricks for Communications

databricks

The communications industry is experiencing immense change due to rapid technological advancements and evolving market trends. Communications service providers (CSPs) build various solutions.

Using Transfer Learning to Boost Model Performance

KDnuggets

Transfer learning can improve model performance by leveraging pre-trained models and adapting them to new, related tasks.

Optimizing Hospital Operations with Machine Learning in Healthcare: A Data-Driven Approach

Striim

Real-time data and machine learning are revolutionizing how hospitals operate and deliver care. A data-driven approach to hospital optimization makes healthcare professionals’ work more efficient, allowing them to focus on what truly matters: patient health. Optimizing hospital operations also reduces costs. Here’s everything you need to know about how hospitals can leverage advancements of machine learning in healthcare to streamline operations and moderniz

Avro vs Parquet: Which File Format is Right for You?

Hevo

When working with huge amounts of data, data serialization plays an important role in system performance. Data serialization converts complex data structures, such as graphs and trees, into a format that can be easily stored or transmitted over the network or across different distributed systems and programming languages.

The AI Superhero Approach to Product Management

Speaker: Conrado Morlan

In this engaging and witty talk, industry expert Conrado Morlan will explore how artificial intelligence can transform the daily tasks of product managers into streamlined, efficient processes. Using the lens of a superhero narrative, he’ll uncover how AI can be the ultimate sidekick, aiding in data management and reporting, enhancing productivity, and boosting innovation.