Sat. Mar 23, 2024 - Fri. Mar 29, 2024

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Data Engineering Podcast

Summary: A core differentiator of Dagster in the ecosystem of data orchestration is their focus on software-defined assets as a means of building declarative workflows. With their launch of Dagster+ as the redesigned commercial companion to the open source project, they are investing in that capability with a suite of new features. In this episode Pete Hunt, CEO of Dagster Labs, outlines these new capabilities, how they reduce the burden on data teams, and the increased collaboration that they ena

Data Lake 162

A Collection Of Free Data Science Courses From Harvard, Stanford, MIT, Cornell, and Berkeley

KDnuggets

Learn everything about data science by exploring our curated collection of free courses from top universities, covering essential topics from math and programming to machine learning, and mastering the nine steps to become a job-ready data scientist.

Announcing DBRX: A new standard for efficient open source LLMs

databricks

Databricks’ mission is to deliver data intelligence to every enterprise by allowing organizations to understand and use their unique data to build their.

Building 145

Snowflake Invests in Observe to Expand Observability in the Data Cloud

Snowflake

As organizations seek to drive more value from their data, observability plays a vital role in ensuring the performance, security and reliability of applications and pipelines while helping to reduce costs. At Snowflake, we aim to provide developers and engineers with the best possible observability experience to monitor and manage their Snowflake environment.

Cloud 135

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

Schema tracking in Delta Lake

Waitingforcode

Streaming Delta tables is slightly different from streaming native streaming sources, such as Apache Kafka topics. One of the significant differences is schema enforcement: it causes the job to fail when the schema of the streamed table changes.

Kafka 130

10 GitHub Repositories to Master MLOps

KDnuggets

Begin your MLOps journey with these comprehensive free resources available on GitHub.

156

More Trending

The New Gold Standard: Introducing the Robinhood Gold Card

Robinhood

Robinhood set to host first-ever keynote to announce the Robinhood Gold Card, a new 1% boost on Robinhood Gold deposits, and a reimagined Robinhood app. Today, we are hosting Robinhood Presents: The New Gold Standard, our first-ever keynote event, where Co-Founder and CEO Vlad Tenev will unveil new product and feature updates live to Robinhood customers in New York City.

Banking 126

Predict Known Categorical Outcomes with Snowflake Cortex ML Classification, Now in Public Preview 

Snowflake

Today, enterprises are focused on enhancing decision-making with the power of AI and machine learning (ML). But the complexity of ML models and data science techniques often leaves behind organizations without data scientists or with limited data science resources. And for those organizations with strong data analyst resources, complex ML models and frameworks may seem overwhelming, potentially preventing them from driving faster, higher-quality insights.

5 Free Google Courses to Become a Software Engineer

KDnuggets

Want to become a software engineer? Make it happen with these free courses and guides from Google.

Delivering the Next Generation of Consumer Experiences: Databricks and Adobe Announce Strategic Partnership

databricks

By Steve Sobel - Global Industry Leader; Communications, Media & Entertainment. Today, Databricks and Adobe are excited to announce a strategic partnership focused.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Moderating Inappropriate Video Content at Yelp

Yelp Engineering

One of Yelp’s top priorities is the trust and safety of our users. Yelp’s platform is most well-known for its reviews, and its moderation practices have been recognised in academic research for mitigating misinformation and building consumer trust. In addition to reviews, Yelp’s Trust and Safety team takes significant measures when it comes to protecting its users from inappropriate material posted through other content types.

Building 115

Snowflake Data Clean Rooms: Securely Collaborate to Unlock Insights and Value

Snowflake

In December 2023, Snowflake announced its acquisition of data clean room technology provider Samooha. Samooha’s intuitive UI and focus on reducing the complexity of sharing data led to it being named one of the most innovative data science companies of 2024 by Fast Company. Now, Samooha’s offering is integrated into Snowflake and launched as Snowflake Data Clean Rooms, a Snowflake Native App on Snowflake Marketplace, generally available to customers in AWS East, AWS West and Azure West.

Media 109

The Promise of Edge AI and Approaches for Effective Adoption

KDnuggets

Organizations are adopting edge AI for real-time decision-making using efficient and cost-effective methods such as model quantization, multimodal databases, and distributed inferencing.

Database 151

Announcing the State Reader API: The New "Statestore" Data Source

databricks

Databricks Runtime 14.3 includes a new capability that allows users to access and analyze Structured Streaming's internal state data: the State Reader.

Data 133

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Top UI UX Trends to Know in 2024

Knowledge Hut

The process of developing digital assets that are both aesthetically pleasing and simple to use is known as user interface/user experience design, or UI/UX design. While UX designers concentrate on the user's journey and how they engage with the product, UI designers are more concerned with the appearance and feel of a product. Because of digital innovation and the dynamic needs of consumers, the field of UI/UX design is always developing.

Designing 105

Phone Number Masking for Yelp Services Projects

Yelp Engineering

In this blog post, we highlight how phone number masking helps build consumer trust in the services marketplace at Yelp, decreases the friction in communication with service professionals, and allows for seamless switching between the Yelp app and a user’s phone. We present a high-level overview of our in-house phone masking system and dive into the details of the engineering challenge of optimizing the usage of proxy phone number resources at Yelp’s scale.

Project 103

7 Steps to Mastering Large Language Model Fine-tuning

KDnuggets

From theory to practice, learn how to enhance your NLP projects with these 7 simple steps.

Project 149

Announcing the General Availability of Databricks Notebooks on SQL Warehouses

databricks

Today, we are excited to announce the general availability of Databricks Notebooks on SQL warehouses. Databricks SQL warehouses are SQL-optimized compute that provide.

SQL 119

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Setting Up Kafka Multi-Tenancy 

DoorDash Engineering

Real-time event processing is a critical component of a distributed system’s scalability. At DoorDash, we rely on message queue systems based on Kafka to handle billions of real-time events. One of the challenges we face, however, is how to properly validate the system before going live. Traditionally, an isolated environment such as staging is used to validate new features.

Kafka 103

How Advertising, Media & Entertainment and Manufacturing Companies Are Accelerating Data, Apps and AI Strategy in the Data Cloud

Snowflake

In 2023, we held our first Accelerate event to explore industry trends, track data and technology innovations in financial services, and lay out data strategy case studies for the industry. This year, we are expanding to five industry events featuring leaders sharing insights relevant to advertising, media and entertainment; manufacturing; healthcare and life sciences; financial services; and retail and consumer goods.

Become a Business Intelligence Analyst in Less Than 6 Months

KDnuggets

Ready to become a business intelligence analyst right here, right now?

Managed Sportlogiq to Databricks Data Ingestion Pipelines for NHL Teams: A Game-Changing Alliance

databricks

Overview: In the competitive world of professional hockey, NHL teams are always seeking to optimize their performance. Advanced analytics has become increasingly important.

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

Building Databricks Data Pipelines 101

Confessions of a Data Guy

Have you ever wondered at a high level what it’s like to build production-level data pipelines on Databricks? What does it look like, what tools do you use? The post Building Databricks Data Pipelines 101 appeared first on Confessions of a Data Guy.

Bringing HDR photo support to Instagram and Threads

Engineering at Meta

Meta’s family of apps serves trillions of image download requests every day. And if you’re into high-quality images, you’ve probably noticed that Instagram and Threads have added support for high dynamic range (HDR) photos. Now people on Threads and Instagram can upload and share images that are more true-to-life, with the full color and range their device is capable of capturing.

Media 100

Mastering Python for Data Science: Beyond the Basics

KDnuggets

This article serves as a detailed guide on how to master advanced Python techniques for data science. It covers topics such as efficient data manipulation with Pandas, parallel processing with Python, and how to turn models into web services.

PySpark in 2023: A Year in Review

databricks

With the releases of Apache Spark 3.4 and 3.5 in 2023, we focused heavily on improving PySpark performance, flexibility, and ease of use.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

How To Build and Open Source PYPI Python Package

Confessions of a Data Guy

Ever wondered how to build an end-to-end project for an Open Source Python Package that gets published to PyPI? I built out lakescum, an open-source package to help with Databricks Unity Catalog Delta Lake tables querying with Polars, DuckDB, or PyArrow. [link] The post How To Build and Open Source PYPI Python Package appeared first on Confessions of a Data Guy.

Python 100

How I use Gen AI as a Data Engineer

Towards Data Science

Generative AI is all the rage.

The Art of Effective Prompt Engineering with Free Courses and Certifications

KDnuggets

Have you ever asked yourself ‘Am I using these generative AI tools correctly?’

Deloitte Data as a Service for Banking: A Modern Data Solution for Banks and Capital Markets Institutions

databricks

As new Generative AI capabilities continue to emerge with heightened customer expectations, data modernization and migration to the cloud have become critical success.

Banking 98

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m