Sat. Mar 23, 2024 - Fri. Mar 29, 2024

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Data Engineering Podcast

Summary: A core differentiator of Dagster in the ecosystem of data orchestration is its focus on software-defined assets as a means of building declarative workflows. With the launch of Dagster+ as the redesigned commercial companion to the open source project, the team is investing in that capability with a suite of new features. In this episode Pete Hunt, CEO of Dagster Labs, outlines these new capabilities, how they reduce the burden on data teams, and the increased collaboration that they ena

Data Lake 162

Schema tracking in Delta Lake

Waitingforcode

Streaming Delta tables is slightly different from streaming native streaming sources, such as Apache Kafka topics. One of the significant differences is schema enforcement, which causes the job to fail when the schema of the streamed table changes.

Kafka 130

A Collection Of Free Data Science Courses From Harvard, Stanford, MIT, Cornell, and Berkeley

KDnuggets

Learn everything about data science by exploring our curated collection of free courses from top universities, covering essential topics from math and programming to machine learning, and mastering the nine steps to become a job-ready data scientist.

Announcing DBRX: A new standard for efficient open source LLMs

databricks

Databricks’ mission is to deliver data intelligence to every enterprise by allowing organizations to understand and use their unique data to build their.

Building 145

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Building Databricks Data Pipelines 101

Confessions of a Data Guy

Have you ever wondered, at a high level, what it’s like to build production-level data pipelines on Databricks? What does it look like, and what tools do you use?

Snowflake Invests in Observe to Expand Observability in the Data Cloud

Snowflake

As organizations seek to drive more value from their data, observability plays a vital role in ensuring the performance, security and reliability of applications and pipelines while helping to reduce costs. At Snowflake, we aim to provide developers and engineers with the best possible observability experience to monitor and manage their Snowflake environment.

Cloud 119

More Trending

Delivering the Next Generation of Consumer Experiences: Databricks and Adobe Announce Strategic Partnership

databricks

By Steve Sobel, Global Industry Leader, Communications, Media & Entertainment. Today Databricks and Adobe are excited to announce a strategic partnership focused.

How To Build and Open Source PYPI Python Package

Confessions of a Data Guy

Ever wondered how to build an end-to-end project for an open source Python package that gets published to PyPI? I built lakescum, an open-source package to help with querying Databricks Unity Catalog Delta Lake tables with Polars, DuckDB, or PyArrow.

Python 100

The Promise of Edge AI and Approaches for Effective Adoption

KDnuggets

Organizations are adopting edge AI for real-time decision-making using efficient and cost-effective methods such as model quantization, multimodal databases, and distributed inferencing.

Database 119

Top UI UX Trends to Know in 2024

Knowledge Hut

The process of developing digital assets that are both aesthetically pleasing and simple to use is known as user interface/user experience design, or UI/UX design. While UX designers concentrate on the user's journey and how they engage with the product, UI designers are more concerned with the appearance and feel of a product. Because of digital innovation and the dynamic needs of consumers, the field of UI/UX design is always developing.

Designing 105

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Announcing the State Reader API: The New "Statestore" Data Source

databricks

Databricks Runtime 14.3 includes a new capability that allows users to access and analyze Structured Streaming's internal state data: the State Reader.

Data 119

Phone Number Masking for Yelp Services Projects

Yelp Engineering

In this blog post, we highlight how phone number masking helps build consumer trust in the services marketplace at Yelp, decreases the friction in communication with service professionals, and allows for seamless switching between the Yelp app and a user’s phone. We present a high level overview of our in-house phone masking system and dive into the details of the engineering challenge of optimizing the usage of proxy phone number resources at Yelp’s scale.

Project 103

Mastering Python for Data Science: Beyond the Basics

KDnuggets

This article serves as a detailed guide on how to master advanced Python techniques for data science. It covers topics such as efficient data manipulation with Pandas, parallel processing with Python, and how to turn models into web services.

Setting Up Kafka Multi-Tenancy 

DoorDash Engineering

Real-time event processing is a critical component of a distributed system’s scalability. At DoorDash, we rely on message queue systems based on Kafka to handle billions of real-time events. One of the challenges we face, however, is how to properly validate the system before going live. Traditionally, an isolated environment such as staging is used to validate new features.

Kafka 103

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Announcing the General Availability of Databricks Notebooks on SQL Warehouses

databricks

Today, we are excited to announce the general availability of Databricks Notebooks on SQL warehouses. Databricks SQL warehouses are SQL-optimized compute that provide.

SQL 112

Data Architecture and Strategy in the AI Era

Cloudera

At a time when AI is exploding in popularity and finding its way into nearly every facet of business operations, data has arguably never been more valuable. More recently, that value has been made clear by the emergence of AI-powered technologies like generative AI (GenAI) and the use of Large Language Models (LLMs). But, even with the backdrop of an AI-dominated future, many organizations still find themselves struggling with everything from managing data volumes and complexity to security conc

5 Free Google Courses to Become a Software Engineer

KDnuggets

Want to become a software engineer? Make it happen with these free courses and guides from Google.

Bringing HDR photo support to Instagram and Threads

Engineering at Meta

Meta’s family of apps serves trillions of image download requests every day. And if you’re into high-quality images, you’ve probably noticed that Instagram and Threads have added support for high dynamic range (HDR) photos. Now people on Threads and Instagram can upload and share images that are more true-to-life, with the full color and range their device is capable of capturing.

Media 91

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

Managed Sportlogiq to Databricks Data Ingestion Pipelines for NHL Teams: A Game-Changing Alliance

databricks

Overview: In the competitive world of professional hockey, NHL teams are always seeking to optimize their performance. Advanced analytics has become increasingly important.

Snowflake Data Clean Rooms: Securely Collaborate to Unlock Insights and Value

Snowflake

In December 2023, Snowflake announced its acquisition of data clean room technology provider Samooha. Samooha’s intuitive UI and focus on reducing the complexity of sharing data led to it being named one of the most innovative data science companies of 2024 by Fast Company. Now, Samooha’s offering is integrated into Snowflake and launched as Snowflake Data Clean Rooms, a Snowflake Native App on Snowflake Marketplace, generally available to customers in AWS East, AWS West and Azure West.

Media 87

10 GitHub Repositories to Master MLOps

KDnuggets

Begin your MLOps journey with these comprehensive free resources available on GitHub.

Don’t Get Left Behind in the AI Race: Your Easy Starting Point is Here

Cloudera

The ongoing progress in Artificial Intelligence is constantly expanding the realms of possibility, revolutionizing industries and societies on a global scale. The release of LLMs surged by 136% in 2023 compared to 2022, and this upward trend is projected to continue in 2024. Today, 44% of organizations are experimenting with generative AI, with 10% having already implemented it in operational settings.

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

Deloitte Data as a Service for Banking: A Modern Data Solution for Banks and Capital Markets Institutions

databricks

As new Generative AI capabilities continue to emerge with heightened customer expectations, data modernization and migration to the cloud have become critical success.

Banking 90

How Advertising, Media & Entertainment and Manufacturing Companies Are Accelerating Data, Apps and AI Strategy in the Data Cloud

Snowflake

In 2023, we held our first Accelerate event to explore industry trends, track data and technology innovations in financial services, and lay out data strategy case studies for the industry. This year, we are expanding to five industry events featuring leaders sharing insights relevant to advertising, media and entertainment; manufacturing; healthcare and life sciences; financial services; and retail and consumer goods.

7 Steps to Mastering Large Language Model Fine-tuning

KDnuggets

From theory to practice, learn how to enhance your NLP projects with these 7 simple steps.

Project 132

#ClouderaLife Employee Spotlight: Jess Hohn-Cabana

Cloudera

Meet Cloudera’s new Senior Vice President of Global Communications, Jess Hohn-Cabana. In this Employee Spotlight, we’ll get to know more about Jess, her new role, and her recent award win at the 2024 Ragan Top Women in Communications Awards. Get to Know Jess: A Seasoned Leader in Tech Communications and Branding. Coming to Cloudera with nearly three decades of experience in tech communications and branding, Jess is a leader and a visionary on all things storytelling.

Business Intelligence 101: How To Make The Best Solution Decision For Your Organization

Speaker: Evelyn Chou

Choosing the right business intelligence (BI) platform can feel like navigating a maze of features, promises, and technical jargon. With so many options available, how can you ensure you’re making the right decision for your organization’s unique needs? 🤔 This webinar brings together expert insights to break down the complexities of BI solution vetting.

PySpark in 2023: A Year in Review

databricks

With the releases of Apache Spark 3.4 and 3.5 in 2023, we focused heavily on improving PySpark performance, flexibility, and ease of use.

Powering Connectivity: Top 3 Takeaways from Mobile World Congress 2024

Snowflake

Late last month, innovators from across the telecommunications spectrum — and all the industries that rely on connectivity to succeed — gathered at Mobile World Congress Barcelona (MWC), the biggest telecom conference of the year. More than 100,000 attendees came together with service providers, device manufacturers and tech companies to discuss and discover how technology advances like generative AI (gen AI) and 5G, along with business imperatives such as powering new revenue streams, are resha

Pydantic Tutorial: Data Validation in Python Made Simple

KDnuggets

Want to write more robust Python applications? Learn how to use Pydantic, a popular data validation library, to model and validate your data.

Four Data Engineering Projects That Look Great on your CV

Towards Data Science

Data pipelines that would turn you into a decorated data professional.

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.