Sat, Nov 02, 2024 - Fri, Nov 08, 2024

The Race For Data Quality in a Medallion Architecture

DataKitchen

The Medallion architecture pattern is gaining traction among data teams. It is a layered design pattern for managing and transforming data that helps teams organize data processing and storage into three distinct layers, often called Bronze, Silver, and Gold.
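
A minimal Python sketch of the three layers, assuming a toy list of order records. The field names, sample rows and validation rules are hypothetical and are not DataKitchen's; the point is only the shape of the flow. Bronze holds data as ingested, Silver holds rows that passed quality checks (with rejects kept for inspection), and Gold holds a business-level aggregate.

```python
from collections import defaultdict

# Bronze: raw records exactly as ingested (hypothetical sample data, all strings).
bronze = [
    {"order_id": "1", "customer": "acme", "amount": "120.50"},
    {"order_id": "2", "customer": "globex", "amount": "80"},
    {"order_id": "2", "customer": "globex", "amount": "80"},  # duplicate
    {"order_id": "3", "customer": "", "amount": "42.00"},     # missing customer
    {"order_id": "4", "customer": "acme", "amount": "n/a"},   # unparseable amount
]


def to_silver(rows):
    """Validate, type and deduplicate Bronze rows; return (valid_rows, rejected_rows)."""
    seen = set()
    valid, rejected = [], []
    for row in rows:
        if row["order_id"] in seen:
            rejected.append((row, "duplicate order_id"))
            continue
        if not row["customer"]:
            rejected.append((row, "missing customer"))
            continue
        try:
            amount = float(row["amount"])
        except ValueError:
            rejected.append((row, "invalid amount"))
            continue
        seen.add(row["order_id"])
        valid.append({"order_id": int(row["order_id"]), "customer": row["customer"], "amount": amount})
    return valid, rejected


def to_gold(rows):
    """Aggregate Silver rows into a business-level table: revenue per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


silver, rejected = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'acme': 120.5, 'globex': 80.0}
print([reason for _, reason in rejected])  # ['duplicate order_id', 'missing customer', 'invalid amount']
```

Keeping the rejected rows, rather than silently dropping them, is what makes quality checks at the Silver boundary something a team can monitor and report on.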

Gen AI in Action: Customers’ Cortex AI Stories and Outcomes

Snowflake

For years, companies have operated under the prevailing notion that AI is reserved for corporate giants, the ones with the resources to make it work for them. But as technology speeds forward, organizations of all sizes are realizing that generative AI isn’t just aspirational: it’s accessible and applicable now. With Snowflake’s easy-to-use, unified AI and data platform, businesses are removing the manual drudgery, bottlenecks and error-prone labor that stymie productivity, and are using…

Trending Sources

What Is AWS DMS And Why You Shouldn’t Use It As An ELT

Seattle Data Guy

Recently, I’ve encountered a few projects that used AWS DMS almost like an ELT solution, whether to move data from a local database instance to S3 or to some other data storage layer. It was interesting to see AWS DMS used in this manner, but it’s not what DMS was built for. As…

BI-as-Code and the New Era of GenBI

Simon Späti

Imagine creating business dashboards by simply describing what you want to see. No more clicking through complex interfaces or writing SQL queries: just have a conversation with AI about your data needs. This is the promise of Generative Business Intelligence (GenBI). At its core, GenBI delivers an unreasonably effective human interface, where we iterate quickly on top of BI-as-Code.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Adopting Spark Connect

Towards Data Science

How we use a shared Spark server to make our Spark infrastructure more efficient. Spark Connect is a relatively new component in the Spark ecosystem that allows thin clients to run Spark applications on a remote Spark cluster. This technology can offer benefits to Spark applications that use the DataFrame API. Spark has long allowed running SQL queries on a remote Thrift JDBC server.
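
As a rough illustration of the thin-client model (not the article's own setup), the sketch below builds a Spark Connect connection string of the form `sc://host:port`, with 15002 as the default port and optional `/;key=value` parameters, and hands it to `SparkSession.builder.remote()`, which PySpark provides from version 3.4. The host name and `user_id` value are placeholders; actually opening a session needs the `pyspark[connect]` extras and a running Spark Connect server.

```python
def spark_connect_url(host: str, port: int = 15002, **params: str) -> str:
    """Build a Spark Connect URL: sc://host:port, optionally followed by /;key=value;... parameters."""
    url = f"sc://{host}:{port}"
    if params:
        url += "/;" + ";".join(f"{key}={value}" for key, value in params.items())
    return url


def remote_session(url: str):
    """Open a thin-client SparkSession against a remote Spark Connect server.

    Needs PySpark 3.4+ with the Connect extras (pip install "pyspark[connect]")
    and a reachable server, e.g. one started with sbin/start-connect-server.sh.
    """
    # Imported lazily so the URL helper above works without PySpark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(url).getOrCreate()


# Placeholder host and user id, not a real cluster.
url = spark_connect_url("spark.example.internal", user_id="analytics")
print(url)  # sc://spark.example.internal:15002/;user_id=analytics

# With a server running, DataFrame operations are sent to the server as
# unresolved plans and executed there:
# spark = remote_session(url)
# print(spark.range(10).filter("id % 2 = 0").count())
```

Only the connection string differs from a local session; the DataFrame code itself stays the same, which is what lets many thin clients share one server.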

Calling All Builders: Get Hands-On With AI and Apps

Snowflake

You’ve heard about Snowflake’s new capabilities, our fresh products and innovations that help bring AI and apps to life. Now, it’s time to BUILD. Join us for BUILD 2024, a three-day global virtual conference taking place Nov. 12-15, to hear major Snowflake product announcements firsthand and to learn how to build with our latest innovations through dozens of technical sessions and hands-on labs.

More Trending

Data Engineering Weekly #196

Data Engineering Weekly

Foundation Capital: A System of Agents brings Service-as-Software to life. Software is no longer simply a tool for organizing work; it becomes the worker itself, capable of understanding, executing, and improving upon traditionally human-delivered services. The author argues that multiple agents working together achieve better results than one.

Turbocharging Atlas: How we reduced server initialization time to less than 2 minutes

ThoughtSpot

ThoughtSpot prioritizes high availability and minimal downtime of our systems to ensure a seamless user experience. In the realm of modern analytics platforms, where rapid and efficient processing of large datasets is essential, swift metadata access and management are critical for optimal system performance. Any delays in metadata retrieval can negatively impact user experience, resulting in decreased productivity and satisfaction.

What Are Large Vision Models and How Do They Work?

phData: Data Engineering

Large Vision Models (LVMs) have transformed the field of computer vision, setting new benchmarks in image recognition, image segmentation, and object detection. Historically, convolutional neural networks (CNNs) have dominated computer vision tasks. However, with the introduction of the Transformer architecture—initially successful in Natural Language Processing (NLP)—the landscape has shifted.
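
To make the shift from CNNs to Transformers concrete: a Vision Transformer (ViT) treats an image as a sequence of patch tokens, much as an NLP Transformer treats a sentence as a sequence of word tokens. The NumPy sketch below shows only that tokenization step, assuming a 224x224 RGB image and 16x16 patches (the ViT-B/16 configuration); the learned projection, position embeddings and attention layers that follow are omitted.

```python
import numpy as np


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patch tokens.

    Returns shape (num_patches, patch * patch * C), patches in row-major order.
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    grid_h, grid_w = h // patch, w // patch
    # (H, W, C) -> (grid_h, patch, grid_w, patch, C) -> (grid_h, grid_w, patch, patch, C)
    blocks = image.reshape(grid_h, patch, grid_w, patch, c).transpose(0, 2, 1, 3, 4)
    return blocks.reshape(grid_h * grid_w, patch * patch * c)


rng = np.random.default_rng(0)
img = rng.random((224, 224, 3), dtype=np.float32)  # stand-in for a real RGB image
tokens = patchify(img, 16)
print(tokens.shape)  # (196, 768): a 14x14 grid of patches, each 16*16*3 values
# In a ViT, a learned linear layer maps each token to the model width and
# position embeddings are added before the Transformer encoder.
```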

Roadmap for Becoming a Data Scientist

KDnuggets

From learning Python to creating analytical reports, learn about ten easy steps to become a data scientist.

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

2025 Planning Insights: Data Quality Remains the Top Data Integrity Challenge and Priority

Precisely

Key Takeaways: Data quality is the top challenge impacting data integrity – cited as such by 64% of organizations. Data trust is impacted by data quality issues, with 67% of organizations saying they don’t completely trust their data used for decision-making. Data quality is the top data integrity priority in 2024, cited by 60% of respondents. The 2025 Outlook: Data Integrity Trends and Insights report is here!

Announcing the General Availability of Materialized Views and Streaming Tables for Databricks SQL

databricks

We’re excited to announce that materialized views (MVs) and streaming tables (STs) are now Generally Available in Databricks SQL on AWS and Azure.

Why Open Table Format Architecture is Essential for Modern Data Systems

phData: Data Engineering

The world we live in today presents larger datasets, more complex data, and diverse needs, all of which call for efficient, scalable data systems. Though basic and easy to use, traditional table storage formats struggle to keep up. Open Table Format (OTF) architecture now provides a solution for efficient data storage, management, and processing while ensuring compatibility across different platforms.

5 No-Cost Learning Resources for LLM Agents

KDnuggets

Curious about LLM agents? Here’s a list of free courses, guides, and blogs that make it easy to start learning and stay updated.

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower, not replace, your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-driven…

Ransomware Attacks: 3 Keys to Resilience for Your IBM i Systems

Precisely

Key Takeaways: In the face of ransomware attacks, a resilience strategy for IBM i systems must include measures for prevention, detection, and recovery. Built-in security features and enterprise-wide security operations help create a robust defense against ransomware. AI-driven tools are emerging to help you combat these attacks more efficiently and effectively.

Splitting Large CSV Files in Snowflake Using Snowpark

Cloudyard

In data engineering, we often encounter large files that need to be processed in chunks. Using Snowflake’s Snowpark, you can split a large CSV file into smaller parts and handle each as needed. However, while Snowpark provides powerful in-database processing capabilities, splitting files this way may not be the most efficient method in production environments.
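
For comparison with the Snowpark approach, here is a minimal plain-Python sketch that splits a CSV into fixed-size parts and repeats the header in each one. It uses only the standard `csv` module, not Snowpark, and the file names and chunk size are hypothetical. It streams row by row, so the source file is never held in memory all at once.

```python
import csv
import tempfile
from pathlib import Path


def split_csv(src: Path, out_dir: Path, rows_per_file: int) -> list[Path]:
    """Split src into parts of at most rows_per_file data rows, repeating the header in each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: list[Path] = []
    with src.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:  # empty file: nothing to split
            return parts
        out = None
        try:
            for i, row in enumerate(reader):
                if i % rows_per_file == 0:  # start a new part
                    if out is not None:
                        out.close()
                    part = out_dir / f"{src.stem}_part{len(parts):04d}.csv"
                    out = part.open("w", newline="")
                    writer = csv.writer(out)
                    writer.writerow(header)
                    parts.append(part)
                writer.writerow(row)
        finally:
            if out is not None:
                out.close()
    return parts


# Demo with a hypothetical 25-row file in a temporary directory.
tmp = Path(tempfile.mkdtemp())
src = tmp / "orders.csv"
with src.open("w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["id", "amount"])
    w.writerows([i, i * 10] for i in range(25))

parts = split_csv(src, tmp / "parts", rows_per_file=10)
print([p.name for p in parts])  # ['orders_part0000.csv', 'orders_part0001.csv', 'orders_part0002.csv']
```

This runs on a single machine, so it suits pre-staging files before a load rather than replacing in-warehouse processing.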

Loading data into Redshift with DBT

Yelp Engineering

At Yelp, we embrace innovation and thrive on exploring new possibilities. With our consumers’ ever-growing appetite for data, we recently revisited how we could load data into Redshift more efficiently. In this blog post, we explore how DBT can be used seamlessly with Redshift Spectrum to read data from the data lake into Redshift, significantly reducing runtime, resolving data quality issues, and improving developer productivity.

Optimizing RAG with Embedding Tuning

KDnuggets

Learn how to improve the performance of RAG systems, and make them more accurate at retrieving context-aware information.
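
Embedding tuning targets the retrieval step of RAG, which typically ranks documents by cosine similarity between the query embedding and each document embedding. The NumPy sketch below shows only that ranking step; the three-dimensional vectors are hand-made stand-ins for real model output, not results from the article. Tuning the embedding model changes these vectors, and therefore which documents land in the top k.

```python
import numpy as np


def top_k(query: np.ndarray, docs: np.ndarray, k: int = 2) -> list[int]:
    """Return indices of the k document embeddings most cosine-similar to the query."""
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = d @ q  # cosine similarity of each document with the query
    return np.argsort(-scores)[:k].tolist()


# Hand-made 3-d "embeddings" for four documents (real models output hundreds of dimensions).
docs = np.array([
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
])
query = np.array([0.9, 0.3, 0.0])

print(top_k(query, docs, k=2))  # [0, 1]
```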

The Ultimate Guide to Apache Airflow DAGs

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale your…

What’s new in ArcGIS Data Interoperability at Pro 3.4

ArcGIS

An overview of all the enhancements and improvements to ArcGIS Data Interoperability in the latest release of ArcGIS Pro, version 3.4.

Season's Speedings: Databricks SQL Delivers 4x Performance Boost Over Two Years

databricks

As the season of giving approaches, we at Databricks have been making our list and checking it twice, but instead of toys and treats…

The “Gold-Rush Paradox” in Data: Why Your KPIs Need a Rethink

Towards Data Science

You’re not doing as good a job as you think you are.

Navigating AI Regulation: Balancing Innovation and Protection

KDnuggets

In this article, we will learn how to strike the fine balance between building AI regulation and fostering innovation.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Introducing Apache Kafka® 3.9

Confluent

Apache Kafka 3.9 includes multiple KIPs covering Kafka Core, Connect, and Streams—adding dynamic KRaft quorums, better ZK migration, Tiered Storage improvements & more.

What’s New in AI/BI Dashboards - Fall ‘24

databricks

Databricks AI/BI Dashboards have made significant strides since we announced their General Availability. They are built on Databricks SQL and powered by Data Intelligence.

Introducing the New Anthropic Token Counting API

Towards Data Science

Keep a closer eye on your costs when using Claude.

Mastering f-strings in Python

KDnuggets

Discover how to leverage Python's f-strings (formatted string literals) to write cleaner, more efficient, and more readable code.
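
A few f-string features in one place: format specs for thousands separators and percentages, date formatting, a nested replacement field for a width chosen at runtime, the `!r` conversion, and the self-documenting `=` form (Python 3.8+). The variable names and values are illustrative only.

```python
from datetime import date

name = "pipeline"
rows = 1234567
ratio = 0.98765
run_date = date(2024, 11, 8)
width = 12

summary = f"{name!r} loaded {rows:,} rows"  # !r uses repr(); "," adds thousands separators
rate = f"success rate: {ratio:.1%}"         # multiply by 100, one decimal place, append %
day = f"{run_date:%Y-%m-%d}"                # spec is passed to date.__format__ (strftime codes)
padded = f"[{name:>{width}}]"               # nested field supplies the width at runtime
debug = f"{rows=}, {ratio=:.2f}"            # self-documenting expressions (Python 3.8+)

for line in (summary, rate, day, padded, debug):
    print(line)
# 'pipeline' loaded 1,234,567 rows
# success rate: 98.8%
# 2024-11-08
# [    pipeline]
# rows=1234567, ratio=0.99
```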

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m…

Discover the Future of Data Streaming with Confluent at AWS re:Invent 2024

Confluent

Join Confluent at AWS re:Invent 2024 to learn how to stream, connect, process, and govern data, unlocking its full potential. Visit our booth for demos, sessions, and more.

What's new with Databricks SQL, October 2024

databricks

We are excited to share the latest features and performance improvements that make Databricks SQL simpler, faster, and more affordable than ever.

Operational and Analytical Data

Towards Data Science

What is the difference and how should we treat data in the enterprise?

7 Python Projects to Boost Your Data Science Portfolio

KDnuggets

Enhance your data science portfolio with these seven engaging Python projects that demonstrate essential programming and software engineering skills.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.