Sat. Nov 02, 2024 - Fri. Nov 08, 2024

The Race For Data Quality in a Medallion Architecture

DataKitchen

The Medallion architecture pattern is gaining traction among data teams. It is a layered design pattern that helps data teams organize data processing and storage into three distinct layers, often called Bronze, Silver, and Gold.
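
As a rough illustration of the three layers, here is a minimal sketch in plain Python. The teaser does not say what each layer holds, so the sketch assumes the common convention (Bronze is raw, Silver is cleaned, Gold is aggregated). The sample records and the helpers `to_silver` and `to_gold` are hypothetical and are not taken from the article.

```python
from collections import defaultdict

# Bronze: raw records exactly as ingested (hypothetical sample data).
bronze = [
    {"order_id": "1", "region": "east", "amount": "10.50"},
    {"order_id": "2", "region": "west", "amount": "n/a"},    # unparseable amount
    {"order_id": "1", "region": "east", "amount": "10.50"},  # duplicate row
    {"order_id": "3", "region": "east", "amount": "4.50"},
]


def to_silver(rows):
    """Assumed Silver step: validate, cast types, and drop duplicate orders."""
    seen, clean = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would likely quarantine this row
        if row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        clean.append(
            {"order_id": int(row["order_id"]), "region": row["region"], "amount": amount}
        )
    return clean


def to_gold(rows):
    """Assumed Gold step: aggregate cleaned rows into revenue per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(silver)
print(gold)
```

Data quality checks would typically sit at each hand-off between layers; here the only check is the type cast and de-duplication inside `to_silver`.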

Best No-Code LLM App Builders

KDnuggets

Build an LLM application by easily picking and dropping components and connecting them, such as a vector store, web search, memory, and custom prompt.

Coding 152

Announcing the General Availability of Materialized Views and Streaming Tables for Databricks SQL

databricks

We’re excited to announce that materialized views (MVs) and streaming tables (STs) are now Generally Available in Databricks SQL on AWS and Azure.

SQL 132

What Is AWS DMS And Why You Shouldn’t Use It As An ELT

Seattle Data Guy

Recently, I’ve encountered a few projects that used AWS DMS almost like an ELT solution, whether to move data from a local database instance to S3 or to some other data storage layer. It was interesting to see AWS DMS used in this manner, but it’s not what DMS was built for. As…

AWS 130

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

What’s new in ArcGIS Data Interoperability at Pro 3.4

ArcGIS

An overview of all the enhancements and improvements to ArcGIS Data Interoperability in the latest release of ArcGIS Pro, version 3.4.

Data 105

Roadmap for Becoming a Data Scientist

KDnuggets

From learning Python to creating analytical reports, learn about ten easy steps to become a data scientist.

Python 144

More Trending

Gen AI in Action: Customers’ Cortex AI Stories and Outcomes

Snowflake

For years, companies have operated under the prevailing notion that AI is reserved only for the corporate giants — the ones with the resources to make it work for them. But as technology speeds forward, organizations of all sizes are realizing that generative AI isn’t just aspirational: It’s accessible and applicable now. With Snowflake’s easy-to-use, unified AI and data platform, businesses are removing the manual drudgery, bottlenecks and error-prone labor that stymie productivity, and are usi

BI-as-Code and the New Era of GenBI

Simon Späti

Imagine creating business dashboards by simply describing what you want to see. No more clicking through complex interfaces or writing SQL queries - just have a conversation with AI about your data needs. This is the promise of Generative Business Intelligence (GenBI). At its core, GenBI delivers an unreasonably effective human interface, where we iterate quickly, based on BI-as-Code.

BI 130

5 No-Cost Learning Resources for LLM Agents

KDnuggets

Curious about LLM agents? Here’s a list of free courses, guides, and blogs that make it easy to start learning and stay updated.

IT 142

What’s New in AI/BI Dashboards - Fall ‘24

databricks

Databricks AI/BI Dashboards have made significant strides since we announced their General Availability. Built on Databricks SQL and powered by Data Intelligence.

BI 92

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Unlocking Faster Insights: How Cloudera and Cohere can deliver Smarter Document Analysis

Cloudera

Today we are excited to announce the release of a new Cloudera Accelerator for Machine Learning (ML) Projects (AMP) for PDF document analysis, “Document Analysis with Command R and FAISS”, leveraging Cohere’s Command R Large Language Model (LLM), the Cohere Toolkit for retrieval augmented generation (RAG) applications, and Facebook’s AI Similarity Search (FAISS).

Calling All Builders: Get Hands-On With AI and Apps

Snowflake

You’ve heard about Snowflake’s new capabilities, our fresh products and innovations that help bring AI and apps to life. Now, it’s time to BUILD. Join us for BUILD 2024, a three-day global virtual conference taking place Nov. 12-15, to hear major Snowflake product announcements firsthand and to learn how to build with our latest innovations through dozens of technical sessions and hands-on labs.

Navigating AI Regulation: Balancing Innovation and Protection

KDnuggets

In this article, we will learn how to navigate the fine balance between building AI regulation and fostering innovation.

Building 141

What's new with Databricks SQL, October 2024

databricks

We are excited to share the latest features and performance improvements that make Databricks SQL simpler, faster, and more affordable than ever.

SQL 90

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Adopting Spark Connect

Towards Data Science

How we use a shared Spark server to make our Spark infrastructure more efficient. Spark Connect is a relatively new component in the Spark ecosystem that allows thin clients to run Spark applications on a remote Spark cluster. This technology can offer some benefits to Spark applications that use the DataFrame API. Spark has long allowed running SQL queries on a remote Thrift JDBC server.

Scala 69

Introducing Apache Kafka® 3.9

Confluent

Apache Kafka 3.9 includes multiple KIPs covering Kafka Core, Connect, and Streams—adding dynamic KRaft quorums, better ZK migration, Tiered Storage improvements & more.

Kafka 69

Optimizing RAG with Embedding Tuning

KDnuggets

Learn how to improve the performance of RAG systems, and make them more accurate at retrieving context-aware information.

Systems 141

Data Engineering Weekly #196

Data Engineering Weekly

Foundation Capital: A System of Agents brings Service-as-Software to life. Software is no longer simply a tool for organizing work; software becomes the worker itself, capable of understanding, executing, and improving upon traditionally human-delivered services. The author argues that multiple agents working together achieve better results than one.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

The “Gold-Rush Paradox” in Data: Why Your KPIs Need a Rethink

Towards Data Science

You’re not doing as good a job as you think you are.

Turbocharging Atlas: How we reduced server initialization time to less than 2 minutes

ThoughtSpot

ThoughtSpot prioritizes the high availability and minimal downtime of our systems to ensure a seamless user experience. In the realm of modern analytics platforms, where rapid and efficient processing of large datasets is essential, swift metadata access and management are critical for optimal system performance. Any delays in metadata retrieval can negatively impact user experience, resulting in decreased productivity and satisfaction.

Mastering f-strings in Python

KDnuggets

Discover how to leverage Python's f-strings (formatted string literals) to write cleaner, more efficient, and more readable code.
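
For a quick taste of what f-strings offer, here is a small sketch using standard Python formatting features. The variable names and values are made up for illustration and are not taken from the article.

```python
name = "Ada"
score = 0.87654
count = 1234567

greeting = f"Hello, {name}!"  # plain interpolation
percent = f"{score:.1%}"      # percentage with one decimal place
grouped = f"{count:,}"        # thousands separators
padded = f"{name:>6}|"        # right-align in a field of width 6
debug = f"{count=}"           # self-documenting form (Python 3.8+)

print(greeting, percent, grouped, padded, debug)
```

The part after the colon is the same format specification mini-language that `str.format` uses, so existing format strings usually carry over unchanged.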

Python 139

Discover the Future of Data Streaming with Confluent at AWS re:Invent 2024

Confluent

Join Confluent at AWS re:Invent 2024 to learn how to stream, connect, process, and govern data, unlocking its full potential. Visit our booth for demos, sessions, and more.

AWS 59

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

What I Learned from Teaching Tech for the Past 2 Years

Towards Data Science

Tips and tricks for teachers and mentors about teaching tech.

Why Open Table Format Architecture is Essential for Modern Data Systems

phData: Data Engineering

The world we live in today presents larger datasets, more complex data, and diverse needs, all of which call for efficient, scalable data systems. Though basic and easy to use, traditional table storage formats struggle to keep up. Open Table Format (OTF) architecture now provides a solution for efficient data storage, management, and processing while ensuring compatibility across different platforms.

7 Python Projects to Boost Your Data Science Portfolio

KDnuggets

Enhance your data science portfolio with these seven engaging Python projects that demonstrate essential programming and software engineering skills.

Portfolio 139

2025 Planning Insights: Data Quality Remains the Top Data Integrity Challenge and Priority

Precisely

Key Takeaways: Data quality is the top challenge impacting data integrity – cited as such by 64% of organizations. Data trust is impacted by data quality issues, with 67% of organizations saying they don’t completely trust their data used for decision-making. Data quality is the top data integrity priority in 2024, cited by 60% of respondents. The 2025 Outlook: Data Integrity Trends and Insights report is here!

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Operational and Analytical Data

Towards Data Science

What is the difference and how should we treat data in the enterprise?

Splitting Large CSV Files in Snowflake Using Snowpark

Cloudyard

In data engineering, we often encounter large files that need to be processed in chunks. Using Snowflake’s Snowpark, you can split a large CSV file into smaller parts and handle each as needed. However, while Snowpark provides powerful in-database processing capabilities, splitting files this way may not be the most efficient method in production environments.
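
The general idea of chunked splitting can be sketched with Python's standard `csv` module. This is not the Snowpark approach the article describes; it is a plain-Python stand-in, and the function names `split_csv` and `_render` as well as the sample data are hypothetical. Each chunk repeats the header row so it can be processed on its own.

```python
import csv
import io


def _render(header, rows):
    """Serialize one chunk (header plus its rows) back to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def split_csv(text, rows_per_chunk):
    """Split CSV text into chunks of at most rows_per_chunk data rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    chunks, batch = [], []
    for row in reader:
        batch.append(row)
        if len(batch) == rows_per_chunk:
            chunks.append(_render(header, batch))
            batch = []
    if batch:  # flush the final, possibly shorter, chunk
        chunks.append(_render(header, batch))
    return chunks


sample = "id,name\n1,a\n2,b\n3,c\n"
parts = split_csv(sample, 2)
for part in parts:
    print(part)
```

For files too large to hold in memory, the same loop could read from an open file handle and write each chunk to disk instead of collecting strings in a list.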

AWS 52

Language Models Explained in 5 Minutes

KDnuggets

Familiarize yourself with the technology behind ChatGPT and Google Gemini in the time it takes to enjoy a cup of coffee.

Ransomware Attacks: 3 Keys to Resilience for Your IBM i Systems

Precisely

Key Takeaways: In the face of ransomware attacks, a resilience strategy for IBM i systems must include measures for prevention, detection, and recovery. Built-in security features and enterprise-wide security operations help create a robust defense against ransomware. AI-driven tools are emerging to help you combat these attacks more efficiently and effectively.

Systems 59

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.