Sat. Jun 15, 2024 - Fri. Jun 21, 2024

What I’ve Learned After A Decade Of Data Engineering

Confessions of a Data Guy

After 10 years of Data Engineering work, I think it’s time to hang up the proverbial hat and ride off into the sunset, never to be seen again. I wish. Everything has changed in 10 years, yet nothing has changed in 10 years. How is that even possible? Sometimes I wonder if I’ve learned anything […]

5 Free Artificial Intelligence Courses from Top Universities

KDnuggets

Want to learn AI from the best resources? Check out these free AI courses from top universities.

Databricks, Snowflake and the future

Christophe Blefari

Every year, the competition between Snowflake and Databricks intensifies, using their annual conferences as a platform for demonstrating their power. This year, the Snowflake Summit was held in San Francisco from June 2 to 5, while the Databricks Data+AI Summit took place 5 days later, from June 10 to 13, also in San Francisco.

Metadata 147

Being Data Driven At Stripe With Trino And Iceberg

Data Engineering Podcast

Summary: Stripe is a company that relies on data to power their products and business. To support that functionality they have invested in Trino and Iceberg for their analytical workloads. In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform.

Data Lake 147

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

OpenAI Acquires Rockset

Rockset

I’m excited to share that OpenAI has completed the acquisition of Rockset. We are thrilled to join the OpenAI team and bring our technology and expertise to building safe and beneficial AGI. From the start, our vision at Rockset was to fundamentally transform the way data-driven applications were built. We developed our search and analytics database, taking full advantage of the cloud, to eliminate the complexity inherent in the data infrastructure needed for these apps.

Database 145

Deploying Machine Learning Models: A Step-by-Step Tutorial

KDnuggets

Model deployment is the process of integrating trained models into practical applications. This includes defining the necessary environment, specifying how input data is introduced into the model and the output produced, and the capacity to analyze new data and provide relevant predictions or categorizations.

More Trending

Open, Interoperable Storage with Iceberg Tables, Now Generally Available

Snowflake

Thousands of customers have worked with Snowflake to cost-effectively build a secure data foundation as they look to solve a growing variety of business problems with more data. Increasingly customers are looking to expand that powerful foundation to a broader set of data across their enterprise. Snowflake is now making it even easier for customers to bring the platform’s usability, performance, governance and many workloads to more data with Iceberg tables (now generally available), unlocking f

Data Lake 117

Cloudera Unveils Plans for Annual Pride Celebration in Cork

Cloudera

Pride Month is underway and we at Cloudera are looking forward to joining the global celebration of diversity, equity and the ongoing effort for LGBTQ+ (Lesbian, Gay, Bisexual, Transgender, Queer/Questioning) rights and recognition. Pride Month serves as a reminder that the fight for equality and equity for members of the LGBTQ+ community is not over.

Systems 112

Creating AI-Driven Solutions: Understanding Large Language Models

KDnuggets

Understanding LLMs is pivotal in unlocking the full potential of AI-driven solutions across various domains. As we navigate the process of building AI-driven solutions, it is essential to approach the development and deployment of LLMs with a focus on responsible AI practices.

Building 151

Databricks Named a Leader in 2024 Gartner® Magic Quadrant™ for Data Science and Machine Learning Platforms

databricks

We are excited to announce that Gartner has recognized Databricks as a Leader in the 2024 Gartner® Magic Quadrant™ for Data Science and.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

The Future of Telecoms: Embracing Gen AI as a Strategic Competitive Advantage

Snowflake

The telecom industry is undergoing an unprecedented transformation. Fueled by tech advancements such as 5G, cloud computing, Internet of Things (IoT) and machine learning (ML), telecoms have the opportunity to reshape and streamline operations and make significant improvements in service delivery, customer experience and network optimization. Key to these technologies is generative AI (gen AI), a dynamic form of artificial intelligence that leverages vast amounts of data to analyze and produce r

Boost your Productivity with Tool Parameter Overrides in ArcGIS Pro 3.3

ArcGIS

Productivity Update! Learn how to override default parameter values for geoprocessing tools in ArcGIS Pro 3.3. Override Geoprocessing Tool Defaults in ArcGIS Pro 3.

A Simple to Implement End-to-End Project with HuggingFace

KDnuggets

Generating a ready-to-use HuggingFace model with FastAPI and Docker

Project 149

Santalucía Seguros: Enterprise-level RAG for Enhanced Customer Service and Agent Productivity

databricks

In the insurance sector, customers demand personalized, fast, and efficient service that addresses their needs. Meanwhile, insurance agents must access a large amount.

Insurance 105

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

How to Turn a REST API Into a Data Stream with Kafka and Flink

Confluent

Improve REST API response data w/Kafka and Flink SQL in Confluent Cloud; Automatic connector retriability combats REST flakiness; Demo w/OpenSky data.

Kafka 105

Data Engineering Weekly #176

Data Engineering Weekly

Experience Enterprise-Grade Apache Airflow: Astro augments Airflow with enterprise-grade features to enhance productivity, meet scalability and availability demands across your data pipelines, and more. Databricks: Open Sourcing Unity Catalog. This week brought many exciting developments, with Snowflake and Databricks announcing open-source catalogs.

Beginner’s Guide to Machine Learning Testing With DeepChecks

KDnuggets

Perform data integrity tests and generate model evaluation reports by writing a few lines of code.

PVF: A novel metric for understanding AI systems’ vulnerability against SDCs in model parameters

Engineering at Meta

We’re introducing parameter vulnerability factor (PVF), a novel metric for understanding and measuring AI systems’ vulnerability against silent data corruptions (SDCs) in model parameters. PVF can be tailored to different AI models and tasks, adapted to different hardware faults, and even extended to the training phase of AI models. We’re sharing results of our own case studies using PVF to measure the impact of SDCs in model parameters, as well as potential methods of identifying SDCs in model

Systems 99

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Modern Data Engineering: Free Spark to Snowpark Migration Accelerator for Faster, Cheaper Pipelines in Snowflake

Snowflake

In the age of AI, enterprises are increasingly looking to extract value from their data at scale but often find it difficult to establish a scalable data engineering foundation that can process the large amounts of data required to build or improve models. Designed for processing large data sets, Spark has been a popular solution, yet it is one that can be challenging to manage, especially for users who are new to big data processing or distributed systems.

The Importance of Recognizing Juneteenth

Cloudera

Juneteenth holds profound significance in the history of freedom and equality for Black Americans. Also known as Freedom Day or Emancipation Day, Juneteenth commemorates the anniversary of June 19, 1865, when news of the Emancipation Proclamation reached Galveston, Texas, finally declaring freedom for enslaved Americans held in the Confederacy–more than two years after the proclamation was issued on January 1, 1863.

5 Free Templates for Data Science Projects on Jupyter Notebook

KDnuggets

Boost your data science project with these templates.

A Recap of the Data Engineering Open Forum at Netflix

Netflix Tech

A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale. Netflix is not the only place where data engineers are solving challenging problems with creative solutions.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

It’s Not Just About AI: Does Your Data Strategy Match Your Ambition?

Snowflake

Recent Snowflake workshops and roundtables have started with the question: “Does your data strategy match your AI ambition?” It certainly sparks customer engagement, but is that the right question to ask? Right now, it seems appropriate with all of the interest — dare I say “hype” — around AI. But it merely reflects the current darling of the tech world, focusing on the technology itself, rather than the ultimate goal.

Food 85

What’s new for CAD and BIM in ArcGIS Pro 3.3

ArcGIS

Discover what's new in ArcGIS Pro 3.3 for CAD and BIM workflows, allowing you to directly read datasets from Autodesk Revit, Civil 3D, and Industry Foundation Classes.

A Tour of Python NLP Libraries

KDnuggets

Exploring the available text Python packages for your data workflow.

Python 143

Empowering Enterprise Generative AI with Flexibility: Navigating the Model Landscape

Cloudera

The world of Generative AI (GenAI) is rapidly evolving, with a wide array of models available for businesses to leverage. These models can be broadly categorized into two types: closed-source (proprietary) and open-source models. Closed-source models, such as OpenAI’s GPT-4o, Anthropic’s Claude 3, or Google’s Gemini 1.5 Pro, are developed and maintained by private and public companies.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Customer Relationship Management vs. Customer Communications Management: Differences and Synergies

Precisely

Key Takeaways: Adopting both CCM and CRM platforms can significantly enhance your customer experience through personalized communications, automated workflows, and consistent messaging across channels. Automating repetitive communication tasks cuts back on manual efforts to save you time and reduce costs. As consumer expectations for efficient and personalized experiences continue to rise, effectively managing customer relationships and communications is more crucial than ever.

How to Implement Agentic RAG Using LangChain: Part 1

KDnuggets

Learn about enhancing LLMs with real-time information retrieval and intelligent agents.

Navigating the Storm: How Data Engineering Teams Can Overcome a Data Quality Crisis

DataKitchen

Ah, the data quality crisis. It’s that moment when your carefully crafted data pipelines start spewing out numbers that make as much sense as a cat trying to bark. You know you’re in trouble when the finance team uses your reports as modern art installations rather than decision-making tools.

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.