Sat, Jan 13, 2024 - Fri, Jan 19, 2024

Data Engineers: We Need To Talk About Alert Fatigue

Monte Carlo

5 factors that lead to alert fatigue and how to prevent them with incident management best practices. Last Friday afternoon, Pedram Navid, head of data at Dagster and overall data influencer, went to X to ask an important question. He asked: "Ok — is anomaly detection in data actually that useful or is just a bunch of alerts you end up muting and not doing anything with?"

5 Free University Courses to Learn Data Science

KDnuggets

Looking to make a career in data science? Here are five free university courses to help you get started.

Databricks SQL Year in Review (Part I): AI-optimized Performance and Serverless Compute

databricks

This is part 1 of a blog series where we look back at the major areas of progress for Databricks SQL in 2023.

A look under GHC's hood: desugaring linear types

Tweag

I recently merged linear let- and where-bindings in GHC. Which means that we’ll have these in GHC 9.10, which is cause for celebration for me. Though they are much overdue, so maybe I should instead apologise to you. Anyway, I thought I’d take the opportunity to discuss some of GHC’s inner workings and how they explain some of the features of linear types in Haskell.

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide to debugging Airflow DAGs, with best practices and examples. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs; Identify common issues with DAGs, tasks, and connections; Distinguish between Airflow-relate

Table file formats - streaming reader: Delta Lake

Waitingforcode

Even though I'm into streaming these days, I haven't really covered streaming in Delta Lake yet. I only slightly blogged about Change Data Feed but completely missed the fundamentals. Hopefully, this and next blog posts will change this!

Read This Before Making a Career Switch to Data Science

KDnuggets

From Skill Assessment to Networking: Your Roadmap to Thriving in the World of Data Science.

Cartographic conventions

ArcGIS

What are cartographic conventions and do you need to follow them?

Simplify Data Integration With Informatica’s Snowflake Native App

Snowflake

Leading companies around the world rely on Informatica data management solutions to manage and integrate data across various platforms from virtually any data source and on any cloud. Now, Informatica customers in the Snowflake ecosystem have an even easier way to integrate data to and from the Snowflake Data Cloud. Informatica’s Enterprise Data Integrator, a Snowflake Native App currently in public preview, facilitates the high-speed replication of enterprise data into Snowflake and brings the

5 Ways of Converting Unstructured Data into Structured Insights with LLMs

KDnuggets

From Chaos to Clarity: Understanding the Unstructured Data Dilemma.

Validation vs. Verification: What’s the Difference?

Precisely

To a layperson, data verification and data validation may sound like the same thing. They differ as follows:
Purpose: data validation checks whether data falls within the acceptable range of values; data verification checks data to ensure it’s accurate and consistent.
Usually performed: validation when data is created or updated; verification when data is migrated or merged.
Example: validation checks whether a user-entered ZIP code can be found; verification checks that all ZIP codes in a dataset are in ZIP+4 format.
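
To make the distinction concrete, here is a minimal Python sketch (not taken from the Precisely article) that mirrors the two ZIP code examples above. The reference set `KNOWN_ZIPS` is a hypothetical stand-in for a real ZIP code lookup service, and the function names `validate_zip` and `verify_zip_plus_4` are illustrative only.

```python
import re

# Hypothetical reference data standing in for a real ZIP code lookup service.
KNOWN_ZIPS = {"10001", "60601", "94105"}

# ZIP+4 format: five digits, a hyphen, then four digits (ASCII digits only).
ZIP_PLUS_4 = re.compile(r"[0-9]{5}-[0-9]{4}")


def validate_zip(user_input: str) -> bool:
    """Validation, typically run when data is created or updated:
    is the user-entered ZIP code within the acceptable set of values?"""
    return user_input.strip() in KNOWN_ZIPS


def verify_zip_plus_4(zips: list[str]) -> list[str]:
    """Verification, typically run when data is migrated or merged:
    return every value that is not in ZIP+4 format (empty list = dataset passes)."""
    return [z for z in zips if not ZIP_PLUS_4.fullmatch(z)]


if __name__ == "__main__":
    print(validate_zip("94105"))  # True
    print(validate_zip("99999"))  # False
    print(verify_zip_plus_4(["94105-1804", "10001", "60601-0000"]))  # ['10001']
```

Note the asymmetry: validation checks a single incoming value, while verification scans an existing dataset after a migration or merge.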

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Monitoring Cloudera DataFlow Deployments With Prometheus and Grafana

Cloudera

Cloudera DataFlow for the Public Cloud (CDF-PC) is a complete self-service streaming data capture and movement platform based on Apache NiFi. It allows developers to interactively design data flows in a drag-and-drop designer, which can be deployed as continuously running, auto-scaling flow deployments or event-driven serverless functions. CDF-PC comes with a monitoring dashboard out of the box for data flow health and performance monitoring.

New Snowflake Features Released in December 2023

Snowflake

In the final month of 2023, Snowflake released features around Snowflake Cortex functions, Snowpark ML, cost management and more. Read on to learn more about everything we announced in December. Snowpark enhancements: GPU-powered compute with Snowpark Container Services (public preview in select AWS regions). Snowpark Container Services is a fully managed container offering that helps you deploy, manage and scale containerized code, whether it’s a large language model (LLM) or a full-stack a

Enroll in a Data Science Undergraduate Program For Free

KDnuggets

Path to a Free Self-Taught Education in Data Science for Everyone.

In the spotlight with Adil Kamalsha, ThoughtSpot’s Selfless Excellence champion

ThoughtSpot

This is part of our ongoing spotlight series which highlights ThoughtSpot’s quarterly Selfless Excellence champion. At ThoughtSpot, Selfless Excellence is the heart of who we are as a company. It creates room for personal success, but never at the cost of others on the team. Simply put, this means we consider our teammates, customers, and society at large ahead of our own personal wins, and without the distraction of office politics.

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Evolving Your SIEM Detection Rules: A Journey from Simple to Sophisticated

databricks

Cyber threats and the tools to combat them have become more sophisticated. SIEM is over 20 years old and has evolved significantly in.

Top 4 Data + AI Predictions for Telecommunications in 2024

Snowflake

The sheer breadth of data that telecommunications providers collect day-to-day is a huge advantage for the industry. Yet, many providers have been slower to adapt to a data-driven, hyperconnected world even as their services — including streaming, mobile payments and applications such as video conferencing — have driven innovation in nearly every other industry.

Breaking Down Quantum Computing: Implications for Data Science and AI

KDnuggets

This article explores the impact of quantum computing on data science and AI. It looks at the fundamental concepts of quantum computing and the key terms used in the field, and covers the challenges that lie ahead for quantum computing and how they can be overcome.

Are you a data power user? 3 reasons to join a ThoughtSpot User Group

ThoughtSpot

Are you a ThoughtSpot enthusiast? Maybe you built a liveboard that saved your department hours each work week, or perhaps you figured out a unique way to gamify adoption across your team. You put in the hard work, now it’s time to show it off. ThoughtSpot User Groups were designed to help users connect—a place where you can share stories and get new ideas to empower your organization with data.

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Engineering Lessons Learned from LLM Fine Tuning

Confessions of a Data Guy

Well, I finally got around to it. What, you say? Fine-tuning an LLM, that’s what. I mean, all the cool kids are talking about it and carrying on like it’s the next thing. What can I say … I’m jaded. I’ve been working on ML systems for a good few years now, and I’ve seen the […]

5 Reasons Manufacturers Should Move ERP Data to Snowflake to Supercharge Analytics

Snowflake

Advanced analytics help manufacturers extract insights from their data and improve operations and decision-making. But for manufacturers, it’s often challenging to perform analytics with ERP data. Because of the high rate of M&A activity in the industry, manufacturing enterprises often struggle with multiple ERP instances. A fragmented resource planning system causes data silos, making enterprise-wide visibility virtually impossible.

SQL Group By and Partition By Scenarios: When and How to Combine Data in Data Science

KDnuggets

Learn common scenarios and techniques for grouping and aggregating data, and for partitioning and ranking data, in SQL; they are especially helpful for reporting requirements.
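
The KDnuggets article’s own queries are not reproduced here. As a minimal sketch of the core distinction it covers, the Python snippet below runs both kinds of query through the standard-library `sqlite3` module against a made-up `sales` table (the table, columns, and values are illustrative assumptions). It assumes an SQLite build of 3.25 or later, the release that added window functions.

```python
import sqlite3

# Illustrative in-memory table; names and values are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER);
INSERT INTO sales VALUES
    ('east', 'ana', 300), ('east', 'bo', 500), ('east', 'cy', 200),
    ('west', 'di', 400), ('west', 'ed', 100);
""")

# GROUP BY collapses rows: one aggregated row per region.
totals = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY region
""").fetchall()

# PARTITION BY keeps every row and attaches per-region values to it:
# the region total and each rep's rank within the region.
ranked = conn.execute("""
    SELECT region, rep, amount,
           SUM(amount) OVER (PARTITION BY region) AS region_total,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region
    FROM sales
    ORDER BY region, rank_in_region
""").fetchall()

print(totals)  # [('east', 1000), ('west', 500)]
for row in ranked:
    print(row)
```

The GROUP BY result has two rows, one per region, while the PARTITION BY result keeps all five input rows; that is the practical difference when a report needs both row-level detail and group-level context.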

The Best 10 Programming Languages Every Ethical Hacker Needs to Learn

Knowledge Hut

As Bruce Schneier put it, "Data is the pollution problem of the information age, and protecting privacy is the environmental challenge." Ethical hacking is a head-on solution to this challenge: a way to counter attacks from unwanted sources. It tests a system’s security walls, discovering and eliminating weaknesses. Ethical hacking aims to prevent digital threats and vulnerabilities in the system and is a crucial online asset for security.

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; Write DAGs that adapt to your data at runtime and set up alerts and notifications; Scale you

Handling Online-Offline Discrepancy in Pinterest Ads Ranking System

Pinterest Engineering

Authors: Cathy Qian, Aayush Mudgal, Yinrui Li and Jinfeng Zhuang. Introduction: At Pinterest, our mission is to bring everyone the inspiration to create a life they love. People often come to Pinterest when they are considering what to do or buy next. Understanding this evolving user journey while balancing across multiple objectives is crucial to bring the best experience to Pinterest users and is supported by multiple recommendation models, with each providing real-time inferenc

Comparing Performance of Big Data File Formats: A Practical Guide

Towards Data Science

Parquet vs ORC vs Avro vs Delta Lake. The big data world is full of various storage systems, heavily influenced by different file formats. These are key in nearly all data pipelines, allowing for efficient data storage and easier querying and information extraction. They are designed to handle the challenges of big data like size, speed, and structure.

5 FREE Courses on AI with Microsoft for 2024

KDnuggets

Kickstart your AI journey this new year with 5 FREE learning resources from Microsoft.

The Future Scope of Ethical Hacking in 2024 and beyond?

Knowledge Hut

One of the most commonly used terms in the IT sector is ethical hacking. The rising frequency of cyber-attacks has forced businesses and government agencies to tighten their defences against malicious hackers. In the current digital era, ethical hacking has become extremely important. Ethical hacking is an ideal career choice for folks who wish to break into the IT industry by becoming a Certified Ethical Hacker (CEH).

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Veritas: Delivering Real-World Data through Datavant on Databricks

databricks

This post was written in collaboration with Jason Labonte, Chief Executive Officer, Veritas Data Research. In the realm of healthcare and life sciences.

A Prequel to Data Mesh

Towards Data Science

My personal take on justifying the existence of Data Mesh. A senior stakeholder on one of my projects mentioned that they wanted to decentralise their data platform architecture and democratise data across the organisation. When I heard the words ‘decentralised data architecture’, I was left utterly confused at first! In my then limited experience as a Data Engineer, I had only come across centralised data architectures and they seemed to be working very well.

6 Reasons Why a Universal Semantic Layer is Beneficial to Your Data Stack

KDnuggets

Looking to understand the universal semantic layer and how it can improve your data stack? This GigaOm Sonar report on Semantic Layers can help you delve deeper.

Top 10 Data Science Companies in 2024

Knowledge Hut

Data Science is an amalgamation of several disciplines, including computer science, statistics, and machine learning. As the internet becomes our second home, Big Data has exploded. Data Science is the study of this big data to derive meaningful patterns. Businesses are now looking to explore this gold mine of information to solve existing problems.

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m