This site uses cookies to improve your experience. To help us insure we adhere to various privacy regulations, please select your country/region of residence. If you do not select a country, we will assume you are from the United States. Select your Cookie Settings or view our Privacy Policy and Terms of Use.
Cookie Settings
Cookies and similar technologies are used on this website for proper function of the website, for tracking performance analytics and for marketing purposes. We and some of our third-party providers may use cookie data for various purposes. Please review the cookie settings below and choose your preference.
Used for the proper function of the website
Used for monitoring website traffic and interactions
Cookie Settings
Cookies and similar technologies are used on this website for proper function of the website, for tracking performance analytics and for marketing purposes. We and some of our third-party providers may use cookie data for various purposes. Please review the cookie settings below and choose your preference.
Strictly Necessary: Used for the proper function of the website
Performance/Analytics: Used for monitoring website traffic and interactions
Learning about how to data models from basic star schemas on the internet is like learning data science using the IRIS data set. It works great as a toy example. But it doesn’t match real life at all. Data modeling in real life requires you fully understand the data sources and your business use cases.… Read more The post Why Is Data Modeling So Challenging – How To Data Model For Analytics appeared first on Seattle Data Guy.
👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of five topics from today’s subscriber-only The Pulse issue. To get full issues twice a week, subscribe here.
Do you think I’m just trying to get you to click? Maybe. Maybe not. After working in and around Data Teams for well over a decade, with both the smartest people to touch the keyboard, and the others, it’s become quite clear to me what the number one skill that identifies a Senior level Engineering […] The post Senior Engineer – The Number One Skill appeared first on Confessions of a Data Guy.
There are probably not that many people working today on the flat files with Structured Streaming than 5 years ago thanks to the table file formats. However, if you are in this group and are still generating CSVs or JSONs with the streaming sink, brace yourself, the memory problems are coming if you don't take action!
In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples to debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate
Summary As businesses increasingly invest in technology and talent focused on data engineering and analytics, they want to know whether they are benefiting. So how do you calculate the return on investment for data? In this episode Barr Moses and Anna Filippova explore that question and provide useful exercises to start answering that in your company.
👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of five topics from today’s subscriber-only The Pulse issue. To get full issues twice a week, subscribe here.
Editor’s Note : So much has happened since we first published this post and created the data observability category and Monte Carlo in 2019. We have updated this post to reflect this rapidly maturing space. You can read the original article linked at the bottom of this page. What is Data observability? The five pillars My data observability definition has not changed since I first coined it in 2019: Data observability refers to an organization’s comprehensive understanding of the health an
In this article, I explore the benefits of specialization for data scientists. Drawing on my own experience as a data scientist, I argue that specializing in a specific area can help you stand out in a crowded job market and provide you with more fulfilling career opportunities.
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
Experts from venture capital, Snowflake, and more discuss how generative AI will benefit data teams and the challenges they must solve. Image courtesy of the author. Generated by DiffusionBee. Generative AI is not a new concept. It’s been studied for decades and applied in limited capacities. That is until ChatGPT shocked and awed our collective consciousness in late 2022.
Welcome to Snowflake’s Startup Spotlight, where we learn about awesome companies building businesses on Snowflake. Can accounts receivable be an agent of change? Tesorio Co-Founder and CTO Fabio Fleitas thinks so, and his startup’s AI/ML-driven platform aims to give finance teams better control over their cash flow so they can have greater impact on their organizations’ success.
KDnuggets' new cheat sheet summarizes the top Python libraries for building generative AI apps, from OpenAI and Transformers to tools like Gradio, Diffusers, LangChain, and more. Ideal for both beginners and experts looking for a quick reference.
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
Why link statically against musl? Have you ever faced compatibility issues when dealing with Linux binary executables? The culprit is often the libc implementation, glibc. Acting as the backbone of nearly all Linux distros, glibc is the library responsible for providing standard C functions. Yet, its version compatibility often poses a challenge. Binaries compiled with a newer version of glibc may not function on systems running an older one, creating a compatibility headache.
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri
Pioneering Data Observability: Data, Code, Infrastructure, & AI The four dimensions of data observability: data, code, infrastructure, and ai? Image courtesy of the author. Outlining the past, present, and future of architecting reliable data systems. When we launched the data observability category in 2019, the term was something I could barely pronounce.
Snowflake recently announced a collaboration with NVIDIA to make it easy to run NVIDIA accelerated computing workloads directly within Snowflake accounts. One interesting use case is to train, customize, and deploy large language models (LLMs) safely and securely within Snowflake. Our new Snowpark Container Services , currently in private preview, together with NVIDIA AI, makes this possible.
Voice assistants like Siri, Alexa and Google Assistant are household names, but they still don't do well in multilingual settings. This article first provides an overview of how voice assistants work, and then dives into the top 5 challenges for voice assistants when it comes to providing a superior multilingual user experience. It also provides strategies for mitigation of these challenges.
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you
Explore is one of the largest recommendation systems on Instagram. We leverage machine learning to make sure people are always seeing content that is the most interesting and relevant to them. Using more advanced machine learning models, like Two Towers neural networks, we’ve been able to make the Explore recommendation system even more scalable and flexible.
Machine Learning Operations (MLOps) is a relatively new discipline that provides the structure and support necessary for machine learning (ML) models to thrive in production environments.
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
Short-lived certificates (SLCs) are part of our latest efforts to further secure our Transport Layer Security (TLS) private keys on our edge networks. SLCs have a very short exposure compared to traditional certificates and lower the chances of a compromised private key being abused. Implementing SLCs has required us to address tradeoffs between operability and reliability, while satisfying the strict security requirements of our edge environment.
Introduction The snapshots feature of the Apache Hadoop Distributed Filesystem ( HDFS) enables you to capture point-in-time copies of the file system and protect your important data against corruption, user-, or application errors. This feature is available in all versions of Cloudera Data Platform (CDP), Cloudera Distribution for Hadoop (CDH) and Hortonworks Data Platform (HDP).
This article explores StableCode, an innovative AI product by Stability AI, designed to enhance coding efficiency and accessibility. It delves into its unique features, underlying technology, and potential impact on the developer community.
Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage
When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m
We organize all of the trending information in your field so you don't have to. Join 37,000+ users and stay up to date on the latest articles your peers are reading.
You know about us, now we want to get to know you!
Let's personalize your content
Let's get even more personalized
We recognize your account from another site in our network, please click 'Send Email' below to continue with verifying your account and setting a password.
Let's personalize your content