👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of five topics from today’s subscriber-only The Scoop issue. To get full issues twice a week, subscribe here.
In today’s data-driven world, organizations across industries are dealing with massive volumes of data, complex pipelines, and the need for efficient data processing. Traditional data engineering solutions such as Apache Airflow have played an important role in orchestrating and managing data operations to tackle these challenges.
Co-authors: Shivansh Mundra, Gonzalo Aniano Porcile, Smit Marvaniya, Hany Farid
A core part of what we do on the Trust Data Team at LinkedIn is create, deploy, and maintain models that detect and prevent many types of abuse. This spans the detection and prevention of fake accounts, account takeovers, and policy-violating content. We are constantly working to improve the effectiveness of our anti-abuse defenses to protect the experiences of our members and customers.
In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; and distinguish between Airflow-related …
Shuffle is a recurring topic in the What's new in Apache Spark series. Why? It's often one of the most time-consuming parts of a job, and knowing the improvements simply helps you write better pipelines.
Architectural decisions are all based on certain constraints and a desire to optimize for different outcomes. In data systems, one of the core architectural exercises is data modeling, which can have significant impacts on what is and is not possible for downstream use cases. Incorporating column-level lineage into the data modeling process encourages a more robust and well-informed design.
Sometimes I think Data Engineering is the same as it was 10+ years ago when I started doing it, and sometimes I think everything has changed. It’s probably both. In some ways the underlying concepts have not moved an inch; certain truths and axioms still rule over us all like some distant landlord, requiring […] (from the post “Old Dog Learn New Tricks?”)
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
Avneesh Saluja, Andy Yao, Hossein Taghavi
When watching a movie or an episode of a TV show, we experience a cohesive narrative that unfolds before us, often without giving much thought to the underlying structure that makes it all possible. However, movies and episodes are not atomic units, but rather composed of smaller elements such as frames, shots, scenes, sequences, and acts.
During Beyond 2023, we officially launched ThoughtSpot Sage , our new search experience that combines the power of GPT’s natural language processing and generative AI capabilities with the accuracy and security of our patented self-service analytics platform. As the experience layer of the modern data stack and an Elite Partner of Snowflake, we are constantly thinking about how our product innovation unlocks value for our shared customers.
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
This is a collaborative post from Databricks and Amazon Web Services (AWS). We thank Venkat Viswanathan, Data and Analytics Strategy Leader, Partner Solutions.
The need to balance data safety with new data initiatives, deliver business value, and change company culture around data tops this year's list of data and analytics management challenges. To help bring new resources and innovation to light, each year, Database Trends and Applications magazine presents the DBTA 100, a list of forward-thinking companies seeking to expand what's possible with data for their customers.
In this article, we’ll learn how to adapt pre-trained models to custom classification tasks using a technique called transfer learning. We will demonstrate it for an image classification task using PyTorch, comparing transfer learning on three pre-trained models: VGG16, ResNet50, and ResNet152.
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower, not replace, your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-driven …
Robinhood Markets, Inc. (“Robinhood”) has entered into an agreement to acquire San Francisco-based X1 Inc. (“X1”), a platform that offers a no-fee credit card with rewards on each purchase. This marks an important step in our journey towards broadening our product offerings and deepening our relationship with existing customers. Providing people with access to a no-fee credit card aligns with our mission to democratize finance for all.
The annual Data Team Awards showcase how different enterprise data teams are delivering solutions to some of the world’s toughest problems. Nearly 300 …
We are excited to release Cadence 1.0! Used by many major companies, Cadence powers over 1,000 services at Uber alone, handling 100K+ updates per second. Learn how Cadence makes it easy to build complex distributed systems.
This article elaborates on the importance of Explainable AI (XAI), the challenges of building interpretable AI models, and some practical guidelines for companies building XAI models.
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG-writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; and scale your …
In a world where superheroes captivate our imaginations, it's sometimes hard to recognize the real-life superheroes among us like intelligence analysts, forensic scientists, and cybersecurity professionals. Yes, cybersecurity professionals! Though we may not wear capes or possess extraordinary powers, our role, especially here at LinkedIn, is crucial in safeguarding our members, customers, and employees from the ever-present threat of cyberattacks.
Orca is a 13B-parameter model that learns to imitate the reasoning processes of large foundation models (LFMs). It uses progressive learning and teacher assistance from ChatGPT to overcome capacity gaps. By leveraging rich signals from GPT-4, Orca enhances its capabilities and improves imitation-learning performance.
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
At Robinhood, we recognize the vital role that our ESG program plays in supporting our mission to democratize finance for all. To highlight this work, we issued our third annual ESG Report, which outlines how we continue to embed ESG principles into our everyday business operations to advance our mission and drive positive impact for Robinhood. This year’s report comes on the heels of a tough market environment, volatility in crypto markets, rising interest rates in the U.S., and customers battling …
Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage
When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds) and enables non-LLM evaluation …