This site uses cookies to improve your experience. To help us insure we adhere to various privacy regulations, please select your country/region of residence. If you do not select a country, we will assume you are from the United States. Select your Cookie Settings or view our Privacy Policy and Terms of Use.
Cookie Settings
Cookies and similar technologies are used on this website for proper function of the website, for tracking performance analytics and for marketing purposes. We and some of our third-party providers may use cookie data for various purposes. Please review the cookie settings below and choose your preference.
Used for the proper function of the website
Used for monitoring website traffic and interactions
Cookie Settings
Cookies and similar technologies are used on this website for proper function of the website, for tracking performance analytics and for marketing purposes. We and some of our third-party providers may use cookie data for various purposes. Please review the cookie settings below and choose your preference.
Strictly Necessary: Used for the proper function of the website
Performance/Analytics: Used for monitoring website traffic and interactions
I still remember the good ole days when Apache Spark was fresh and hot, hardly anyone was using it, except a few poor AWS Glue and EMR users … Lord have mercy on their ragged souls. It’s funny how that GOAT of a tool went from being used by a few companies for extremely large […] The post Apache Spark’s Most Annoying Use Case appeared first on Confessions of a Data Guy.
In the spring of 2024, I ran a new survey to gather more data for my Data Teams book and update my 2023 and 2020 surveys. In total, we had 81 respondents. This survey was designed to get information about how management uses data teams, the value they’re creating, and how they’re creating it. The survey asked about the best and worst practices that teams are using or experiencing.
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
At Meta, we’ve been diligently working to incorporate privacy into different systems of our software stack over the past few years. Today, we’re excited to share some cutting-edge technologies that are part of our Privacy Aware Infrastructure (PAI) initiative. These innovations mark a major milestone in our ongoing commitment to honoring user privacy.
The Rebuild Network Topology Add-In provides the ability to rebuild the network topology for the current extent of an active map with ArcGIS Pro 2.9 and 3.1.
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
We recently announced the general availability of serverless compute for Notebooks, Workflows, and Delta Live Tables (DLT) pipelines. Today, we'd like to explain.
Another Big Data London is right around the corner, and we couldn’t be more excited. Coming in hot on September 18-19, Big Data London is easily the UK’s biggest data event of the year. And with an event as rare and prestigious as Big Data London, it’s normal to want to maximize your time. That’s why we put together our list of the top things to see and do at Big Data London this year—including the data reliability sessions we’re most excited about and the after-parties you don’t want to miss.
Data engineering is best learned by doing projects. But which ones? Here are six projects focusing on different data engineering skills to ensure you have it all covered.
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri
Skechers has been at the forefront of the e-commerce industry, focusing on hyperpersonalized experiences to meet customer expectations better. Following significant growth during.
The Energy Sector's transformative shift Energy, the driver of the global economy, is undergoing one of the largest secular shifts of our time, propelled by hundreds of trillions of dollars in global investment in the next 25 years. This shift creates a tremendous opportunity for energy companies. And, at the heart of successfully navigating this change sit data and AI.
The Quantum Apocalypse is coming. The advent of quantum computers has raised real questions about the future of data privacy over the internet. Someday, advances in quantum computing will make it possible to decrypt sensitive data that was encrypted using today’s complex cryptography systems. In the latest episode of the Meta Tech Podcast you’ll meet Sheran and Rafael, two engineers leading Meta’s post-quantum readiness work.
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you
Databricks Workflows now supports single task looping with For Each! Streamline repetitive processes into a single, easy to author, manage, and monitor task.
A Web Developer Roadmap is just like a book of instructions that tells you what you need to learn to become a web developer. It directs the learner’s attention toward mastering only the relevant stuff at any particular time and avoids unnecessary complications and concentration problems. Think about being at the boundary of unfamiliar woodlands where every path is bound for that famous site for web programming.
Welcome to Snowflake’s Startup Spotlight, where we ask startup founders about the problems they’re solving, the apps they’re building and the lessons they’ve learned during their startup journey. In this edition, we’ll learn why the founders of Genesis , Matt Glickman and Justin Langseth, decided to take on the challenge of creating AI-powered assistants to run generative AI workloads in Snowflake, and why “Eliza” and “Stuart” might soon be joining your team meetings.
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
Learn more about processing snapshots using Delta Live Tables and how you can use the new Apply changes from Snapshshot statement in DLT to build SCD Type 1 or SCD Type 2 target tables delivering incremental data and insights that would typically take months of effort on legacy platforms.
It’s nearing the end of the summer in North America, and one report has been a staple on my reading list for more than a decade: the Flexera State of the Cloud Report. The annual survey of hundreds of global IT decision makers assesses cloud strategies, migration trends, and important considerations for companies moving to the cloud or managing cloud environments.
Every business has key customer behaviors it aims to drive — whether it's encouraging repeat purchases, promoting product upgrades or boosting add-on services. Yet, despite access to advanced marketing technology and rich customer profiles, most businesses still rely on broad, generalized lifecycle marketing campaigns that fail to engage with customers.
In this article, we’ll explore how to leverage NumPy to solve systems of nonlinear equations, turning complex mathematical challenges into manageable tasks.
Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage
When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m
Within the Databricks Community, there is a technical blog where community members share best practices, tutorials and insights on data analytics, data engineering.
Try Fully Managed Apache Airflow for FREE Run Airflow without the hassle and management complexity. Take Astro (the fully managed Airflow solution) for a test drive today and unlock a suite of features designed to simplify, optimize, and scale your data pipelines. For a limited time, new sign-ups will receive a complimentary Airflow Fundamentals Certification exam (normally $150).
In our latest Confluent Champion post, Janis Hom, staff security GRC program manager, highlights how Confluent fosters a culture that helps her stay motivated.
Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.
We organize all of the trending information in your field so you don't have to. Join 37,000+ users and stay up to date on the latest articles your peers are reading.
You know about us, now we want to get to know you!
Let's personalize your content
Let's get even more personalized
We recognize your account from another site in our network, please click 'Send Email' below to continue with verifying your account and setting a password.
Let's personalize your content