Data is the lifeblood of so much of what we build as software professionals, so it’s unsurprising that operations involving its transfer occupy the vast majority of developer time across […].
After the launch of CDP Data Engineering (CDE) on AWS a few months ago, we are thrilled to announce that CDE, the only cloud-native service purpose-built for enterprise data engineers, is now available on Microsoft Azure. CDP Data Engineering offers an all-inclusive toolset that enables data pipeline orchestration, automation, advanced monitoring, visual profiling, and a comprehensive management toolset for streamlining ETL processes and making complex data actionable across your analytic team
High Level Overview of the Problem. Introduction. If you’ve used any online/digital service, chances are that you are familiar with what a typical customer service experience entails: you send a message (usually email aliased) to the company’s support staff, fill … The post Customer Support Automation Platform at Uber appeared first on Uber Engineering Blog.
Summary There is a wealth of tools and systems available for processing data, but the user experience of integrating them and building workflows is still lacking. This is particularly important in large and complex organizations where domain knowledge and context are paramount and there may not be access to engineers for codifying that expertise. Raj Bains founded Prophecy to address this need by creating a UI-first platform for building and executing data engineering workflows that orchestrates
In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate
In Data Science projects, we distinguish between descriptive analytics and statistical models running in production. Overall, these can be seen as one process. You start with analyzing historical data to […].
Did you know Cloudera customers, such as SMG and Geisinger, offloaded their legacy DW environment to Cloudera Data Warehouse (CDW) to take advantage of CDW’s modern architecture and best-in-class performance? In addition to substantial cost savings upon moving to CDW, Geisinger is also able to search through hundreds of millions of patient note records in seconds, providing better treatment to their patients.
Summary We have been building platforms and workflows to store, process, and analyze data since the earliest days of computing. Over that time there have been countless architectures, patterns, and "best practices" to make that task manageable. With the growing popularity of cloud services a new pattern has emerged and been dubbed the "Modern Data Stack". In this episode members of the GoDataDriven team, Guillermo Sanchez, Bram Ochsendorf, and Juan Perafan, explain the combination
Data Engineers of Netflix: Interview with Kevin Wylie. This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Kevin Wylie is a Data Engineer on the Content Data Science and Engineering team. In this post, Kevin talks about his extensive experience in content analytics at Netflix since joining more than 10 years ago.
The 2021 Data Impact Award (DIA) submissions are starting to stream in, and we know many of you are contemplating your entries – which we are excited to see. To help guide your award strategy, we thought it would be an excellent opportunity to ask our judges — a panel comprised of leading analysts and journalists well-versed in the application of data and the wider benefits it can bring across industries – what it takes for a winning project.
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
Rockset was founded to make it easy for developers and data teams to go from real-time data to actionable insights. We designed Rockset to remove many of the barriers teams face while building with real-time data, including data preparation, performance tuning and infrastructure management. We also built it from the ground up to support full SQL (including joins and aggregations), the most common query language for analytics.
Learn how complexities baked into the data analytics ecosystems of supply chains can be simplified to eliminate redundancy, increase time to value, and reduce cost.
In the late 90s, when I was pursuing my studies in engineering, only a few girls enrolled in any STEM-related courses. While it was our love for math & science and the prospect of future opportunities that brought us here, we sadly found many of them gave up halfway through the course, and those who graduated either quit or never entered the profession.
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
User-uploaded documents have been a core component of Scribd’s business from the very beginning. Understanding what is actually in the document corpus unlocks exciting new opportunities for discovery and recommendation. With Scribd, anybody can upload and share documents, analogous to YouTube and videos. Over the years, our document corpus has become larger and more diverse, which has made understanding it an ever-increasing challenge.
Introduction and Rationale. The release of Cloudera Data Platform (CDP) Private Cloud Base edition provides customers with a next generation hybrid cloud architecture. This blog post provides an overview of best practice for the design and deployment of clusters incorporating hardware and operating system configuration, along with guidance for networking and security as well as integration with existing enterprise infrastructure.
In Monte Carlo’s Weekly ETL (Explanations Through Lior) series, Lior Gavish, Monte Carlo’s co-founder and CTO, answers a trending question on Reddit about some of data engineering’s hottest topics. Reddit thread can be found here. Reddit user /SWE-Aaron asks if data engineering will ever get the same attention as data science and whether that would actually be a good thing.
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri
Searches for cloud computing jobs per million grew by approximately 50%. According to an Indeed Jobs report, the share of cloud computing jobs increased by 42% per million from 2018 to 2021. The global cloud computing market is poised to grow by $287.03 billion during 2021-2025. Also, global spending on public cloud services will double by 2023.
We recently hosted a roundtable focused on optimizing risk and exposure management with data insights. For financial institutions and insurers, risk and exposure management has always been a fundamental tenet of the business. Now, risk management has become exponentially complicated in multiple dimensions. In this session we explored what firms are doing to approach the uncertainty with more predictability.
As data pipelines become increasingly complex, investing in a data quality solution is becoming an increasingly important priority for modern data teams. But should you build it—or buy it? There are 4 key challenges, opportunities, and trade-offs when considering building or buying a data observability or data quality solution. In this post we will cover: The importance of data quality Understanding the expected time-to-value for your data quality solution Factoring in the opportunity cost of bu
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you
Linear Regression is probably one of the most well-known machine learning algorithms. It essentially involves modeling the relation between the given or derived parameters and the target to be learned. Therefore, any machine learning job interview would be incomplete without a peppering of Linear Regression questions. These linear regression interview questions and answers will help you prepare for your machine learning interview.
We're excited to announce the release of Apache Superset 1.2! In this release post, we will focus on the biggest and most interesting tangible, end-user features.
Last week we held our third Women In Data Webinar, and what a session it was! We were honored to welcome Justyna Lebedyk, Senior Product Owner Big Data, Commerzbank AG, who posed the question “Does diversity win?” I had the pleasure of chatting with Justyna about the key themes from her talk and what advice she would give to others looking to pursue a career in data.
Incident IQ gives data engineers and analysts a centralized, all-in-one solution for conducting incident management and root cause analysis on your data pipelines. Video courtesy of Monte Carlo. Today, we are excited to announce the release of Monte Carlo’s data incident management feature, Incident IQ, a new solution that allows data teams to collaboratively identify, alert on, and remediate the root cause of critical data issues before they impact downstream systems and end users.
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
The demand for data-related roles has increased massively in the past few years. Companies are actively seeking talent in these areas, and there is a huge market for individuals who can manipulate data, work with large databases and build machine learning algorithms. While data science is the most hyped-up career path in the data industry, it certainly isn't the only one.
Can an organization eradicate workplace politics completely? Defined by the Harvard Business Review as “a variety of activities associated with the use of influence tactics to improve personal or organizational interests”, politics at the workplace is inevitable. Undeniably, wielding influence to achieve positive outcomes is encouraged. However, the question leaders should be asking is: are fragmented individual agendas taking precedence over an organization’s mission?
Monte Carlo, the data reliability company, today released its data incident management feature, Incident IQ, a new suite of capabilities that help data engineers better pinpoint, address, and resolve data downtime at scale through the Monte Carlo Data Observability Platform. Incident IQ automatically generates rich insights about critical data issues through root cause analysis, giving teams unprecedented visibility into the end-to-end health and trust of their data beyond the scope of traditional
Time series analysis and forecasting is a dark horse in the domain of Data Science. Time series is among the most applied Data Science techniques in various industrial and business operations, such as financial analysis, production planning, supply chain management, and many more. Machine learning for time series is often a neglected topic. More recent techniques, such as natural language processing, pattern recognition, and others, usually gain more attention.
Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage
When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m