This site uses cookies to improve your experience. To help us insure we adhere to various privacy regulations, please select your country/region of residence. If you do not select a country, we will assume you are from the United States. Select your Cookie Settings or view our Privacy Policy and Terms of Use.
Cookie Settings
Cookies and similar technologies are used on this website for proper function of the website, for tracking performance analytics and for marketing purposes. We and some of our third-party providers may use cookie data for various purposes. Please review the cookie settings below and choose your preference.
Used for the proper function of the website
Used for monitoring website traffic and interactions
Cookie Settings
Cookies and similar technologies are used on this website for proper function of the website, for tracking performance analytics and for marketing purposes. We and some of our third-party providers may use cookie data for various purposes. Please review the cookie settings below and choose your preference.
Strictly Necessary: Used for the proper function of the website
Performance/Analytics: Used for monitoring website traffic and interactions
The AIoT has the potential to transform industries and society, and it is already starting to have an impact. This article will explore the principles of AIoT, its benefits, and its current use.
Explore GitHub Actions for your Kafka CI/CD pipeline, automate Schema Registry, and transform the development and testing of Kafka client applications.
Summary There are extensive and valuable data sets that are available outside the bounds of your organization. Whether that data is public, paid, or scraped it requires investment and upkeep to acquire and integrate it with your systems. Crux was built to reduce the total cost of acquisition and ownership for integrating external data, offering a fully managed service for delivering those data assets in the manner that best suits your infrastructure.
Web Activity is the easiest way to call any REST API endpoints within a Data Factory Pipeline. In today’s post, we will discuss the basic settings of Web activity. To create a new web activity , search for ‘web’ in the activities pane. Alternatively, it can be located under the General group in the activities pane. As seen in the screenshot below, the main settings for the web activity are as follows: Azure Data Factory: Web Activity URL: This is the REST API endpoint address that we would like
In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples to debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate
My guest this week is Drew Smith , Vice President of Global Data and Analytics at Little Caesars Enterprises and Ilitch Companies. Little Caesars is a pizza franchise that is mainly in the United States. Illitch Companies owns the Detroit Tigers (baseball), Detroit Red Wings (hockey), and several stadiums. Before that, Drew worked at International Institute for Analytics (IIA), an analytics consulting company, and IKEA, the furniture retailer and manufacturer.
This month, Cloudera Cares is excited to spotlight Burt Wagner, senior solutions engineer from Alexandria, Virginia. Burt — who joined Cloudera earlier this year — volunteers regularly with the Boy Scouts of America. He started Scouting as an eight year old; it has always been an integral part of his life and something he now enjoys sharing with his son.
Deep learning forms the backbone of modern day artificial intelligence. Learn more about the important aspects of this connection with this freely available course.
Deep learning forms the backbone of modern day artificial intelligence. Learn more about the important aspects of this connection with this freely available course.
Summary Data lineage is the roadmap for your data platform, providing visibility into all of the dependencies for any report, machine learning model, or data warehouse table that you are working with. Because of its centrality to your data systems it is valuable for debugging, governance, understanding context, and myriad other purposes. This means that it is important to have an accurate and complete lineage graph so that you don’t have to perform your own detective work when time is in s
In a world that creates 1.145 trillion MB of data per day , change is the only constant. With brand new information being seeded every other second, businesses are evolving at the speed of light. Where there’s data, there’s analytics, and thus, the demand for skilled Business Analysts. Data enthusiasts have stumbled across enough facts and figures to know what’s trending and what’s needed to master these trends, which is why over 1,000 learners hit the road to becoming highly sought-after Busine
by Aryan Mehra with Farnaz Karimdady Sharifabad , Prasanna Vijayanathan , Chaïna Wade , Vishal Sharma and Mike Schassberger Aim and Purpose?—?Problem Statement The purpose of this article is to give insights into analyzing and predicting “out of memory” or OOM kills on the Netflix App. Unlike strong compute devices, TVs and set top boxes usually have stronger memory constraints.
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
Cloudera has appointed Remus Lim as vice president of Asia Pacific and Japan, to drive adoption of the hybrid data platform across the region and support customers in their journey to become more data-driven. We’ve asked him to share his vision for Cloudera in APAC and reflect on his past few months since taking up the mantle. What drew you to the tech space and attracted you to the roles you’ve held?
In the current age of AI, all digital transformations must be analytics-led. Learn the 7 steps needed to realize the promise of an analytics-led digital transformation.
Summary Exploratory data analysis works best when the feedback loop is fast and iterative. This is easy to achieve when you are working on small datasets, but as they scale up beyond what can fit on a single machine those short iterations quickly become long and tedious. The Arkouda project is a Python interface built on top of the Chapel compiler to bring back those interactive speeds for exploratory analysis on horizontally scalable compute that parallelizes operations on large volumes of data
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
Every layer of business operations today uses the power of metrics and analytics to enhance their market growth and business success. With the fourth industrial revolution increasing the dependency on emerging technologies like Data Science, Cloud Computing, IoT, Business Analytics, etc., the need to master the nuances of the same is relatively high.
Complete guide to data pipelines, data integration, and modern data flow, the key to next generation, data-driven applications, systems, and organizations.
The need for data fabric. As Cloudera CMO David Moxey outlined in his blog , we live in a hybrid data world. Data is growing and continues to accelerate its growth. It is changing in makeup and appearing in ever more places. Driving insight and value from it all is as much of an opportunity as it is a challenge. As a result, it’s getting ??progressively more complex for businesses to access, use, and create value from it.
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri
For many, banking is now a digital activity. But the financial services industry still trails many others in leveraging cloud technologies to build deeper, emotional attachments to their customers.
Summary The current stage of evolution in the data management ecosystem has resulted in domain and use case specific orchestration capabilities being incorporated into various tools. This complicates the work involved in making end-to-end workflows visible and integrated. Dagster has invested in bringing insights about external tools’ dependency graphs into one place through its "software defined assets" functionality.
There are many reasons you, as an analytics engineer, may want to capture the complete version history of data: You’re in an industry with a very high standard for data governance You need to track big OKRs over time to report back to your stakeholders You want to build a window to view history with both forward and backward compatibility These are often high-stakes situations!
Newest features in Confluent’s fully managed, cloud-native data streaming platform: Confluent Terraform provider, Independent Network Lifecycle Management, and more.
Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.
In this article we discuss the various methods to replicate HBase data and explore why Replication Manager is the best choice for the job with the help of a use case. Cloudera Replication Manager is a key Cloudera Data Platform (CDP) service, designed to copy and migrate data between environments and infrastructures across hybrid clouds. The service provides simple, easy-to-use, and feature-rich data movement capability to deliver data and metadata where it is needed, and has secure data backup
Enterprise data warehouses (EDWs) became necessary in the 1980s when organizations shifted from using data for operational decisions to using data to fuel critical business decisions. Data warehouses differ from operational databases in that while operational transactional databases collate data for multiple transactional purposes, data warehouses aggregate this transactional data for analytics.
Summary Data engineering is a difficult job, requiring a large number of skills that often don’t overlap. Any effort to understand how to start a career in the role has required stitching together information from a multitude of resources that might not all agree with each other. In order to provide a single reference for anyone tasked with data engineering responsibilities Joe Reis and Matt Housley took it upon themselves to write the book "Fundamentals of Data Engineering" In thi
Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?
What is Data Lineage? Niv Sluzki 2022-07-28 10:20:02 The term “data lineage” has been thrown around a lot over the last few years. What started as an idea of connecting between datasets quickly became a very confusing term that now gets misused often. It’s time to put order to the chaos and dig deep into what it really is. Because the answer matters quite a lot.
How Confluent built Intelligent Storage, for 10x more scalable and elastic Kafka storage with infinite retention, max cluster uptime, and zero operational burdens.
In part 1 of this blog we discussed how Cloudera DataFlow for the Public Cloud (CDF-PC), the universal data distribution service powered by Apache NiFi, can make it easy to acquire data from wherever it originates and move it efficiently to make it available to other applications in a streaming fashion. In this blog we will conclude the implementation of our fraud detection use case and understand how Cloudera Stream Processing makes it simple to create real-time stream processing pipelines that
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you
We organize all of the trending information in your field so you don't have to. Join 37,000+ users and stay up to date on the latest articles your peers are reading.
You know about us, now we want to get to know you!
Let's personalize your content
Let's get even more personalized
We recognize your account from another site in our network, please click 'Send Email' below to continue with verifying your account and setting a password.
Let's personalize your content