Airflow on Kubernetes is quite popular, isn’t it? There is a good chance that you know Kubernetes, that you even have a Kubernetes cluster, and that you would like to deploy and run Airflow on it. However, Kubernetes is hard. There are so many things to deal with that just deploying an application can be really laborious. Fortunately for us, some super smart people have created Helm.
Introduction. Since we productionized distributed XGBoost on Apache Spark™ at Uber in 2017, XGBoost has powered a wide spectrum of machine learning (ML) use cases at Uber, spanning from optimizing marketplace dynamic pricing policies for Freight, improving times of … The post Elastic Distributed Training with XGBoost on Ray appeared first on Uber Engineering Blog.
It’s been a year of awakening and change across the U.S. and around the world. One year ago our CEO Rob Bearden vowed to take decisive action to make Cloudera a more diverse, equitable, and inclusive place to work, and to have Cloudera take an active role in promoting those attributes in the tech industry and our communities. There is no one-size-fits-all solution to creating an intentional and strategic plan for a diverse workforce.
Kafka Summit, now in its sixth year, is coming to Asia-Pacific! After launching in the U.S. in 2016 and in Europe in 2018, Kafka Summit APAC will feature speakers and […].
In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate
Summary Every data project, whether it’s analytics, machine learning, or AI, starts with the work of data cleaning. This is a critical step and benefits from being accessible to the domain experts. Trifacta is a platform for managing your data engineering workflow to make curating, cleaning, and preparing your information more approachable for everyone in the business.
It’s common practice for companies and their marketing teams to try to guess how likely certain groups of customers are to act in a particular way under certain circumstances. For this purpose, they create propensity models. When these models are built in a traditional statistical fashion, the accuracy of their predictions isn’t always high. To help companies unlock the full potential of personalized marketing, propensity models should use the power of machine learning technologies.
If you’ve followed Cloudera for a while, you know we’ve long been singing the praises (or harping on the importance, depending on perspective) of a solid, standalone enterprise data strategy. While this is certainly not a new concept, government missions are wholly dependent on real-time access to and analysis of data, wherever it may be (legacy data centers or public cloud), to render insight to support operational decisions.
Integrating data from R&D to customer experience and the after-market can deliver stand-out returns for auto companies. But how to go about it? Find out more.
Summary At the core of every data pipeline is a workflow manager (or several). Deploying, managing, and scaling that orchestration can consume a large fraction of a data team’s energy, so it is important to pick something that provides the power and flexibility that you need. SaaSGlue is a managed service that lets you connect all of your systems, across clouds and physical infrastructure, and spanning all of your programming languages.
Hortonworks DataFlow (HDF) 3.5.2 was released at the end of 2020. New releases will not continue under HDF, as Cloudera is bringing the best and latest of Apache NiFi to the new Cloudera Flow Management (CFM) product. Getting the latest improvements and new features of NiFi is one of many reasons for you to move your legacy NiFi deployments to this new platform.
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
When firing questions at Siri or Alexa, people often wonder how machines achieve super-human accuracy. It is all thanks to deep learning, the incredibly intimidating area of data science. This new domain of deep learning methods is inspired by the functioning of neural networks in the human brain. With the help of natural language processing (NLP) tools, it has led to the development of exciting artificial intelligence applications like language recognition, autonomous vehicles, and computer vision
In Monte Carlo’s Weekly ETL (Explanations Through Lior) series, Lior Gavish, Monte Carlo’s co-founder and CTO, answers a trending question on Reddit about some of the data industry’s hottest topics. Reddit user _Niwubo asks how data teams can go about setting up a solution for documenting their data assets. As someone who has built cataloging initiatives from scratch, I can assure you that it’s never seamless and takes buy-in from your whole organization (which can be hard if y
In this previous blog post we provided a high-level overview of Cloudera Replication Plugin, explaining how it brings cross-platform replication with little configuration. In this post, we will cover how this plugin can be applied in CDP clusters and explain how the plugin enables strong authentication between systems which do not share mutual authentication trust.
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
MongoDB.live is coming up on July 13-14, and we're going to be there! As with last year, it's going to be a virtual conference, so register (for free), find a comfy spot and surf the numerous sessions available to anyone interested in the MongoDB ecosystem. We spend a lot of time thinking about running analytics on MongoDB, as do many MongoDB users we speak with.
Companies spend upwards of $15 million per year firefighting bad data, with data engineering teams spending 30-50 percent of their time tackling broken pipelines, errant models, and stale dashboards. It’s no secret: data quality isn’t given the diligence it deserves. Fortunately, some of the best data teams are investing in new, smarter approaches to solving it.
As a beginner in the data industry, it can be overwhelming to step into AI and deep learning. After taking a deep learning course or two, you might find yourself getting stuck on how to proceed. You don't know what to learn next because you have the theoretical know-how of the concepts but no hands-on experience working with diverse deep learning frameworks and tools. This article will break down the steps you can take to enhance your deep learning skills.
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri
The long-term success of our data platform relies on putting tools into the hands of developers and data scientists to “choose their own adventure”. A big part of that story has been Databricks, which we recently integrated with Terraform to make it easy to scale a top-notch developer experience. At the 2021 Data and AI Summit, Core Platform infrastructure engineer Hamilton Hord and Databricks engineer Serge Smertin presented on the Databricks Terraform provider and how it’s been used by Scribd.
Apache Druid is a real-time analytics database, providing business intelligence to drive clickstream analytics, analyze risk, monitor network performance, and more. When Druid was introduced in 2011, it did not initially support joins, but a join feature was added in 2020. This is important because it’s often helpful to include fields from multiple Druid files — or multiple tables in a normalized data set — in a single query, providing the equivalent of an SQL join in a relational database.
This month's RudderStack product updates cover a UI refresh and new integrations - New Product, Advertising, Analytics, Customer Success, and Data Infrastructure Updates
A curated list of interesting, simple, and cool neural network project ideas for beginners and professionals looking to make a career transition into machine learning or deep learning in 2021. Table of Contents Top 15 Neural Network Projects Ideas for 2023 What is a Neural Network? Applications of Neural Networks Why building Neural Network Projects is the best way to learn deep learning?
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you
A detailed introduction to Apache Kafka Architecture, one of the most popular messaging systems for distributed applications. The first COVID-19 cases were reported in the United States in January 2020. By the end of the year, over 200,000 cases were reported per day, which climbed to 250,000 cases in early 2021. Responding to a pandemic on such a large scale involves technical and public health challenges.