Sat. Jul 03, 2021 - Fri. Jul 09, 2021

Airflow on Kubernetes : Get started in 10 mins

Marc Lamberti

Airflow on Kubernetes is quite popular, isn’t it? There is a good chance that you know Kubernetes, that you even have a Kubernetes cluster, and that you would like to deploy and run Airflow on it. However, Kubernetes is hard. There are so many things to deal with that it can be really laborious just to deploy an application. Luckily for us, some super smart people have created Helm.

Elastic Distributed Training with XGBoost on Ray

Uber Engineering

Introduction. Since we productionized distributed XGBoost on Apache Spark™ at Uber in 2017, XGBoost has powered a wide spectrum of machine learning (ML) use cases at Uber, spanning from optimizing marketplace dynamic pricing policies for Freight, improving times of …

Trending Sources

Reflecting on Cloudera’s Commitment to Address Workplace Inequality: One Year Later

Cloudera

It’s been a year of awakening and change across the U.S. and around the world. One year ago our CEO Rob Bearden vowed to take decisive action to make Cloudera a more diverse, equitable, and inclusive place to work and have Cloudera take an active role in promoting those attributes in the tech industry and our communities. There is no one-size-fits-all solution to creating an intentional and strategic plan for a diverse workforce.

Finance 122

What to Look Forward to at Kafka Summit APAC

Confluent

Kafka Summit, now in its sixth year, is coming to Asia-Pacific! After launching in the U.S. in 2016 and in Europe in 2018, Kafka Summit APAC will feature speakers and […].

Kafka 105

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate

Democratize Data Cleaning Across Your Organization With Trifacta

Data Engineering Podcast

Summary Every data project, whether it’s analytics, machine learning, or AI, starts with the work of data cleaning. This is a critical step and benefits from being accessible to the domain experts. Trifacta is a platform for managing your data engineering workflow to make curating, cleaning, and preparing your information more approachable for everyone in the business.

SQL 100

Propensity Model: How to Predict Customer Behavior Using Machine Learning

AltexSoft

It’s a common practice for companies and their marketing teams to try to guess how certain groups of customers are likely to act under certain circumstances. For this purpose, they create propensity models. When built in a traditional statistical fashion, these predictive tools don’t always provide highly accurate outcomes. To help companies unlock the full potential of personalized marketing, propensity models should use the power of machine learning technologies.

More Trending

Tired of First Dates? How to Build a Long-Term Relationship with Data

Teradata

Integrating data from R&D to customer experience and the after-market can deliver stand-out returns for auto companies. But how to go about it? Find out more.

Stick All Of Your Systems And Data Together With SaaSGlue As Your Workflow Manager

Data Engineering Podcast

Summary At the core of every data pipeline is a workflow manager (or several). Deploying, managing, and scaling that orchestration can consume a large fraction of a data team’s energy, so it is important to pick something that provides the power and flexibility that you need. SaaSGlue is a managed service that lets you connect all of your systems, across clouds and physical infrastructure, and spanning all of your programming languages.

Systems 100

Exploiting Implicit Ambiguity in Scala

Rock the JVM

Discover how to use Scala's implicit resolution to enforce type relationships at compile time

Scala 52

Two Ways to Migrate Hortonworks DataFlow to Cloudera Flow Management

Cloudera

Hortonworks DataFlow (HDF) 3.5.2 was released at the end of 2020. New releases will not continue under HDF, as Cloudera brings the best and latest of Apache NiFi to the new Cloudera Flow Management (CFM) product. Getting the latest improvements and new features of NiFi is one of many reasons for you to move your legacy deployments of NiFi to this new platform.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Open Finance and Smart Ecosystems Won’t Wait for Banks

Teradata

Smart Ecosystems deliver innovation in financial services – converting a product-based industry to a continuum in financial services. Find out more.

Finance 52

Top 10 Deep Learning Algorithms in Machine Learning [2023]

ProjectPro

When firing Siri or Alexa with questions, people often wonder how machines achieve super-human accuracy. All thanks to deep learning - the incredibly intimidating area of data science. This new domain of deep learning methods is inspired by the functioning of neural networks in the human brain. With the help of natural language processing (NLP) tools, it has led to the development of exciting artificial intelligence applications like language recognition, autonomous vehicles, and computer vision

The Weekly ETL: How Do You Document Your Data Assets?

Monte Carlo

In Monte Carlo’s Weekly ETL (Explanations Through Lior) series, Lior Gavish, Monte Carlo’s co-founder and CTO, answers a trending question on Reddit about some of the data industry’s hottest topics. Reddit user _Niwubo asks how data teams can go about setting up a solution for documenting their data assets. As someone who has built cataloging initiatives from scratch, I can assure you that it’s never seamless and takes buy-in from your whole organization (which can be hard if y

Cloudera Operational Database Replication in a Nutshell

Cloudera

In a previous blog post, we provided a high-level overview of the Cloudera Replication Plugin, explaining how it brings cross-platform replication with little configuration. In this post, we will cover how this plugin can be applied in CDP clusters and explain how the plugin enables strong authentication between systems that do not share mutual authentication trust.

Database 100

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

5 Can't Miss MongoDB.live Talks

Rockset

MongoDB.live is coming up on July 13-14, and we're going to be there! As with last year, it's going to be a virtual conference, so register (for free), find a comfy spot and surf the numerous sessions available to anyone interested in the MongoDB ecosystem. We spend a lot of time thinking about running analytics on MongoDB, as do many MongoDB users we speak with.

MongoDB 40

The Ultimate Guide to Data Quality

Monte Carlo

Companies spend upwards of $15 million per year firefighting bad data, with data engineering teams spending 30-50 percent of their time tackling broken pipelines, errant models, and stale dashboards. It’s no secret: data quality isn’t given the diligence it deserves. Fortunately, some of the best data teams are investing in new, smarter approaches to solving it.

Data 40

15 Deep Learning Projects Ideas for Beginners to Practice 2023

ProjectPro

As a beginner in the data industry, it can be overwhelming to step into AI and deep learning. After taking a deep learning course or two, you might find yourself getting stuck on how to proceed. You don't know what to learn next because you have the theoretical know-how of the concepts and no hands-on experience working with diverse deep learning frameworks and tools. This article will break down the steps you can take to enhance your deep learning skills.

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Automating Databricks with Terraform

Scribd Technology

The long-term success of our data platform relies on putting tools into the hands of developers and data scientists to “choose their own adventure”. A big part of that story has been Databricks, which we recently integrated with Terraform to make it easy to scale a top-notch developer experience. At the 2021 Data and AI Summit, Core Platform infrastructure engineer Hamilton Hord and Databricks engineer Serge Smertin presented on the Databricks Terraform provider and how it’s been used by Scribd.

Kafka 40

How to Handle Database Joins in Apache Druid vs Rockset

Rockset

Apache Druid is a real-time analytics database, providing business intelligence to drive clickstream analytics, analyze risk, monitor network performance, and more. When Druid was introduced in 2011, it did not initially support joins, but a join feature was added in 2020. This is important because it’s often helpful to include fields from multiple Druid files — or multiple tables in a normalized data set — in a single query, providing the equivalent of an SQL join in a relational database.

RudderStack Product News Vol. #008 - UI Refresh and New Integrations

RudderStack

This month's RudderStack product updates cover a UI refresh and new integrations: New Product, Advertising, Analytics, Customer Success, and Data Infrastructure Updates.

Data 40

15 Neural Network Projects Ideas for Beginners to Practice 2023

ProjectPro

A curated list of interesting, simple, and cool neural network project ideas for beginners and professionals looking to make a career transition into machine learning or deep learning in 2021.

Project 40

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

Apache Kafka Architecture and Its Components-The A-Z Guide

ProjectPro

A detailed introduction to Apache Kafka Architecture, one of the most popular messaging systems for distributed applications. The first COVID-19 cases were reported in the United States in January 2020. By the end of the year, over 200,000 cases were reported per day, which climbed to 250,000 cases in early 2021. Responding to a pandemic on such a large scale involves technical and public health challenges.

Kafka 40