Sat.Jul 03, 2021 - Fri.Jul 09, 2021

Airflow on Kubernetes: Get started in 10 mins

Marc Lamberti

Airflow on Kubernetes is quite popular, isn’t it? There is a good chance that you know Kubernetes, that you even have a Kubernetes cluster, and that you would like to deploy and run Airflow on it. However, Kubernetes is hard. There are so many things to deal with that it can be really laborious just to deploy an application. Luckily for us, some super smart people have created Helm.

Elastic Distributed Training with XGBoost on Ray

Uber Engineering

Introduction. Since we productionized distributed XGBoost on Apache Spark™ at Uber in 2017, XGBoost has powered a wide spectrum of machine learning (ML) use cases at Uber, spanning from optimizing marketplace dynamic pricing policies for Freight, improving times of …

Reflecting on Cloudera’s Commitment to Address Workplace Inequality: One Year Later

Cloudera

It’s been a year of awakening and change across the U.S. and around the world. One year ago our CEO Rob Bearden vowed to take decisive action to make Cloudera a more diverse, equitable, and inclusive place to work and to have Cloudera take an active role in promoting those attributes in the tech industry and our communities. There is no one-size-fits-all solution to creating an intentional and strategic plan for a diverse workforce.

Finance 126

What to Look Forward to at Kafka Summit APAC

Confluent

Kafka Summit, now in its sixth year, is coming to Asia-Pacific! After launching in the U.S. in 2016 and in Europe in 2018, Kafka Summit APAC will feature speakers and […].

Kafka 105

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Stick All Of Your Systems And Data Together With SaaSGlue As Your Workflow Manager

Data Engineering Podcast

Summary: At the core of every data pipeline is a workflow manager (or several). Deploying, managing, and scaling that orchestration can consume a large fraction of a data team’s energy, so it is important to pick something that provides the power and flexibility that you need. SaaSGlue is a managed service that lets you connect all of your systems, across clouds and physical infrastructure, and spanning all of your programming languages.

Systems 100

Tired of First Dates? How to Build a Long-Term Relationship with Data

Teradata

Integrating data from R&D to customer experience and the after-market can deliver stand-out returns for auto companies. But how to go about it? Find out more.

More Trending

Exploiting Implicit Ambiguity in Scala

Rock the JVM

Discover how to use Scala's implicit resolution to enforce type relationships at compile time

Scala 52

Top 10 Deep Learning Algorithms in Machine Learning [2023]

ProjectPro

When firing questions at Siri or Alexa, people often wonder how machines achieve super-human accuracy. It is all thanks to deep learning, the incredibly intimidating area of data science. This new domain of deep learning methods is inspired by the functioning of neural networks in the human brain. With the help of natural language processing (NLP) tools, it has led to the development of exciting artificial intelligence applications like language recognition, autonomous vehicles, and computer vision.

Open Finance and Smart Ecosystems Won’t Wait for Banks

Teradata

Smart Ecosystems deliver innovation in financial services – converting a product-based industry to a continuum in financial services. Find out more.

Finance 52

Two Ways to Migrate Hortonworks DataFlow to Cloudera Flow Management

Cloudera

Hortonworks DataFlow (HDF) 3.5.2 was released at the end of 2020. New releases will not continue under HDF, as Cloudera brings the best and latest of Apache NiFi to the new Cloudera Flow Management (CFM) product. Getting the latest improvements and new features of NiFi is one of many reasons to move your legacy NiFi deployments to this new platform.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

The Weekly ETL: How Do You Document Your Data Assets?

Monte Carlo

In Monte Carlo’s Weekly ETL (Explanations Through Lior) series, Lior Gavish, Monte Carlo’s co-founder and CTO, answers a trending question on Reddit about some of the data industry’s hottest topics. Reddit user _Niwubo asks how data teams can go about setting up a solution for documenting their data assets. As someone who has built cataloging initiatives from scratch, I can assure you that it’s never seamless and takes buy-in from your whole organization (which can be hard if y

Apache Kafka Architecture and Its Components-The A-Z Guide

ProjectPro

A detailed introduction to Apache Kafka Architecture, one of the most popular messaging systems for distributed applications. The first COVID-19 cases were reported in the United States in January 2020. By the end of the year, over 200,000 cases were reported per day, which climbed to 250,000 cases in early 2021. Responding to a pandemic on such a large scale involves technical and public health challenges.

Kafka 40

Cloudera Operational Database Replication in a Nutshell

Cloudera

In a previous blog post, we provided a high-level overview of the Cloudera Replication Plugin, explaining how it brings cross-platform replication with little configuration. In this post, we will cover how this plugin can be applied in CDP clusters and explain how it enables strong authentication between systems that do not share mutual authentication trust.

Database 106

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

The Ultimate Guide to Data Quality

Monte Carlo

Companies spend upwards of $15 million per year firefighting bad data, with data engineering teams spending 30-50 percent of their time tackling broken pipelines, errant models, and stale dashboards. It’s no secret: data quality isn’t given the diligence it deserves. Fortunately, some of the best data teams are investing in new, smarter approaches to solving it.

Data 40

5 Can't Miss MongoDB.live Talks

Rockset

MongoDB.live is coming up on July 13-14, and we're going to be there! As with last year, it's going to be a virtual conference, so register (for free), find a comfy spot and surf the numerous sessions available to anyone interested in the MongoDB ecosystem. We spend a lot of time thinking about running analytics on MongoDB, as do many MongoDB users we speak with.

MongoDB 40

Automating Databricks with Terraform

Scribd Technology

The long-term success of our data platform relies on putting tools into the hands of developers and data scientists to “choose their own adventure”. A big part of that story has been Databricks, which we recently integrated with Terraform to make it easy to scale a top-notch developer experience. At the 2021 Data and AI Summit, Core Platform infrastructure engineer Hamilton Hord and Databricks engineer Serge Smertin presented on the Databricks Terraform provider and how it’s been used by Scribd.

Kafka 40

RudderStack Product News Vol. #008 - UI Refresh and New Integrations

RudderStack

This month's RudderStack product updates cover a UI refresh and new integrations: New Product, Advertising, Analytics, Customer Success, and Data Infrastructure Updates.

Data 40

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

15 Neural Network Projects Ideas for Beginners to Practice 2023

ProjectPro

A curated list of interesting, simple, and cool neural network project ideas for beginners and professionals looking to make a career transition into machine learning or deep learning in 2021.

Project 40

How to Handle Database Joins in Apache Druid vs Rockset

Rockset

Apache Druid is a real-time analytics database, providing business intelligence to drive clickstream analytics, analyze risk, monitor network performance, and more. When Druid was introduced in 2011, it did not initially support joins, but a join feature was added in 2020. This is important because it’s often helpful to include fields from multiple Druid files — or multiple tables in a normalized data set — in a single query, providing the equivalent of an SQL join in a relational database.

Propensity Model: How to Predict Customer Behavior Using Machine Learning

AltexSoft

It’s a common practice for companies and their marketing teams to try guessing how likely certain groups of customers are to act in a given way under certain circumstances. For this purpose, they create propensity models. When built in a traditional statistical fashion, these predictive tools don’t always produce highly accurate outcomes. To help companies unlock the full potential of personalized marketing, propensity models should use the power of machine learning technologies.

Democratize Data Cleaning Across Your Organization With Trifacta

Data Engineering Podcast

Summary Every data project, whether it’s analytics, machine learning, or AI, starts with the work of data cleaning. This is a critical step and benefits from being accessible to the domain experts. Trifacta is a platform for managing your data engineering workflow to make curating, cleaning, and preparing your information more approachable for everyone in the business.

SQL 100

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

15 Deep Learning Projects Ideas for Beginners to Practice 2023

ProjectPro

As a beginner in the data industry, it can be overwhelming to step into AI and deep learning. After taking a deep learning course or two, you might find yourself getting stuck on how to proceed. You don't know what to learn next because you have the theoretical know-how of the concepts but no hands-on experience working with diverse deep learning frameworks and tools. This article will break down the steps you can take to enhance your deep learning skills.