Sat. Jul 03, 2021 - Fri. Jul 09, 2021

Airflow on Kubernetes: Get started in 10 mins

Marc Lamberti

Airflow on Kubernetes is quite popular, isn’t it? There is a good chance that you know Kubernetes, that you even have a Kubernetes cluster, and that you would like to deploy and run Airflow on it. However, Kubernetes is hard. There are so many things to deal with that it can be really laborious just to deploy an application. Luckily for us, some super smart people have created Helm.

Elastic Distributed Training with XGBoost on Ray

Uber Engineering

Introduction. Since we productionized distributed XGBoost on Apache Spark™ at Uber in 2017, XGBoost has powered a wide spectrum of machine learning (ML) use cases at Uber, spanning from optimizing marketplace dynamic pricing policies for Freight, improving times of … The post Elastic Distributed Training with XGBoost on Ray appeared first on Uber Engineering Blog.

Stick All Of Your Systems And Data Together With SaaSGlue As Your Workflow Manager

Data Engineering Podcast

Summary: At the core of every data pipeline is a workflow manager (or several). Deploying, managing, and scaling that orchestration can consume a large fraction of a data team’s energy, so it is important to pick something that provides the power and flexibility that you need. SaaSGlue is a managed service that lets you connect all of your systems, across clouds and physical infrastructure, and spanning all of your programming languages.

Systems 100

Reflecting on Cloudera’s Commitment to Address Workplace Inequality: One Year Later

Cloudera

It’s been a year of awakening and change across the U.S. and around the world. One year ago our CEO Rob Bearden vowed to take decisive action to make Cloudera a more diverse, equitable, and inclusive place to work and have Cloudera take an active role in promoting those attributes in the tech industry and our communities. There is no one-size-fits-all solution to creating an intentional and strategic plan for a diverse workforce.

Finance 122

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

What to Look Forward to at Kafka Summit APAC

Confluent

Kafka Summit, now in its sixth year, is coming to Asia-Pacific! After launching in the U.S. in 2016 and in Europe in 2018, Kafka Summit APAC will feature speakers and […].

Kafka 105

Tired of First Dates? How to Build a Long-Term Relationship with Data

Teradata

Integrating data from R&D to customer experience and the after-market can deliver stand-out returns for auto companies. But how to go about it? Find out more.

More Trending

4 Considerations When Building Your Government Data Strategy

Cloudera

If you’ve followed Cloudera for a while, you know we’ve long been singing the praises (or harping on the importance, depending on perspective) of a solid, standalone enterprise data strategy. While this is certainly not a new concept, government missions are wholly dependent on real-time access to and analysis of data, wherever it may be (legacy data centers or public cloud), to render insight that supports operational decisions.

The Weekly ETL: How Do You Document Your Data Assets?

Monte Carlo

In Monte Carlo’s Weekly ETL (Explanations Through Lior) series, Lior Gavish, Monte Carlo’s co-founder and CTO, answers a trending question on Reddit about some of the data industry’s hottest topics. Reddit user _Niwubo asks how data teams can go about setting up a solution for documenting their data assets. As someone who has built cataloging initiatives from scratch, I can assure you that it’s never seamless and takes buy-in from your whole organization (which can be hard if y

Open Finance and Smart Ecosystems Won’t Wait for Banks

Teradata

Smart Ecosystems deliver innovation in financial services – converting a product-based industry to a continuum in financial services. Find out more.

Finance 52

Apache Kafka Architecture and Its Components-The A-Z Guide

ProjectPro

A detailed introduction to Apache Kafka Architecture, one of the most popular messaging systems for distributed applications. The first COVID-19 cases were reported in the United States in January 2020. By the end of the year, over 200,000 cases were reported per day, which climbed to 250,000 cases in early 2021. Responding to a pandemic on such a large scale involves technical and public health challenges.

Kafka 40

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Two Ways to Migrate Hortonworks DataFlow to Cloudera Flow Management

Cloudera

Hortonworks DataFlow (HDF) 3.5.2 was released at the end of 2020. New releases will not continue under HDF, as Cloudera brings the best and latest of Apache NiFi to the new Cloudera Flow Management (CFM) product. Getting the latest improvements and new features of NiFi is one of many reasons to move your legacy NiFi deployments to this new platform.

5 Can't Miss MongoDB.live Talks

Rockset

MongoDB.live is coming up on July 13-14, and we're going to be there! As with last year, it's going to be a virtual conference, so register (for free), find a comfy spot and surf the numerous sessions available to anyone interested in the MongoDB ecosystem. We spend a lot of time thinking about running analytics on MongoDB, as do many MongoDB users we speak with.

MongoDB 40

The Ultimate Guide to Data Quality

Monte Carlo

Companies spend upwards of $15 million per year firefighting bad data, with data engineering teams spending 30-50 percent of their time tackling broken pipelines, errant models, and stale dashboards. It’s no secret: data quality isn’t given the diligence it deserves. Fortunately, some of the best data teams are investing in new, smarter approaches to solving it.

Data 40

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Cloudera Operational Database Replication in a Nutshell

Cloudera

In a previous blog post we provided a high-level overview of the Cloudera Replication Plugin, explaining how it brings cross-platform replication with little configuration. In this post, we will cover how this plugin can be applied in CDP clusters and explain how the plugin enables strong authentication between systems that do not share mutual authentication trust.

Automating Databricks with Terraform

Scribd Technology

The long term success of our data platform relies on putting tools into the hands of developers and data scientists to “choose their own adventure”. A big part of that story has been Databricks which we recently integrated with Terraform to make it easy to scale a top-notch developer experience. At the 2021 Data and AI Summit, Core Platform infrastructure engineer Hamilton Hord and Databricks engineer Serge Smertin presented on the Databricks terraform provider and how it’s been used by Scribd.

Kafka 40

How to Handle Database Joins in Apache Druid vs Rockset

Rockset

Apache Druid is a real-time analytics database, providing business intelligence to drive clickstream analytics, analyze risk, monitor network performance, and more. When Druid was introduced in 2011, it did not initially support joins, but a join feature was added in 2020. This is important because it’s often helpful to include fields from multiple Druid files — or multiple tables in a normalized data set — in a single query, providing the equivalent of an SQL join in a relational database.

RudderStack Product News Vol. #008 - UI Refresh and New Integrations

RudderStack

This month's RudderStack product updates cover the UI refresh and new integrations: New Product, Advertising, Analytics, Customer Success, and Data Infrastructure Updates.

Data 40

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

15 Neural Network Projects Ideas for Beginners to Practice 2023

ProjectPro

A curated list of interesting, simple, and cool neural network project ideas for beginners and professionals looking to make a career transition into machine learning or deep learning in 2021.

Project 40

Exploiting Implicit Ambiguity in Scala

Rock the JVM

Discover how to use Scala's implicit resolution to enforce type relationships at compile time

Scala 52

Propensity Model: How to Predict Customer Behavior Using Machine Learning

AltexSoft

It’s a common practice for companies and their marketing teams to try guessing how likely certain groups of customers are to act in a given way under certain circumstances. For this purpose, they create propensity models. When these models are built in a traditional statistical fashion, the accuracy of the outcomes they predict isn’t always high. To help companies unlock the full potential of personalized marketing, propensity models should use the power of machine learning technologies.

Democratize Data Cleaning Across Your Organization With Trifacta

Data Engineering Podcast

Summary: Every data project, whether it’s analytics, machine learning, or AI, starts with the work of data cleaning. This is a critical step and benefits from being accessible to the domain experts. Trifacta is a platform for managing your data engineering workflow to make curating, cleaning, and preparing your information more approachable for everyone in the business.

SQL 100

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

15 Deep Learning Projects Ideas for Beginners to Practice 2023

ProjectPro

As a beginner in the data industry, it can be overwhelming to step into AI and deep learning. After taking a deep learning course or two, you might find yourself getting stuck on how to proceed. You don't know what to learn next because you have the theoretical know-how of the concepts but no hands-on experience working with diverse deep learning frameworks and tools. This article will break down the steps you can take to enhance your deep learning skills.