Sat. Jul 31, 2021 - Fri. Aug 06, 2021

How Uber Achieves Operational Excellence in the Data Quality Experience

Uber Engineering

Uber delivers efficient and reliable transportation across the global marketplace, which is powered by hundreds of services, machine learning models, and tens of thousands of datasets. While growing rapidly, we’re also committed to maintaining data quality, as it can greatly … The post How Uber Achieves Operational Excellence in the Data Quality Experience appeared first on Uber Engineering Blog.

Charting A Path For Streaming Data To Fill Your Data Lake With Hudi

Data Engineering Podcast

Summary: Data lake architectures have largely been biased toward batch processing workflows due to the volume of data that they are designed for. With more real-time requirements and the increasing use of streaming data, there has been a struggle to merge fast, incremental updates with large, historical analysis. Vinoth Chandar helped to create the Hudi project while at Uber to address this challenge.

Choosing Your Upgrade or Migration Path to Cloudera Data Platform

Cloudera

In our previous blog, we talked about the four paths to Cloudera Data Platform: In-place Upgrade, Sidecar Migration, Rolling Sidecar Migration, and Migrating to Cloud. If you haven’t read that yet, we invite you to take a moment and run through the scenarios in that blog. The four strategies will be relevant throughout the rest of this discussion. Today, we’ll discuss an example of how you might make this decision for a cluster using a “round of elimination” process based on our decision workflow.

Designing and Architecting the Confluent CLI

Confluent

It is often difficult enough to build one application that talks to a single middleware or backend layer; e.g., a whole team of frontend engineers may build a web application […].

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

What is a Data Mesh?

DataKitchen

The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. With an architecture comprised of numerous domains, enterprises need to manage order-of-operations issues, inter-domain communication, and shared services like environment creation and meta-orchestration. A DataOps superstructure provides the foundation to address the many challenges inherent in operating a group of interdependent domains.

Data Discovery From Dashboards To Databases With Castor

Data Engineering Podcast

Summary: Every organization needs to be able to use data to answer questions about its business. The trouble is that the data is usually spread across a wide and shifting array of systems, from databases to dashboards. The other challenge is that even if you do find the information you are seeking, there might not be enough context available to determine how to use it or what it means.

More Trending

The New One-Stop Shop for Learning Apache Kafka

Confluent

Today, I’m very excited to announce an all-new website dedicated to Apache Kafka®, event streaming, and associated cloud technologies. The site is called Confluent Developer, and it represents a significant […].

Data Marts: What They Are and Why Businesses Need Them

AltexSoft

Imagine you run a candy store. Some sweets are presented on your display cases for quick access while the rest is kept in the storeroom. Now let’s think of sweets as the data required for your company’s daily operations. Instead of combing through the vast amounts of all organizational data stored in a data warehouse, you can use a data mart — a repository that makes specific pieces of data available quickly to any given business unit.

Building a Modern Data Architecture for the 2020s

DataKitchen

The post Building a Modern Data Architecture for the 2020s first appeared on DataKitchen.

Minimizing Supply Chain Disruptions with Advanced Analytics

Cloudera

January 2020 is a distant memory, but for most, the early days of the pandemic were a time that will be ingrained in memories for decades, if not generations. Over the last 18 months, supply chain issues have dominated our nightly news, social feeds and family conversations at the dinner table. Some but not all have stemmed from the pandemic.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

The Weekly ETL: How Do You “Thin Slice” a Data Pipeline?

Monte Carlo

In Monte Carlo’s Weekly ETL (Explanations Through Lior) series, Lior Gavish, Monte Carlo’s co-founder and CTO, answers a trending question on Reddit about some of data engineering’s hottest topics. Reddit user /treacherous_tim asks how to “thin slice” a data pipeline and whether anyone has faced this challenge before. First, I think it’s great that data engineers are now following best practices from DevOps and software engineering, in this case, starting wit

7 Best Practices to Use While Annotating Images

AltexSoft

This is a guest article by tech writer Melanie Johnson. No matter how big or small your machine learning (ML) project might be, the overall output depends on the quality of data used to train the ML models. Data annotation plays a pivotal role in the process. And as we know it, it’s the process of marking machine-recognizable content using computer vision, or through natural language processing (NLP) in different formats, including texts, images, and videos.

20 Artificial Intelligence Project Ideas for Beginners to Practice

ProjectPro

Artificial Intelligence has made a significant impact on our daily lives. Every time you scroll through social media, open Spotify, or do a quick Google search, you are using an application of AI. The AI industry has expanded massively in the past few years and is predicted to grow even further, reaching around 126 billion U.S. dollars by 2025. Multinational companies like IBM, Accenture, and Apple are actively hiring AI practitioners.

The Foundations of a Modern Data-Driven Organisation: Gaining a Clear View of the Customer

Cloudera

Today’s organizations face rising customer expectations in a fragmented marketplace amidst stiff competition. This landscape is one that presents opportunities for a modern data-driven organization to thrive. At the nucleus of such an organization is the practice of accelerating time to insights, using data to make better business decisions at all levels and roles.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Real-time: a fresh approach to data lineage

Datakin

Written by Ross Turk on August 5, 2021. A data ecosystem that spans multiple pipelines, teams, and platforms can be overwhelming. Each dataset and job exists in a unique operational context, with interdependencies that may seem simple…until they multiply. Every tiny piece has something in common, though: when it breaks, it becomes the most important thing to everyone you know.

How Airbnb Built “Wall” to prevent data bugs

Airbnb Tech

Gaining trust in data with extensive data quality, accuracy and anomaly checks. As shared in our Data Quality Initiative post, Airbnb has embarked on a project of massive scale to ensure trustworthy data across the company. To enable employees to make faster decisions with data and provide better support for business metric monitoring, we introduced Midas, an analytical data certification process that certifies all important metrics and data sets.

Building Data Factories to Create Thousands of Data Products

Teradata

The pressure to integrate analytics & machine learning into the automotive business is unrelenting. Find out what the auto industry needs to deliver on its digital promise.

Accelerating Insight and Uptime: Predictive Maintenance

Cloudera

Historically, maintenance has been driven by a preventative schedule. Today, preventative maintenance, where actions are performed regardless of actual condition, is giving way to Predictive, or Condition-Based, maintenance, where actions are based on actual, real-time insights into operating conditions. While both are far superior to traditional Corrective maintenance (action only after a piece of equipment fails), Predictive is by far the most effective.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Data Engineering Annotated Monthly – July 2021

Big Data Tools

August is a good time to start new things – some people are on vacation and have more spare time to read than usual, while others are back and looking for a quick refresher on what’s new in data engineering. We’re launching this Annotated series to find interesting and useful content on different topics around data engineering, such as news, technical articles, tools, future conferences, and more.

Android Architecture for the Rocketship - Part 1 : Modularisation

Afterpay Tech

By Huan Nguyen. Eight months ago at Afterpay, we kicked off our “app rewrite” project in which we are rewriting our React Native apps in native Android and iOS. As part of this project, we are aiming not only to build an app but also to build a strong and scalable mobile platform which supports our fast-growing business.

ETL vs ELT Explained

Grouparoo

The mission of many data teams is a very simple one. They seek to use data to help the business take smarter actions. The input is raw data from everywhere that touches the business. This includes many external sources, its own products, and various systems used for marketing, sales, and operations. The outputs often take the form of analysis, insights, models, and other usable mediums.

Pillars of Knowledge, Best Practices for Data Governance

Cloudera

Author Chris J. Preimesberger is Editor Emeritus of eWEEK. With hackers now working overtime to expose business data or implant ransomware processes, data security is largely IT managers’ top priority. And if data security tops IT concerns, data governance should be their second priority. Not only is it critical to protect data, but data governance is also the foundation for data-driven businesses and maximizing value from data analytics.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Challenging Old Assumptions

Teradata

Cost income ratios in traditional banks remain untenably high. What’s required is a thorough analysis of the overall operating model to improve both sides of the cost income equation.

Why Rockset & Why Now

Rockset

“The world’s most valuable resource is no longer oil, but data.” There are those rare opportunities in your career where you are at the intersection of multiple macro and micro trends. I’m thrilled to join the team at Rockset that is defining the category of Real-Time Analytics. My Path to Rockset: I've been fortunate to have experienced and embraced hyper-growth companies throughout my career.

Getting Started: Automatic Detection and Alerting for Data Incidents with Monte Carlo

Monte Carlo

In this series, we highlight the critical steps your business must follow when building a data incident management workflow, including incident detection, response, root cause analysis & resolution (RCA), and a blameless post-mortem. Let’s start with incident detection and alerting, your first line of defense against data downtime and broken data pipelines.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

The Data Janitor Letters - June 2021

Pipeline Data Engineering

Data engineering salon. News and interesting reads about the world of data. The Analytics Engineering Guide (dbt Labs): Collaborating as a data team to produce excellent datasets; some parts are b t, but it's an interesting read. Welcome to Snowpark: New Data Programmability for the Data Cloud (Isaac Kunen, Senior Product Manager, Snowflake): Two words: Java functions.

How a Supply Chain Digital Hub Can Drive Post-Pandemic Supply Chain Resiliency

Teradata

A Supply Chain Data Hub provides a model-driven set of data objects with maximum data reuse, minimum technical debt, lower cost to build and faster time to market. Find out more.

Real-Time Data Ingestion: Snowflake, Snowpipe and Rockset

Rockset

Organizations that depend on data for their success and survival need robust, scalable data architecture, typically employing a data warehouse for analytics needs. Snowflake is often their cloud-native data warehouse of choice. With Snowflake, organizations get the simplicity of data management with the power of scaled-out data and distributed processing.

20+ Image Processing Projects Ideas in Python with Source Code

ProjectPro

Perhaps the great French military leader Napoleon Bonaparte wasn't too far off when he said, “A picture is worth a thousand words.” Ignoring the poetic value, if just for a moment, the facts have since been established to prove this statement's literal meaning. Humans, the truly visual beings we are, respond to and process visual data better than any other data type.

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.