Sat. Jul 31, 2021 - Fri. Aug 06, 2021

Charting A Path For Streaming Data To Fill Your Data Lake With Hudi

Data Engineering Podcast

Summary: Data lake architectures have largely been biased toward batch processing workflows due to the volume of data that they are designed for. With more real-time requirements and the increasing use of streaming data, there has been a struggle to merge fast, incremental updates with large, historical analysis. Vinoth Chandar helped to create the Hudi project while at Uber to address this challenge.

How Uber Achieves Operational Excellence in the Data Quality Experience

Uber Engineering

Uber delivers efficient and reliable transportation across the global marketplace, which is powered by hundreds of services, machine learning models, and tens of thousands of datasets. While growing rapidly, we’re also committed to maintaining data quality, as it can greatly …

Designing and Architecting the Confluent CLI

Confluent

It is often difficult enough to build one application that talks to a single middleware or backend layer; e.g., a whole team of frontend engineers may build a web application […].

Choosing Your Upgrade or Migration Path to Cloudera Data Platform

Cloudera

In our previous blog, we talked about the four paths to Cloudera Data Platform: In-place Upgrade, Sidecar Migration, Rolling Sidecar Migration, and Migrating to Cloud. If you haven’t read that yet, we invite you to take a moment and run through the scenarios in that blog. The four strategies will be relevant throughout the rest of this discussion. Today, we’ll discuss an example of how you might make this decision for a cluster using a “round of elimination” process based on our decision workflow.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Data Discovery From Dashboards To Databases With Castor

Data Engineering Podcast

Summary: Every organization needs to be able to use data to answer questions about their business. The trouble is that the data is usually spread across a wide and shifting array of systems, from databases to dashboards. The other challenge is that even if you do find the information you are seeking, there might not be enough context available to determine how to use it or what it means.

What is a Data Mesh?

DataKitchen

The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. With an architecture comprised of numerous domains, enterprises need to manage order-of-operations issues, inter-domain communication, and shared services like environment creation and meta-orchestration. A DataOps superstructure provides the foundation to address the many challenges inherent in operating a group of interdependent domains.

More Trending

Replace and Boost your Apache Storm Topologies with Apache NiFi Flows

Cloudera

Recently, I worked with a large Fortune 500 customer on their migration from Apache Storm to Apache NiFi. If you’re asking yourself, “Isn’t Storm for complex event processing and NiFi for simple event processing?”, you’re correct. A few customers chose a complex event engine like Apache Storm for their simple event processing, even when Apache NiFi is the more practical choice, drastically cutting down on SDLC (software development lifecycle) time.

Data Marts: What They Are and Why Businesses Need Them

AltexSoft

Imagine you run a candy store. Some sweets are presented in your display cases for quick access while the rest are kept in the storeroom. Now let’s think of sweets as the data required for your company’s daily operations. Instead of combing through the vast amounts of all organizational data stored in a data warehouse, you can use a data mart, a repository that makes specific pieces of data available quickly to any given business unit.

How a Supply Chain Digital Hub Can Drive Post-Pandemic Supply Chain Resiliency

Teradata

A Supply Chain Data Hub provides a model-driven set of data objects with maximum data reuse, minimum technical debt, lower cost to build and faster time to market. Find out more.

The Weekly ETL: How Do You “Thin Slice” a Data Pipeline?

Monte Carlo

In Monte Carlo’s Weekly ETL (Explanations Through Lior) series, Lior Gavish, Monte Carlo’s co-founder and CTO, answers a trending question on Reddit about some of data engineering’s hottest topics. Reddit user /treacherous_tim asks how you “thin slice” a data pipeline and whether anyone has faced this challenge before. First, I think it’s great that data engineers are now following best practices from DevOps and software engineering, in this case, starting wit

Prepare Now: 2025’s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Minimizing Supply Chain Disruptions with Advanced Analytics

Cloudera

January 2020 is a distant memory, but for most, the early days of the pandemic were a time that will be ingrained in memories for decades, if not generations. Over the last 18 months, supply chain issues have dominated our nightly news, social feeds and family conversations at the dinner table. Some but not all have stemmed from the pandemic.

7 Best Practices to Use While Annotating Images

AltexSoft

This is a guest article by tech writer Melanie Johnson. No matter how big or small your machine learning (ML) project might be, the overall output depends on the quality of data used to train the ML models. Data annotation plays a pivotal role in the process. As we know, it’s the process of marking machine-recognizable content using computer vision, or through natural language processing (NLP), in different formats, including texts, images, and videos.

Building Data Factories to Create Thousands of Data Products

Teradata

The pressure to integrate analytics & machine learning into the automotive business is unrelenting. Find out what the auto industry needs to deliver on its digital promise.

20 Artificial Intelligence Project Ideas for Beginners to Practice

ProjectPro

Artificial Intelligence has made a significant impact on our daily lives. Every time you scroll through social media, open Spotify, or do a quick Google search, you are using an application of AI. The AI industry has expanded massively in the past few years and is predicted to grow even further, reaching around 126 billion U.S. dollars by 2025. Multinational companies like IBM, Accenture, and Apple are actively hiring AI practitioners.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

The Foundations of a Modern Data-Driven Organisation: Gaining a Clear View of the Customer

Cloudera

Today’s organizations face rising customer expectations in a fragmented marketplace amidst stiff competition. This landscape is one that presents opportunities for a modern data-driven organization to thrive. At the nucleus of such an organization is the practice of accelerating time to insights, using data to make better business decisions at all levels and roles.

Real-time: a fresh approach to data lineage

Datakin

Written by Ross Turk on August 5, 2021. A data ecosystem that spans multiple pipelines, teams, and platforms can be overwhelming. Each dataset and job exists in a unique operational context, with interdependencies that may seem simple…until they multiply. Every tiny piece has something in common, though: when it breaks, it becomes the most important thing to everyone you know.

Android Architecture for the Rocketship - Part 1: Modularisation

Afterpay Tech

By Huan Nguyen. Eight months ago at Afterpay, we kicked off our “app rewrite” project, in which we are rewriting our React Native apps in native Android and iOS. As part of this project, we are not only aiming at building an app, we are also aiming to build a strong and scalable mobile platform which supports our fast-growing business.

ETL vs ELT Explained

Grouparoo

The mission of many data teams is a very simple one. They seek to use data to help the business take smarter actions. The input is raw data from everywhere that touches the business. This includes many external sources, its own products, and various systems used for marketing, sales, and operations. The outputs often take the form of analysis, insights, models, and other usable mediums.

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

Accelerating Insight and Uptime: Predictive Maintenance

Cloudera

Historically, maintenance has been driven by a preventative schedule. Today, preventative maintenance, where actions are performed regardless of actual condition, is giving way to Predictive, or Condition-Based, maintenance, where actions are based on actual, real-time insights into operating conditions. While both are far superior to traditional Corrective maintenance (action only after a piece of equipment fails), Predictive is by far the most effective.

How Airbnb Built “Wall” to prevent data bugs

Airbnb Tech

Gaining trust in data with extensive data quality, accuracy and anomaly checks. As shared in our Data Quality Initiative post, Airbnb has embarked on a project of massive scale to ensure trustworthy data across the company. To enable employees to make faster decisions with data and provide better support for business metric monitoring, we introduced Midas, an analytical data certification process that certifies all important metrics and data sets.

Challenging Old Assumptions

Teradata

Cost income ratios in traditional banks remain untenably high. What’s required is a thorough analysis of the overall operating model to improve both sides of the cost income equation.

Data Engineering Annotated Monthly – July 2021

Big Data Tools

August is a good time to start new things – some people are on vacation and have more spare time to read than usual, while others are back and looking for a quick refresher on what’s new in data engineering. We’re launching this Annotated series to find interesting and useful content on different topics around data engineering, such as news, technical articles, tools, future conferences, and more.

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

Pillars of Knowledge, Best Practices for Data Governance

Cloudera

Author Chris J. Preimesberger is Editor Emeritus of eWEEK. With hackers now working overtime to expose business data or implant ransomware processes, data security is largely IT managers’ top priority. And if data security tops IT concerns, data governance should be their second priority. Not only is it critical to protect data, but data governance is also the foundation for data-driven businesses and maximizing value from data analytics.

Building a Modern Data Architecture for the 2020s

DataKitchen

Why Rockset & Why Now

Rockset

“The world’s most valuable resource is no longer oil, but data.” There are those rare opportunities in your career where you are at the intersection of multiple macro and micro trends. I’m thrilled to join the team at Rockset that is defining the category of Real-Time Analytics. My Path to Rockset: I've been fortunate to have experienced and embraced hyper-growth companies throughout my career.

Business Intelligence 101: How To Make The Best Solution Decision For Your Organization

Speaker: Evelyn Chou

Choosing the right business intelligence (BI) platform can feel like navigating a maze of features, promises, and technical jargon. With so many options available, how can you ensure you’re making the right decision for your organization’s unique needs? 🤔 This webinar brings together expert insights to break down the complexities of BI solution vetting.

Getting Started: Automatic Detection and Alerting for Data Incidents with Monte Carlo

Monte Carlo

In this series, we highlight the critical steps your business must follow when building a data incident management workflow, including incident detection, response, root cause analysis & resolution (RCA), and a blameless post-mortem. Let’s start with incident detection and alerting, your first line of defense against data downtime and broken data pipelines.

Writing our Golden Path

Eventbrite Engineering

In my last blog post I explained how we defined our 3-year technical vision for the company. One of the key pillars of this vision is shifting from a model where we used the same tool for every job (mostly a combination of Python + Django + MySQL), to the right tool(s) for each job. …

Real-Time Data Ingestion: Snowflake, Snowpipe and Rockset

Rockset

Organizations that depend on data for their success and survival need robust, scalable data architecture, typically employing a data warehouse for analytics needs. Snowflake is often their cloud-native data warehouse of choice. With Snowflake, organizations get the simplicity of data management with the power of scaled-out data and distributed processing.

The Data Janitor Letters - June 2021

Pipeline Data Engineering

Data engineering salon. News and interesting reads about the world of data.
The Analytics Engineering Guide (dbt Labs): Collaborating as a data team to produce excellent datasets -- some parts are b t, but it's an interesting read.
Welcome to Snowpark: New Data Programmability for the Data Cloud (Isaac Kunen, Senior Product Manager, Snowflake): Two words: Java functions.

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.