August, 2021

Understand & Deliver on Your Data Engineering Task

Start Data Engineering

Contents: 1. Introduction; 2. Understanding your data engineering task (2.1. Data infrastructure overview, 2.2. What exactly, 2.3. Why exactly, 2.4. Current state, 2.5. Downstream impact); 3. Delivering your data engineering task (3.1. How, 3.2. Breakdown into sub-tasks, 3.3. Delivering the finished task); 4. Conclusion; 5. Further reading

Introduction: Congratulations! You are given a quick overview of the business and data architecture and are assigned your very first data engineering task.

Build Trust In Your Data By Understanding Where It Comes From And How It Is Used With Stemma

Data Engineering Podcast

Summary: All of the fancy data platform tools and shiny dashboards that you use are pointless if the consumers of your analysis don’t have trust in the answers. Stemma helps you establish and maintain that trust by giving visibility into who is using what data, annotating the reports with useful context, and understanding who is responsible for keeping it up to date.

How Uber Achieves Operational Excellence in the Data Quality Experience

Uber Engineering

Uber delivers efficient and reliable transportation across the global marketplace, which is powered by hundreds of services, machine learning models, and tens of thousands of datasets. While growing rapidly, we’re also committed to maintaining data quality, as it can greatly …

How ksqlDB Works: Internal Architecture and Advanced Features

Confluent

To effectively use ksqlDB, the streaming database for Apache Kafka®, you should of course be familiar with its features and syntax. However, a deeper understanding of what goes on underneath […].

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Natural Language Processing: A Guide to NLP Use Cases, Approaches, and Tools

AltexSoft

Humans have been trying to make machines chat for decades. Alan Turing considered computers’ ability to generate natural speech a proof of their ability to think. Today, we converse with virtual companions all the time. But despite years of research and innovation, their unnatural responses remind us that no, we’re not yet at the HAL 9000-level of speech sophistication.

A ‘Fresh Squeeze on Data’ to Help Children Learn about Data, AI and Machine Learning

Cloudera

Dear Parents and Educators and Friends of Cloudera, If you are reading this blog, you know us at Cloudera as a group of self-described data geeks and data analysts. We believe data drives better decisions and moves businesses forward, and for us, that’s exciting. We are innovating and helping Fortune 500 companies transform and grow because they can make better data-driven decisions at the accelerated pace we live and work in today.

More Trending

Charting A Path For Streaming Data To Fill Your Data Lake With Hudi

Data Engineering Podcast

Summary: Data lake architectures have largely been biased toward batch processing workflows due to the volume of data that they are designed for. With more real-time requirements and the increasing use of streaming data, there has been a struggle to merge fast, incremental updates with large, historical analysis. Vinoth Chandar helped to create the Hudi project while at Uber to address this challenge.

How DataOps is Transforming Commercial Pharma Analytics

DataKitchen

DataOps has become an essential methodology in pharmaceutical enterprise data organizations, especially for commercial operations. Companies that implement it well derive significant competitive advantage from their superior ability to manage and create value from data. They will be able to produce high-quality, on-demand insight that consistently leads to successful business decisions.

Announcing Elastic Data Streams Support for Confluent’s Elasticsearch Sink Connector

Confluent

Today, as part of our expanded partnership with Elastic, we are announcing an update to the fully managed Elasticsearch Sink Connector in Confluent Cloud. This update allows you to take […].

The Power of Path Analysis

Teradata

For both analysts and data scientists, identifying paths and patterns in data is a valuable way to gain insight into the occurrences leading to or from any event of interest.

Prepare Now: 2025’s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Cloudera DataFlow for the Public Cloud: A technical deep dive

Cloudera

We just announced Cloudera DataFlow for the Public Cloud (CDF-PC), the first cloud-native runtime for Apache NiFi data flows. CDF-PC enables Apache NiFi users to run their existing data flows on a managed, auto-scaling platform with a streamlined way to deploy NiFi data flows and a central monitoring dashboard making it easier than ever before to operate NiFi data flows at scale in the public cloud.

Towards a Reliable Device Management Platform

Netflix Tech

By Benson Ma, Alok Ahuja. Introduction: At Netflix, hundreds of different device types, from streaming sticks to smart TVs, are tested every day through automation to ensure that new software releases continue to deliver the quality of the Netflix experience that our customers enjoy. In addition, Netflix continuously works with its partners (such as Roku, Samsung, LG, Amazon) to port the Netflix SDK to their new and upcoming devices (TVs, smart boxes, etc), to ensure the quality bar is reached be

Do Away With Data Integration Through A Dataware Architecture With Cinchy

Data Engineering Podcast

Summary: So much time and energy is spent on data integration because of how our applications are designed. By making the software the owner of the data that it generates, we have to go through the trouble of extracting the information so that it can be used elsewhere. The team at Cinchy is working to bring about a new paradigm of software architecture that puts the data as the central element.

Implementing a Pharma Data Mesh using DataOps

DataKitchen

Below is our fourth post (4 of 5) on combining data mesh with DataOps to foster innovation while addressing the challenges of a decentralized architecture. We’ve covered the basic ideas behind data mesh and some of the difficulties that must be managed. Below is a discussion of a data mesh implementation in the pharmaceutical space. For those embarking on the data mesh journey, it may be helpful to discuss a real-world example and the lessons learned from an actual data mesh implementation.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Designing and Architecting the Confluent CLI

Confluent

It is often difficult enough to build one application that talks to a single middleware or backend layer; e.g., a whole team of frontend engineers may build a web application […].

Back to School! Time to Ditch the Promotions Calendar?

Teradata

As Back to School promotions hit the shelves, Christmas & New Year offers are already locked in. Are these long-lead cycles still effective in today’s dynamic Retail & CPG environment?

Five Reasons Why Platforms Beat Point Solutions in Every Business Case

Cloudera

Once upon an IT time, everything was a “point product,” a specific application designed to do a single job inside a desktop PC, server, storage array, network, or mobile device. Point solutions are still used every day in many enterprise systems, but as IT continues to evolve, the platform approach beats point solutions in almost every use case.

Predictive Lead Scoring: Discovering Best-Fit Prospects with Machine Learning

AltexSoft

B2B sales strategies can be roughly divided into two activities: lead generation and lead conversion. It’s clear how each works. The former, attracting visitors to your website and then helping them take certain actions, is almost automated and works through carefully placed calls to action. The latter, supporting a lead in making the purchasing decision, is done by professional salespeople with their arsenal of personalized tactics.

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

Decoupling Data Operations From Data Infrastructure Using Nexla

Data Engineering Podcast

Summary: The technological and social ecosystem of data engineering and data management has been reaching a stage of maturity recently. As part of this stage in our collective journey, the focus has been shifting toward operation and automation of the infrastructure and workflows that power our analytical workloads. It is an encouraging sign for the industry, but it is still a complex and challenging undertaking.

Accelerating Drug Discovery and Development with DataOps

DataKitchen

A drug company tests 50,000 molecules and spends a billion dollars or more to find a single safe and effective medicine that addresses a substantial market. Figure 1 shows the 15-year cycle from screening to government agency approval and phase IV trials. Drug companies desperately look for ways to compress this lengthy time frame and to demonstrate the competitive advantage of their intellectual property.

Driving New Integrations with Confluent and ksqlDB at ACERTUS

Confluent

When companies need help with their vehicle fleets—including transport, storage, or renewing expired registrations—they don’t want to have to deal with multiple vehicle logistics providers. For these companies, ACERTUS provides […].

Mitsui Sumitomo Insurance Co., Ltd.

Teradata

Vantage on AWS supports Next Best Action efforts - adding new supplemental coverage on policy renewals at a rate of 250%.

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

Replace and Boost your Apache Storm Topologies with Apache NiFi Flows

Cloudera

Recently, I worked with a large Fortune 500 customer on their migration from Apache Storm to Apache NiFi. If you’re asking yourself, “Isn’t Storm for complex event processing and NiFi for simple event processing?”, you’re correct. A few customers chose a complex event engine like Apache Storm for their simple event processing, even when Apache NiFi is the more practical choice, cutting down drastically on SDLC (software development lifecycle) time.

Data Marts: What They Are and Why Businesses Need Them

AltexSoft

Imagine you run a candy store. Some sweets are presented on your display cases for quick access while the rest is kept in the storeroom. Now let’s think of sweets as the data required for your company’s daily operations. Instead of combing through the vast amounts of all organizational data stored in a data warehouse, you can use a data mart — a repository that makes specific pieces of data available quickly to any given business unit.

Let Your Analysts Build A Data Lakehouse With Cuelake

Data Engineering Podcast

Summary: Data lakes have been gaining popularity alongside an increase in their sophistication and usability. Despite improvements in performance and data architecture, they still require significant knowledge and experience to deploy and manage. In this episode Vikrant Dubey discusses his work on the Cuelake project, which allows data analysts to build a lakehouse with SQL queries.

Addressing Data Mesh Technical Challenges with DataOps

DataKitchen

Below is our third post (3 of 5) on combining data mesh with DataOps to foster greater innovation while addressing the challenges of a decentralized architecture. We’ve talked about data mesh in organizational terms (see our first post, “What is a Data Mesh?”) and how team structure supports agility. Let’s take a look at some technical aspects of data mesh so we can work our way towards a pharmaceutical industry application example.

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.

Announcing the Confluent Q3 ’21 Release

Confluent

The Confluent Q3 ’21 release is here and packed full of new features that enable the world’s most innovative businesses to continue building what keeps them on top: real-time, mission-critical […].

Chugai Pharmaceutical

Teradata

Accelerating drug discovery and development with Teradata Vantage on AWS.

Choosing Your Upgrade or Migration Path to Cloudera Data Platform

Cloudera

In our previous blog, we talked about the four paths to Cloudera Data Platform: In-place Upgrade, Sidecar Migration, Rolling Sidecar Migration, and Migrating to Cloud. If you haven’t read that yet, we invite you to take a moment and run through the scenarios in that blog. The four strategies will be relevant throughout the rest of this discussion. Today, we’ll discuss an example of how you might make this decision for a cluster using a “round of elimination” process based on our decision workflow.

Flight Price Predictor: Training Models to Pinpoint the Best Time for Booking

AltexSoft

Pricing in the airline industry is often compared to a brain game between carriers and passengers where each party pursues the best rates. Carriers aim to sell tickets at the highest possible price while still not losing consumers to competitors. Passengers want to buy flights at the lowest cost while not missing the chance to get on board. All this makes flight prices volatile and hard to predict.

What Is Entity Resolution? How It Works & Why It Matters

Sometimes referred to as data matching or fuzzy matching, entity resolution is critical for data quality, analytics, graph visualization and AI. Learn what entity resolution is, why it matters, how it works and its benefits. Advanced entity resolution using AI is crucial because it efficiently and easily solves many of today’s data quality and analytics problems.