July, 2020

article thumbnail

Ensuring Data Quality, With Great Expectations

Start Data Engineering

What is data quality As the name suggest, it refers to the quality of our data. Quality should be defined based on your project requirements. It can be as simple as ensuring a certain column has only the allowed values present or falls within a given range of values to more complex cases like, when a certain column must match a specific regex pattern, fall within a standard deviation range, etc.

Data 130
article thumbnail

Introducing Domain-Oriented Microservice Architecture

Uber Engineering

Introduction. Recently there has been substantial discussion around the downsides of service oriented architectures and microservice architectures in particular. While only a few years ago, many people readily adopted microservice architectures due to the numerous benefits they provide such as … The post Introducing Domain-Oriented Microservice Architecture appeared first on Uber Engineering Blog.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Apache Kafka Native MQTT at Scale with Confluent Cloud and Waterstream

Confluent

With billions of Internet of Things (IoT) devices, achieving real-time interoperability has become a major challenge. Together, Confluent, Waterstream, and MQTT are accelerating Industry 4.0 with new Industrial IoT (IIoT) […].

Kafka 139
article thumbnail

Doing Good with Data: Teradata's COVID-19 Resiliency Dashboard

Teradata

To help our customers navigate the world's new normal, our teams have created a business-centric, execution-focused tool – we call it the Resiliency Dashboard.

Data 142
article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

Build More Reliable Distributed Systems By Breaking Them With Jepsen

Data Engineering Podcast

Summary A majority of the scalable data processing platforms that we rely on are built as distributed systems. This brings with it a vast number of subtle ways that errors can creep in. Kyle Kingsbury created the Jepsen framework for testing the guarantees of distributed data processing systems and identifying when and why they break. In this episode he shares his approach to testing complex systems, the common challenges that are faced by engineers who build them, and why it is important to und

Systems 100
article thumbnail

Machine Learning for a Better Developer Experience

Netflix Tech

Stanislav Kirdey , William High Imagine having to go through 2.5GB of log entries from a failed software build?—?3 million lines?—?to search for a bug or a regression that happened on line 1M. It’s probably not even doable manually! However, one smart approach to make it tractable might be to diff the lines against a recent successful build, with the hope that the bug produces unusual lines in the logs.

More Trending

article thumbnail

Data Pipelines in the Healthcare Industry

DareData

The Challenges of Medical Data In recent times, there have been several developments in applications of machine learning to the medical industry. We have heard news of machine learning systems outperforming seasoned physicians on diagnosis accuracy, chatbots that present recommendations depending on your symptoms , or algorithms that can identify body parts from transversal image slices , just to name a few.

article thumbnail

Putting Several Event Types in the Same Topic – Revisited

Confluent

In the article Should You Put Several Event Types in the Same Kafka Topic?, Martin Kleppmann discusses when to combine several event types in the same topic and introduces new […].

Kafka 138
article thumbnail

Return on Data – The New Valuation for Future Retail

Teradata

Today’s retailers face an abundance of data scattered across their organizations. The way forward is as much about having a strategic approach to data as it is about technology.

Retail 121
article thumbnail

Making Wind Energy More Efficient With Data At Turbit Systems

Data Engineering Podcast

Summary Wind energy is an important component of an ecologically friendly power system, but there are a number of variables that can affect the overall efficiency of the turbines. Michael Tegtmeier founded Turbit Systems to help operators of wind farms identify and correct problems that contribute to suboptimal power outputs. In this episode he shares the story of how he got started working with wind energy, the system that he has built to collect data from the individual turbines, and how he is

Systems 100
article thumbnail

Prepare Now: 2025s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

article thumbnail

Byte Down: Making Netflix’s Data Infrastructure Cost-Effective

Netflix Tech

By Torio Risianto, Bhargavi Reddy, Tanvi Sahni, Andrew Park Continue reading on Netflix TechBlog ».

Bytes 96
article thumbnail

Stitch S3 DB Integration

Start Data Engineering

Given Source S3 path and file delimiter data warehouse connection details (endpoint, port, username, password and database name) data warehouse schema name and table name Run frequency Steps Log into your stitch account, here Click on the Destination tab and use the data warehouse connection details to establish a destination database. Click on Add Integration button on your dashboard.

article thumbnail

Improving MongoDB Read Performance - Indexing, Replication and Sharding

Rockset

Read performance is crucial for databases. If it takes too long to read a record from a database, this can stall the request for data from the client application, which could result in unexpected behavior and adversely impact user experience. For these reasons, the read operation on your database should last no more than a fraction of a second. There are a number of ways to improve database read performance, though not all of these methods will work for every type of application.

MongoDB 52
article thumbnail

Top 5 Reasons to Attend Kafka Summit Virtually

Confluent

The first-ever virtual Kafka Summit 2020 kicks off next month in the comfort of your home office, couch, spare bedroom, living room, outbuilding, lanai, veranda, or in-home portico, featuring an […].

Kafka 137
article thumbnail

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

article thumbnail

The Importance of Data in UX Design

Teradata

The days are gone when defining a user experience was limited to the choice of designers. Now data plays a more important role in the design process than ever before.

article thumbnail

Open Source Production Grade Data Integration With Meltano

Data Engineering Podcast

Summary The first stage of every data pipeline is extracting the information from source systems. There are a number of platforms for managing data integration, but there is a notable lack of a robust and easy to use open source option. The Meltano project is aiming to provide a solution to that situation. In this episode, project lead Douwe Maan shares the history of how Meltano got started, the motivation for the recent shift in focus, and how it is implemented.

article thumbnail

Empowering the Visual Effects Community with the NetFX Platform

Netflix Tech

The cloud-based platform allows vendors, artists and creators to connect and collaborate on visual effects (VFX) from anywhere in the… Continue reading on Netflix TechBlog ».

Cloud 75
article thumbnail

Designing a "low-effort" ELT system, using stitch and dbt

Start Data Engineering

Intro A very common use case in data engineering is to build a ETL system for a data warehouse, to have data loaded in from multiple separate databases to enable data analysts/scientists to be able to run queries on this data, since the source databases are used by your applications and we do not want these analytic queries to affect our application performance and the source data is disconnected as shown below.

Systems 130
article thumbnail

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

article thumbnail

Sharing Code in Next.JS Apps with Plugins

Grouparoo

At Grouparoo, our front-end website is built using React and Next.js. Next.js is an excellent tool made by Vercel that handles all the hard parts of making a React app for you - Routing, Server-side Rendering, Page Hydration and more. It includes a simple starting place to build your routes and pages, based on the file system. If you want a /about page, just make an /pages/about.tsx file!

Coding 52
article thumbnail

I’ve Got the Key, I’ve Got the Secret. Here’s How Keys Work in ksqlDB 0.10.

Confluent

ksqlDB 0.10 includes significant changes and improvements to how keys are handled. This is part of a series of enhancements that began with support for non-VARCHAR keys and will ultimately […].

Process 122
article thumbnail

That Lockdown Feeling

Teradata

Don't impose an unnecessary lockdown on your data consumers by choosing the wrong data analytics platform. Choose Teradata Vantage to set them free. Read more.

article thumbnail

DataOps For Streaming Systems With Lenses.io

Data Engineering Podcast

Summary There are an increasing number of use cases for real time data, and the systems to power them are becoming more mature. Once you have a streaming platform up and running you need a way to keep an eye on it, including observability, discovery, and governance of your data. That’s what the Lenses.io DataOps platform is built for. In this episode CTO Andrew Stevenson discusses the challenges that arise from building decoupled systems, the benefits of using SQL as the common interface f

Systems 100
article thumbnail

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

article thumbnail

Unbundling Data Science Workflows with Metaflow and AWS Step Functions

Netflix Tech

by David Berg, Ravi Kiran Chirravuri, Romain Cledat, Jason Ge, Savin Goyal, Ferras Hamad, Ville Tuulos Continue reading on Netflix TechBlog ».

AWS 60
article thumbnail

Stitch Database to data warehouse Integration

Start Data Engineering

Given Source database connection details (endpoint, port, username, password and database name) Source table to replicate destination schema name run frequency can be set to 10min We are assuming the destination data warehouse is already setup in stitch. Steps Log into your stitch account. here Click on Add Integration button on your dashboard. Choose PostgreSQL option as the integration in the next page.

article thumbnail

Indexing on MongoDB Using Rockset - How It Works

Rockset

MongoDB is the most popular NoSQL database today, by some measures, even taking on traditional SQL databases like MySQL, which have been the de facto standard for many years. MongoDB’s document model and flexible schemas allow for rapid iteration in applications. MongoDB is designed to scale out to massive datasets and workloads, so developers know they will not be limited by their database.

MongoDB 52
article thumbnail

Project Metamorphosis Month 3: Infinite Storage in Confluent Cloud for Apache Kafka

Confluent

This is the third month of Project Metamorphosis, where we discuss new features in Confluent’s offerings that bring together event streams and the best characteristics of modern cloud data systems. […].

Project 122
article thumbnail

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.

article thumbnail

Advancing the Telecom Industry through Network Experience Analytics

Teradata

For today's Telco providers, new products & services are all driven by the end consumer's experience. That's where Teradata's Network Experience Analytics comes to play.

76
article thumbnail

Talking to customers: A 101 for Product Managers

Afterpay Tech

By James Taylor “Do you like this?” “Would you use this?” If you’re a product manager and you’ve been asking customers these questions, keep reading. We all know how important it is to talk to your customers, but it's only half the story. Asking them the right questions could mean the difference between a successful product and a tough lesson best told through a blog post.

article thumbnail

On FMCG and Retail Forecasting

DareData

The importance of having accurate sales forecasts When we think of sales forecasting, a couple of interesting questions unfold: how do most businesses manage their operations without having an accurate view of the future? With strategic decisions having a long-term impact on business, are businesses using all the data they could? Just like a driver can drive during a really foggy day with minimal lights on or an investor can invest in some asset without understanding the future parameters that w

Retail 52
article thumbnail

Use Akka Streams' Graph DSL: Quickly Explained

Rock the JVM

Explore Akka Streams' powerful Graph DSL and learn how to get started quickly with our easy guide

52
article thumbnail

What Is Entity Resolution? How It Works & Why It Matters

Entity Resolution Sometimes referred to as data matching or fuzzy matching, entity resolution, is critical for data quality, analytics, graph visualization and AI. Learn what entity resolution is, why it matters, how it works and its benefits. Advanced entity resolution using AI is crucial because it efficiently and easily solves many of today’s data quality and analytics problems.