July, 2020

Introducing Domain-Oriented Microservice Architecture

Uber Engineering

Introduction. Recently there has been substantial discussion around the downsides of service oriented architectures and microservice architectures in particular. While only a few years ago, many people readily adopted microservice architectures due to the numerous benefits they provide such as … The post Introducing Domain-Oriented Microservice Architecture appeared first on Uber Engineering Blog.

Doing Good with Data: Teradata's COVID-19 Resiliency Dashboard

Teradata

To help our customers navigate the world's new normal, our teams have created a business-centric, execution-focused tool – we call it the Resiliency Dashboard.

Data 142

Apache Kafka Native MQTT at Scale with Confluent Cloud and Waterstream

Confluent

With billions of Internet of Things (IoT) devices, achieving real-time interoperability has become a major challenge. Together, Confluent, Waterstream, and MQTT are accelerating Industry 4.0 with new Industrial IoT (IIoT) […].

Kafka 139

Ensuring Data Quality, With Great Expectations

Start Data Engineering

What is data quality? As the name suggests, it refers to the quality of our data. Quality should be defined based on your project requirements. It can range from simple checks, such as ensuring a certain column has only the allowed values present or falls within a given range of values, to more complex cases, such as when a certain column must match a specific regex pattern, fall within a standard deviation range, etc.
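The kinds of checks listed in the teaser above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the Great Expectations API and not the article's own code; the function names and the choice to return the offending values are assumptions made here.

```python
import re
import statistics


def check_allowed_values(values, allowed):
    """Return the values that are not in the allowed set."""
    return [v for v in values if v not in allowed]


def check_range(values, low, high):
    """Return the values that fall outside the inclusive range [low, high]."""
    return [v for v in values if not (low <= v <= high)]


def check_regex(values, pattern):
    """Return the string values that do not fully match the pattern."""
    compiled = re.compile(pattern)
    return [v for v in values if compiled.fullmatch(v) is None]


def check_within_std(values, max_sigma):
    """Return the values further than max_sigma population standard
    deviations from the mean (hypothetical definition of the
    'standard deviation range' check)."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [v for v in values if abs(v - mean) > max_sigma * stdev]


if __name__ == "__main__":
    print(check_allowed_values(["a", "b", "x"], {"a", "b"}))
    print(check_range([1, 5, 11], 0, 10))
    print(check_regex(["ab-12", "bad"], r"[a-z]{2}-\d{2}"))
    print(check_within_std([10] * 9 + [100], 2))
```

Each function returns the failing values rather than a boolean, so an empty list means the column passed the check.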

Data 130

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Build More Reliable Distributed Systems By Breaking Them With Jepsen

Data Engineering Podcast

Summary A majority of the scalable data processing platforms that we rely on are built as distributed systems. This brings with it a vast number of subtle ways that errors can creep in. Kyle Kingsbury created the Jepsen framework for testing the guarantees of distributed data processing systems and identifying when and why they break. In this episode he shares his approach to testing complex systems, the common challenges that are faced by engineers who build them, and why it is important to und

Systems 100

Byte Down: Making Netflix’s Data Infrastructure Cost-Effective

Netflix Tech

By Torio Risianto, Bhargavi Reddy, Tanvi Sahni, Andrew Park Continue reading on Netflix TechBlog ».

Bytes 98

More Trending

Return on Data – The New Valuation for Future Retail

Teradata

Today’s retailers face an abundance of data scattered across their organizations. The way forward is as much about having a strategic approach to data as it is about technology.

Retail 121

Putting Several Event Types in the Same Topic – Revisited

Confluent

In the article Should You Put Several Event Types in the Same Kafka Topic?, Martin Kleppmann discusses when to combine several event types in the same topic and introduces new […].

Kafka 138

AWS RDS PostgreSQL Setup

Start Data Engineering

RDS. AWS RDS is a managed service provided by AWS to run a relational database. We will see how to set up a Postgres instance using AWS RDS. Log in to your AWS account. Go to Services -> RDS. Click on Create Database. In the Create Database prompt, choose the Standard Create option with PostgreSQL as the engine type. In the Template section, choose Free Tier and type in a DB Identifier, Master username and Master password.

Making Wind Energy More Efficient With Data At Turbit Systems

Data Engineering Podcast

Summary Wind energy is an important component of an ecologically friendly power system, but there are a number of variables that can affect the overall efficiency of the turbines. Michael Tegtmeier founded Turbit Systems to help operators of wind farms identify and correct problems that contribute to suboptimal power outputs. In this episode he shares the story of how he got started working with wind energy, the system that he has built to collect data from the individual turbines, and how he is

Systems 100

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Empowering the Visual Effects Community with the NetFX Platform

Netflix Tech

The cloud-based platform allows vendors, artists and creators to connect and collaborate on visual effects (VFX) from anywhere in the… Continue reading on Netflix TechBlog ».

Cloud 75

The Differences Between Null, Nothing, Nil, None, and Unit in Scala

Rock the JVM

Discover the different flavors of 'nothing-ness' in Scala and how they impact your code

Scala 52

The Importance of Data in UX Design

Teradata

The days are gone when defining a user experience was limited to the choice of designers. Now data plays a more important role in the design process than ever before.

Top 5 Reasons to Attend Kafka Summit Virtually

Confluent

The first-ever virtual Kafka Summit 2020 kicks off next month in the comfort of your home office, couch, spare bedroom, living room, outbuilding, lanai, veranda, or in-home portico, featuring an […].

Kafka 137

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Stitch S3 DB Integration

Start Data Engineering

Given: source S3 path and file delimiter; data warehouse connection details (endpoint, port, username, password and database name); data warehouse schema name and table name; run frequency. Steps: Log into your Stitch account. Click on the Destination tab and use the data warehouse connection details to establish a destination database. Click on the Add Integration button on your dashboard.

Open Source Production Grade Data Integration With Meltano

Data Engineering Podcast

Summary The first stage of every data pipeline is extracting the information from source systems. There are a number of platforms for managing data integration, but there is a notable lack of a robust and easy to use open source option. The Meltano project is aiming to provide a solution to that situation. In this episode, project lead Douwe Maan shares the history of how Meltano got started, the motivation for the recent shift in focus, and how it is implemented.

Machine Learning for a Better Developer Experience

Netflix Tech

Stanislav Kirdey, William High. Imagine having to go through 2.5GB of log entries from a failed software build (3 million lines) to search for a bug or a regression that happened on line 1M. It’s probably not even doable manually! However, one smart approach to make it tractable might be to diff the lines against a recent successful build, with the hope that the bug produces unusual lines in the logs.
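The diff idea mentioned in the teaser above can be sketched roughly as follows. This is a naive illustration in plain Python, not the approach the Netflix authors actually built. The `normalize` helper, which masks digits so that timestamps and counters do not make every line look unique, is an assumption added here.

```python
import re


def normalize(line):
    """Mask runs of digits so timestamps and ids do not make lines unique
    (a hypothetical normalization, chosen for this sketch)."""
    return re.sub(r"\d+", "<N>", line.strip())


def unusual_lines(failed_log, good_log):
    """Return (line_number, line) pairs from the failed build whose
    normalized form never appears in the successful build."""
    known = {normalize(line) for line in good_log}
    return [
        (number, line)
        for number, line in enumerate(failed_log, start=1)
        if normalize(line) not in known
    ]


if __name__ == "__main__":
    good = ["12:00:01 compile ok", "12:00:02 link ok"]
    failed = [
        "13:10:01 compile ok",
        "13:10:02 NullPointerException in step 7",
        "13:10:03 link ok",
    ]
    for number, line in unusual_lines(failed, good):
        print(number, line)
```

Holding the normalized lines of the good build in a set keeps each lookup constant-time, which matters when the failed log runs to millions of lines.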

Data Pipelines in the Healthcare Industry

DareData

The Challenges of Medical Data. In recent times, there have been several developments in applications of machine learning to the medical industry. We have heard news of machine learning systems outperforming seasoned physicians on diagnosis accuracy, chatbots that present recommendations depending on your symptoms, or algorithms that can identify body parts from transversal image slices, just to name a few.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

That Lockdown Feeling

Teradata

Don't impose an unnecessary lockdown on your data consumers by choosing the wrong data analytics platform. Choose Teradata Vantage to set them free. Read more.

I’ve Got the Key, I’ve Got the Secret. Here’s How Keys Work in ksqlDB 0.10.

Confluent

ksqlDB 0.10 includes significant changes and improvements to how keys are handled. This is part of a series of enhancements that began with support for non-VARCHAR keys and will ultimately […].

Process 122

Designing a "low-effort" ELT system, using stitch and dbt

Start Data Engineering

Intro. A very common use case in data engineering is to build an ETL system for a data warehouse, loading data in from multiple separate databases so that data analysts and scientists can run queries on it. The source databases are used by your applications, and we do not want these analytic queries to affect application performance. The source data is also disconnected, as shown below.

Systems 130

DataOps For Streaming Systems With Lenses.io

Data Engineering Podcast

Summary There are an increasing number of use cases for real time data, and the systems to power them are becoming more mature. Once you have a streaming platform up and running you need a way to keep an eye on it, including observability, discovery, and governance of your data. That’s what the Lenses.io DataOps platform is built for. In this episode CTO Andrew Stevenson discusses the challenges that arise from building decoupled systems, the benefits of using SQL as the common interface f

Systems 100

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Unbundling Data Science Workflows with Metaflow and AWS Step Functions

Netflix Tech

by David Berg, Ravi Kiran Chirravuri, Romain Cledat, Jason Ge, Savin Goyal, Ferras Hamad, Ville Tuulos Continue reading on Netflix TechBlog ».

AWS 61

How To Build A Live-Updating COVID Dashboard Using Google Sheets and Apache Superset

Preset

The powerful combination of Google Sheets and Apache Superset

Advancing the Telecom Industry through Network Experience Analytics

Teradata

For today's Telco providers, new products & services are all driven by the end consumer's experience. That's where Teradata's Network Experience Analytics comes into play.

Project Metamorphosis Month 3: Infinite Storage in Confluent Cloud for Apache Kafka

Confluent

This is the third month of Project Metamorphosis, where we discuss new features in Confluent’s offerings that bring together event streams and the best characteristics of modern cloud data systems. […].

Project 122

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Stitch Database to data warehouse Integration

Start Data Engineering

Given: source database connection details (endpoint, port, username, password and database name); source table to replicate; destination schema name; run frequency (can be set to 10 min). We are assuming the destination data warehouse is already set up in Stitch. Steps: Log into your Stitch account. Click on the Add Integration button on your dashboard. Choose the PostgreSQL option as the integration on the next page.

Improving MongoDB Read Performance - Indexing, Replication and Sharding

Rockset

Read performance is crucial for databases. If it takes too long to read a record from a database, this can stall the request for data from the client application, which could result in unexpected behavior and adversely impact user experience. For these reasons, the read operation on your database should last no more than a fraction of a second. There are a number of ways to improve database read performance, though not all of these methods will work for every type of application.

MongoDB 52

Sharing Code in Next.JS Apps with Plugins

Grouparoo

At Grouparoo, our front-end website is built using React and Next.js. Next.js is an excellent tool made by Vercel that handles all the hard parts of making a React app for you: routing, server-side rendering, page hydration and more. It includes a simple starting place to build your routes and pages, based on the file system. If you want a /about page, just make a /pages/about.tsx file!

Coding 52

Use Akka Streams' Graph DSL: Quickly Explained

Rock the JVM

Explore Akka Streams' powerful Graph DSL and learn how to get started quickly with our easy guide

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.