Sat. Jul 25, 2020 - Fri. Jul 31, 2020

Ensuring Data Quality, With Great Expectations

Start Data Engineering

What is data quality? As the name suggests, it refers to the quality of our data. Quality should be defined based on your project requirements. It can range from simple checks, such as ensuring a certain column contains only allowed values or falls within a given range, to more complex cases, such as requiring a column to match a specific regex pattern or fall within a standard deviation range.
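
The kinds of checks named in this excerpt can be sketched in a few lines of plain Python. This is an illustrative sketch only: it does not use the Great Expectations API, and the function names, sample columns, and thresholds below are all hypothetical.

```python
import re
import statistics

# Hypothetical helpers: each returns the values that FAIL the check,
# so an empty list means the column passed.

def check_allowed_values(values, allowed):
    """Flag values that are not in the allowed set."""
    return [v for v in values if v not in allowed]

def check_range(values, low, high):
    """Flag values outside the inclusive range [low, high]."""
    return [v for v in values if not (low <= v <= high)]

def check_regex(values, pattern):
    """Flag strings that do not fully match the regex pattern."""
    compiled = re.compile(pattern)
    return [v for v in values if not compiled.fullmatch(v)]

def check_within_stdev(values, max_sigma):
    """Flag values more than max_sigma population standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [v for v in values if abs(v - mean) > max_sigma * stdev]

# Made-up sample columns, for illustration only.
statuses = ["active", "inactive", "unknown"]
ages = [25, 31, 140]
emails = ["a@example.com", "not-an-email"]
latencies = [10, 11, 9, 10, 10, 11, 9, 10, 10, 50]

bad_statuses = check_allowed_values(statuses, {"active", "inactive"})
bad_ages = check_range(ages, 0, 120)
bad_emails = check_regex(emails, r"[^@\s]+@[^@\s]+\.[^@\s]+")
bad_latencies = check_within_stdev(latencies, max_sigma=2)

print(bad_statuses, bad_ages, bad_emails, bad_latencies)
```

Returning the failing values rather than a bare pass/fail flag makes it easier to report which rows broke an expectation; a dedicated data quality library would typically add reporting and configuration on top of checks like these.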

Data 130

I’ve Got the Key, I’ve Got the Secret. Here’s How Keys Work in ksqlDB 0.10.

Confluent

ksqlDB 0.10 includes significant changes and improvements to how keys are handled. This is part of a series of enhancements that began with support for non-VARCHAR keys and will ultimately […].

Process 122

Build More Reliable Distributed Systems By Breaking Them With Jepsen

Data Engineering Podcast

Summary: A majority of the scalable data processing platforms that we rely on are built as distributed systems. This brings with it a vast number of subtle ways that errors can creep in. Kyle Kingsbury created the Jepsen framework for testing the guarantees of distributed data processing systems and identifying when and why they break. In this episode he shares his approach to testing complex systems, the common challenges that are faced by engineers who build them, and why it is important to und

Systems 100

Advancing the Telecom Industry through Network Experience Analytics

Teradata

For today's Telco providers, new products & services are all driven by the end consumer's experience. That's where Teradata's Network Experience Analytics comes into play.

76

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Unbundling Data Science Workflows with Metaflow and AWS Step Functions

Netflix Tech

by David Berg, Ravi Kiran Chirravuri, Romain Cledat, Jason Ge, Savin Goyal, Ferras Hamad, Ville Tuulos Continue reading on Netflix TechBlog ».

AWS 61

Analysing Changes with Debezium and Kafka Streams

Confluent

Change Data Capture (CDC) is an excellent way to introduce streaming analytics into your existing database, and using Debezium enables you to send your change data through Apache Kafka®. Although […].

Kafka 105

More Trending

Data Pipelines in the Healthcare Industry

DareData

The Challenges of Medical Data: In recent times, there have been several developments in applications of machine learning to the medical industry. We have heard news of machine learning systems outperforming seasoned physicians on diagnosis accuracy, chatbots that present recommendations depending on your symptoms, or algorithms that can identify body parts from transversal image slices, just to name a few.

Streaming Data Into Teradata Vantage Using Amazon Managed Kafka (MSK) Data Streams and AWS Glue Streaming ETL

Teradata

In this post, we provide step-by-step instructions on how to set up Vantage & author AWS Glue Streaming ETL jobs to stream data into Vantage from Amazon MSK and visualize the data.

AWS 52

How PushOwl Uses ksqlDB to Scale Their Analytics and Reporting Use Cases

Confluent

Using a declarative SQL-like interface, ksqlDB makes it easy to integrate event streaming applications into any tech stack. This article illustrates how ksqlDB was added to PushOwl’s Python tech stack, […].

SQL 99

The Differences Between Null, Nothing, Nil, None, and Unit in Scala

Rock the JVM

Discover the different flavors of 'nothing-ness' in Scala and how they impact your code

Scala 52

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

How To Build A Live-Updating COVID Dashboard Using Google Sheets and Apache Superset

Preset

The powerful combination of Google Sheets and Apache Superset

Why Teradata Has Never Been Afraid of High Demand

Teradata

Teradata's Advanced SQL Engine can take on more work concurrently than competitors, all while continuing to deliver high throughput under high stress. Learn more.

SQL 52

Performance Isolation for Your Primary MongoDB Cluster

Rockset

Database performance is a critical aspect of ensuring a web application or service remains fast and stable. As the service scales up, there are often challenges with scaling the primary database along with it. While MongoDB is often used as a primary online database and can meet the demands of very large scale web applications, it does often become the bottleneck as well.

MongoDB 40

Quick Reports: Xero to Power BI

FreshBI

The objective of this blog: To give you the tools and the skills to connect to Xero Accounting from the Power BI Desktop and to have immediate access to the categorized data that drives each of the built-in reports in Xero. What you need to get started: To get immediate access to the data that drives the Xero Reports and push them into Power BI, you’ll need 3 tools: Power BI Desktop, the ‘Quick Reports’ Power BI Custom Connector for Xero, AND Power BI Quick Reports Templ

BI 52

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

What is a Data Mesh — and How Not to Mesh it Up

Monte Carlo

Updated: January 2023. Ask anyone in the data industry what’s hot these days and chances are “data mesh” will rise to the top of the list. But what is a data mesh and why should you build one? Inquiring minds want to know. In the age of self-service business intelligence, nearly every company considers itself a data-first company, but not every company is treating its data architecture with the level of democratization and scalability it deserves.

IT 45