Sat. May 25, 2019 - Fri. May 31, 2019

Data Lineage For Your Pipelines

Data Engineering Podcast

Summary: Some problems in data are well defined and benefit from a ready-made set of tools. For everything else, there's Pachyderm, the platform for data science that is built to scale. In this episode Joe Doliner, CEO and co-founder, explains how Pachyderm started as an attempt to make data provenance easier to track, how the platform is architected and used today, and gives examples of how the underlying principles manifest in the workflows of data engineers and data scientists as they collaborate.
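
Pachyderm expresses that lineage model through pipeline specs that bind versioned input data to containerized code. As a rough, hypothetical sketch only (the repo name, image, and command are invented, not taken from the episode), such a spec built as a Python dict might look like:

```python
import json

# Hypothetical minimal Pachyderm pipeline spec; names, image, and command are
# placeholders for illustration and a real project's spec would differ.
pipeline_spec = {
    "pipeline": {"name": "word-count"},
    "transform": {
        "image": "python:3.8-slim",
        "cmd": ["python", "/code/count.py"],  # script assumed to exist in the image
    },
    "input": {
        # Every file matching the glob in the input repo becomes a datum,
        # which is what lets Pachyderm track provenance per output.
        "pfs": {"repo": "raw-text", "glob": "/*"}
    },
}

# Serialized, this is roughly the document you would hand to `pachctl create pipeline`.
print(json.dumps(pipeline_spec, indent=2))
```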

3 Easy Ways to Turn Data into Actionable Answers

Teradata

Rob Armstrong explains three critical ways to get better answers from your data.

Deploying Kafka Streams and KSQL with Gradle – Part 2: Managing KSQL Implementations

Confluent

In part 1, we discussed an event streaming architecture that we implemented for a customer using Apache Kafka®, KSQL from Confluent, and Kafka Streams. Now in part 2, we'll discuss the challenges we faced developing, building, and deploying the KSQL portion of our application and how we used Gradle to address them. In part 3, we'll explore using Gradle to build and deploy KSQL user-defined functions (UDFs) and Kafka Streams microservices.
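
The series drives this with Gradle, but the deployment step ultimately amounts to submitting KSQL statements to the KSQL server's REST API. Here is a minimal Python sketch of that idea, with a placeholder server URL and stream definition rather than anything from the article:

```python
import requests

KSQL_SERVER = "http://localhost:8088"  # placeholder; point at your KSQL server

# Placeholder persistent query; the article's actual KSQL scripts will differ.
statement = """
CREATE STREAM pageviews_enriched AS
  SELECT userid, pageid, UCASE(userid) AS userid_upper
  FROM pageviews;
"""

# Submit the statement to the KSQL server's /ksql endpoint.
response = requests.post(
    f"{KSQL_SERVER}/ksql",
    headers={"Content-Type": "application/vnd.ksql.v1+json"},
    json={"ksql": statement, "streamsProperties": {}},
)
response.raise_for_status()
print(response.json())
```

A build tool like Gradle mainly adds ordering, versioning, and environment handling around calls like this one.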

Making our Android Studio Apps Reactive with UI Components & Redux

Netflix Tech

By Juliano Moraes, David Henry, Corey Grunewald & Jim Isaacs. Recently Netflix has started building mobile apps to bring technology and innovation to our Studio Physical Productions, the portion of the business responsible for producing our TV shows and movies. Our very first mobile app is called Prodicle and was built for Android & iOS using the same reactive architecture on both platforms, which allowed us to build 2 apps from scratch in 3 months with 4 software engineers.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you're creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: an overview of ETL vs. ELT…
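
For flavor, here is a minimal, hypothetical ETL DAG sketch (assuming Airflow 2.x; task logic is stubbed and all names are invented):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**_):
    # Stub: pull raw records from a source system.
    return [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]


def transform(ti, **_):
    # Stub: double each amount; reads the upstream result via XCom.
    records = ti.xcom_pull(task_ids="extract")
    return [{**r, "amount": r["amount"] * 2} for r in records]


def load(ti, **_):
    # Stub: "load" by printing; a real DAG would write to a warehouse.
    records = ti.xcom_pull(task_ids="transform")
    print(f"loading {len(records)} records")


with DAG(
    dag_id="simple_etl",
    start_date=datetime(2019, 5, 25),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Declare the extract -> transform -> load ordering.
    t_extract >> t_transform >> t_load
```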

Cloudera Data Science Workbench: where innovation meets security, compliance and scale on the road to industrialized AI

Cloudera

Gartner states that “By 2022, 75% of new end-user solutions leveraging machine learning (ML) and AI techniques will be built with commercial instead of open source platforms.”¹ Spoiler alert: it's not because data scientists will stop relying on open source for the latest innovation in ML algorithms and development environments. Rather, as businesses look to operationalize machine learning capabilities at scale, they'll turn increasingly to commercial platforms, with connectors to open source…

How to Drive Marketing Personalization in an Increasingly Non-Personal World

Teradata

Tom Casey discusses marketing personalization and why it's important to the modern customer experience.

Using Tableau for Live Dashboards on Event Data

Rockset

Live dashboards can help organizations make sense of their event data and understand what's happening in their businesses in real time. Marketing managers constantly want to know how many signups there were in the last hour, day, or week. Product managers are always looking to understand which product features are working well and most heavily utilized.
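
The tiles on such dashboards typically boil down to time-bucketed aggregations over an events table. As an illustration only, with SQLite standing in for whatever store actually backs the dashboard, hourly signup counts might be computed like this:

```python
import sqlite3

# Stand-in event store: an in-memory SQLite table of signup events.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signups (user_id TEXT, signed_up_at TEXT)")
conn.executemany(
    "INSERT INTO signups VALUES (?, ?)",
    [
        ("u1", "2019-05-27 09:15:00"),
        ("u2", "2019-05-27 09:45:00"),
        ("u3", "2019-05-27 10:05:00"),
    ],
)

# Hourly signup counts, the kind of aggregation a live dashboard tile would run.
query = """
SELECT strftime('%Y-%m-%d %H:00', signed_up_at) AS hour,
       COUNT(*) AS signups
FROM signups
GROUP BY hour
ORDER BY hour
"""
for hour, count in conn.execute(query):
    print(hour, count)
```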

How we release open source projects

Zalando Engineering

This blog post describes how we manage the process of proposing, reviewing, and approving projects to become open source, while at the same time ensuring that project code follows our compliance rules and that the maintainers of the projects are aware of their responsibilities. See our formal release guidelines. The process involves five steps that take the project from internal source code, through a review phase to our incubator, which eventually results in the project being graduated into our…

17 Ways to Mess Up Self-Managed Schema Registry

Confluent

Part 1 of this blog series by Gwen Shapira explained the benefits of schemas, contracts between services, and compatibility checking for schema evolution. In particular, Confluent Schema Registry makes it easy for developers to use schemas, and it is designed to be highly available. But it's important to configure it properly from the start and manage it well, or else the schemas may not be available to the applications that need them.
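
Many of those mistakes surface where applications register schemas and set compatibility levels. As a hedged sketch (the registry URL, subject name, and schema below are placeholders), the underlying REST calls look roughly like this in Python:

```python
import json
import requests

SCHEMA_REGISTRY = "http://localhost:8081"  # placeholder URL
SUBJECT = "orders-value"                   # placeholder subject name

# A toy Avro schema; a real application would evolve this over time.
order_schema = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "amount", "type": "double"},
    ],
}

headers = {"Content-Type": "application/vnd.schemaregistry.v1+json"}

# Register the schema under the subject.
resp = requests.post(
    f"{SCHEMA_REGISTRY}/subjects/{SUBJECT}/versions",
    headers=headers,
    json={"schema": json.dumps(order_schema)},
)
resp.raise_for_status()
print("registered schema id:", resp.json()["id"])

# Pin the subject's compatibility level so incompatible changes are rejected.
resp = requests.put(
    f"{SCHEMA_REGISTRY}/config/{SUBJECT}",
    headers=headers,
    json={"compatibility": "BACKWARD"},
)
resp.raise_for_status()
print("compatibility:", resp.json()["compatibility"])
```

Setting an explicit compatibility level per subject is one of the easiest ways to avoid the "mess ups" the post catalogs, because incompatible schema changes are then rejected at registration time rather than discovered by consumers.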