Remove Blog Remove Datasets Remove Metadata
article thumbnail

Open-Sourcing AvroTensorDataset: A Performant TensorFlow Dataset For Processing Avro Data

LinkedIn Engineering

To remove this bottleneck, we built AvroTensorDataset , a TensorFlow dataset for reading, parsing, and processing Avro data. Today, we’re excited to open source this tool so that other Avro and Tensorflow users can use this dataset in their machine learning pipelines to get a large performance boost to their training workloads.

Datasets 102
article thumbnail

NVIDIA RAPIDS in Cloudera Machine Learning

Cloudera

In the previous blog post in this series, we walked through the steps for leveraging Deep Learning in your Cloudera Machine Learning (CML) projects. To try and predict this, an extensive dataset including anonymised details on the individual loanee and their historical credit history are included. Get the Dataset. Introduction.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Introducing Apache Iceberg in Cloudera Data Platform

Cloudera

The data architecture layer is one such area where growing datasets have pushed the limits of scalability and performance. The data explosion has to be met with new solutions, that’s why we are excited to introduce the next generation table format for large scale analytic datasets within Cloudera Data Platform (CDP) – Apache Iceberg.

Metadata 111
article thumbnail

A Look At The Data Systems Behind The Gameplay For League Of Legends

Data Engineering Podcast

Summary The majority of blog posts and presentations about data engineering and analytics assume that the consumers of those efforts are internal business users accessing an environment controlled by the business. Atlan is the metadata hub for your data ecosystem. Atlan is the metadata hub for your data ecosystem.

Systems 130
article thumbnail

Keeping Small Queries Fast – Short query optimizations in Apache Impala

Cloudera

This is part of our series of blog posts on recent enhancements to Impala. Apache Impala is synonymous with high-performance processing of extremely large datasets, but what if our data isn’t huge? So clearly Impala is used extensively with datasets both small and large. Metadata Caching. More on this below.

Metadata 143
article thumbnail

Ready-to-go sample data pipelines with Dataflow

Netflix Tech

mock Generate or validate mock datasets. All the above commands are very likely to be described in separate future blog posts, but right now let’s focus on the dataflow sample command. " ) COMMENT "Example dataset brought to you by Dataflow. -v, --verbose Enables verbose mode. version Show the version and exit.

article thumbnail

Data governance beyond SDX: Adding third party assets to Apache Atlas

Cloudera

In this blog, we’ll highlight the key CDP aspects that provide data governance and lineage and show how they can be extended to incorporate metadata for non-CDP systems from across the enterprise. Extending Atlas’ metadata model. The example 1_typedef-server.json describes the server typedef used in this blog. .