Cloudera Data Engineering 2021 Year End Review

Cloudera

New in 2021. Figure 2 – CDE product launch highlights in 2021. As data teams grow, RAZ integration with CDE will play an even more critical role in helping share and control curated datasets. Early in 2021 we expanded our APIs to support pipelines using a new job type: Airflow. Modernizing pipelines.

The DataOps Vendor Landscape, 2021

DataKitchen

Download the 2021 DataOps Vendor Landscape here. DataOps is a hot topic in 2021. Soda Data Monitoring: Soda tells you which data is worth fixing. Soda doesn’t just monitor datasets and send meaningful alerts to the relevant teams.

Bring Geospatial Analytics Across Disparate Datasets Into Your Toolkit With The Unfolded Platform

Data Engineering Podcast

What (if any) are the datasets or analyses that you are consciously not investing in supporting? The company was founded in 2021 by Kirk Marple after his tenure as CTO of Kespry.

Apache Ozone Powers Data Science in CDP Private Cloud

Cloudera

In this blog post, we will ingest a real-world dataset into Ozone, create a Hive table on top of it, and analyze the data to study the correlation between new vaccinations and new cases per country using a Spark ML Jupyter notebook in CML. When the bucket is created, we also upload a COVID dataset [1], a CSV file with about 100K rows.

Celebrating Data Superheroes: The 2021 Data Impact Awards Winners

Cloudera

Given the way we have seen communities and workplace cultures come together and stand for change over what has been a disruptive 20 months, we are proud to introduce the People First category to the 2021 DIA. So, without further ado, it is with great delight that we officially publish the 2021 Data Impact Award winners!

7 Top Open Source Datasets to Train Natural Language Processing (NLP) & Text Models

KDnuggets

With a lot of excitement and research around NLP, there are growing opportunities to apply these technologies to real-world scenarios. Becoming familiar with NLP is not trivial, and these open-source datasets can help you build your skills.

How Skyscanner Enabled Data & AI Governance with Monte Carlo

Monte Carlo

After one particularly tough week in the winter of 2021, when marketing data was disrupted by daily incidents and downtime, a group of data engineers decided to create a full diagram of the data systems. The data teams were maintaining 30,000 datasets, and often found anomalies or issues that had gone unnoticed for months.