Sat. Dec 19, 2020 - Fri. Dec 25, 2020

Low Friction Data Governance With Immuta

Data Engineering Podcast

Summary: Data governance is a term that encompasses a wide range of responsibilities, both technical and process-oriented. One of the more complex aspects is access control for the data assets that an organization is responsible for managing. The team at Immuta has built a platform that aims to tackle that problem in a flexible and maintainable fashion so that data teams can easily integrate authorization, data masking, and privacy-enhancing technologies into their data infrastructure.

Evolving Container Security With Linux User Namespaces

Netflix Tech

By Fabio Kung, Sargun Dhillon, Andrew Spyker, Kyle, Rob Gulewich, Nabil Schear, Andrew Leung, Daniel Muino, and Manas Alekar. As previously discussed on the Netflix Tech Blog, Titus is the Netflix container orchestration system. It runs a wide variety of workloads from various parts of the company: everything from the frontend API for netflix.com, to machine learning training workloads, to video encoders.

Media 119

What’s New in Apache Kafka 2.7.0

Confluent

I’m proud to announce the release of Apache Kafka 2.7.0 on behalf of the Apache Kafka® community. The 2.7.0 release contains many new features and improvements. This blog post highlights […].

Kafka 116

2020 Data Impact Award Winner Spotlight: West Midlands Police

Cloudera

Our annual Data Impact Awards are all about celebrating organizations that are unlocking the maximum value from their data in order to drive the business forward. One category that highlighted some fantastic examples of customers doing just that was the Enterprise Data Cloud award. While data has become crucial in helping businesses weather the storm in the last few months, it’s also been more challenging to manage due to the speed and volume at which it’s produced.

Cloud 101

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

DataKitchen’s Best of 2020 DataOps Resources

DataKitchen

We understand that many folks would like to say goodbye and good riddance to 2020. But before we shut the door on such a turbulent, transformative year, we at DataKitchen would like to share the creme de la creme of our DataOps content in hopes that it can help you as you learn about and implement DataOps. We hope you and your family have happy holidays and we look forward to continuing your DataOps journey with you in the new year.

IT 84

Optimizing data warehouse storage

Netflix Tech

By Anupom Syam. Background: At Netflix, our current data warehouse contains hundreds of petabytes of data stored in AWS S3, and each day we ingest and create additional petabytes. At this scale, we can gain significant performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands in our warehouse.

More Trending

How ASEAN Retailers Can Become insight driven with a Hybrid Cloud data strategy

Cloudera

There has been an e-commerce explosion this year as consumers seek safety and convenience from the comfort of their own homes, using digital tools to purchase everything from durable hard goods to fashion accessories to daily living consumables like food perishables, cleaning products, and even school supplies. According to a 2020 study by Facebook and Bain & Co, approximately 310 million customers in Southeast Asia (ASEAN) are expected to shop online with an average spend of US$172 this year, compared

Retail 98

Improve Business Agility by Hiring a DataOps Engineer

DataKitchen

It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is most adaptable to change. – Leon C. Megginson on Charles Darwin’s “Origin of Species”. Adapt or face decline. The Agile Alliance defines “business agility” as the ability of an organization to sense changes internally or externally and respond accordingly in order to deliver value to its customers.

The Modern Data Stack, Metadata Architectures, and More: Top 10 Links From Across the Web

Data Council

Here's our December 2020 roundup of links from across the web that could be relevant to you: 1. The Modern Data Stack (Fishtown Analytics) This long-form post on the dbt blog is a must-read. Titled “The Modern Data Stack: Past, Present, and Future,” it answers the question that Tristan Handy has been asking himself for the past two years: “What happened to the massive innovation we saw from 2012-2016?

Debugging image dimensions with Next.js

Grouparoo

Yesterday, I was writing some blog posts. In grand engineer tradition, I got distracted while blogging and spent a few hours writing tools to increase blogging efficiency. Specifically, I was having trouble knowing the correct width and height props to put on the screenshots I was making for the blog post. I would take the screenshot and then use image tools and even a spreadsheet to figure out the right ratio/dimensions for how I wanted it to show up in the UI.

Coding 52

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

An A-Z Data Adventure on Cloudera’s Data Platform

Cloudera

In this blog we will take you through a persona-based data adventure, with short demos attached, to show you the A-Z data worker workflow expedited and made easier through self-service, seamless integration, and cloud-native technologies. You will learn all the parts of Cloudera’s Data Platform that together will accelerate your everyday Data Worker tasks.

Banking 98

A Day in the Life of a Customer Success Manager

Teradata

Data is the lifeblood of organizations, bringing information to various departments and functions. Keeping data healthy can be simple with good hygiene, but this is easier said than done.

How to Join Data in Elasticsearch vs Rockset

Rockset

Elasticsearch has long been used for a wide variety of real-time analytics use cases, including log storage and analysis and search applications. It is popular because of how it indexes data, which makes search efficient. However, this comes at a cost: joining documents is less efficient. There are several ways to build relationships between Elasticsearch documents; the most common are nested objects, parent-child joins, and application-side joins.
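
To make the last of those options concrete, here is a minimal sketch of an application-side join in plain Python. It is an illustration only, not code from the Rockset post: the function name, the `orders` and `customers` documents, and the `customer_id` join key are all hypothetical, and in a real application each list would come from a separate search request rather than being hard-coded.

```python
def application_side_join(orders, customers, key="customer_id"):
    """Join two lists of documents in application code (inner-join semantics).

    Both inputs are plain lists of dicts, standing in for the hits that
    two independent queries would return. All field names are hypothetical.
    """
    # Index one side by the join key so each lookup is a constant-time dict access.
    customers_by_key = {doc[key]: doc for doc in customers}

    joined = []
    for order in orders:
        customer = customers_by_key.get(order[key])
        if customer is None:
            # No matching document on the other side: drop the row.
            continue
        # Merge the two documents; the order's own fields win on a name clash.
        joined.append({**customer, **order})
    return joined


# Hypothetical sample documents standing in for two query results.
orders = [
    {"order_id": 1, "customer_id": "c1", "total": 30},
    {"order_id": 2, "customer_id": "c2", "total": 12},
    {"order_id": 3, "customer_id": "c9", "total": 5},
]
customers = [
    {"customer_id": "c1", "name": "Ada"},
    {"customer_id": "c2", "name": "Grace"},
]

joined = application_side_join(orders, customers)
```

The trade-off this sketch makes visible is that the application issues two queries and holds one side in memory, which is the kind of cost the teaser refers to when it says joins are less efficient.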

SQL 40

Superset 0.38, CRUD Redesign, ECharts Improvements, and more.

Preset

What's new in Superset 0.38?


Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Coffee with Cloudera Partners: Intel

Cloudera

Meet Angela Gill and Gina Mcfarland, the powerhouse duo leading the Intel/Cloudera relationship. Angela brings technical expertise, while Gina drives the business strategy. By bringing together Intel, a world leader in computing innovation, and Cloudera, the leader in enterprise analytic data management, Cloudera and Intel are able to accelerate the pace of innovation and deliver customer value on an industry-leading platform.

Regulation as a Service: A Win-Win

Teradata

The banking industry has a regulatory problem. Both sides of the regulatory fence are now working on a more holistic, rules-based system that, if done well, could become a competitive advantage.

Banking 52