Sat. Dec 19, 2020 - Fri. Dec 25, 2020


Low Friction Data Governance With Immuta

Data Engineering Podcast

Summary: Data governance is a term that encompasses a wide range of responsibilities, both technical and process-oriented. One of the more complex aspects is access control for the data assets that an organization is responsible for managing. The team at Immuta has built a platform that aims to tackle that problem in a flexible and maintainable fashion so that data teams can easily integrate authorization, data masking, and privacy-enhancing technologies into their data infrastructure.


Evolving Container Security With Linux User Namespaces

Netflix Tech

By Fabio Kung, Sargun Dhillon, Andrew Spyker, Kyle, Rob Gulewich, Nabil Schear, Andrew Leung, Daniel Muino, and Manas Alekar. As previously discussed on the Netflix Tech Blog, Titus is the Netflix container orchestration system. It runs a wide variety of workloads from various parts of the company: everything from the frontend API for netflix.com, to machine learning training workloads, to video encoders.


What’s New in Apache Kafka 2.7.0

Confluent

I’m proud to announce the release of Apache Kafka 2.7.0 on behalf of the Apache Kafka® community. The 2.7.0 release contains many new features and improvements. This blog post highlights […].


An A-Z Data Adventure on Cloudera’s Data Platform

Cloudera

In this blog we will take you through a persona-based data adventure, with short demos attached, to show you the A-Z data worker workflow expedited and made easier through self-service, seamless integration, and cloud-native technologies. You will learn all the parts of Cloudera’s Data Platform that together will accelerate your everyday Data Worker tasks.


A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate


DataKitchen’s Best of 2020 DataOps Resources

DataKitchen

We understand that many folks would like to say goodbye and good riddance to 2020. But before we shut the door on such a turbulent, transformative year, we at DataKitchen would like to share the creme de la creme of our DataOps content in hopes that it can help you as you learn about and implement DataOps. We hope you and your family have happy holidays and we look forward to continuing your DataOps journey with you in the new year.


Optimizing data warehouse storage

Netflix Tech

By Anupom Syam. Background: At Netflix, our current data warehouse contains hundreds of petabytes of data stored in AWS S3, and each day we ingest and create additional petabytes. At this scale, we can gain a significant amount of performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands in our warehouse.

More Trending


2020 Data Impact Award Winner Spotlight: West Midlands Police

Cloudera

Our annual Data Impact Awards are all about celebrating organizations that are unlocking the maximum value from their data in order to drive the business forward. One category that highlighted some fantastic examples of customers doing just that was the Enterprise Data Cloud award. While data has become crucial in helping businesses weather the storm in the last few months, it’s also been more challenging to manage due to the speed and volume at which it’s produced.


Improve Business Agility by Hiring a DataOps Engineer

DataKitchen

It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is most adaptable to change. – Leon C. Megginson on Charles Darwin’s “Origin of Species”. Adapt or face decline. The Agile Alliance defines “business agility” as the ability of an organization to sense changes internally or externally and respond accordingly in order to deliver value to its customers.


The Modern Data Stack, Metadata Architectures, and More: Top 10 Links From Across the Web

Data Council

Here's our December 2020 roundup of links from across the web that could be relevant to you: 1. The Modern Data Stack (Fishtown Analytics) This long-form post on the dbt blog is a must-read. Titled “The Modern Data Stack: Past, Present, and Future,” it answers the question that Tristan Handy has been asking himself for the past two years: “What happened to the massive innovation we saw from 2012-2016?


Debugging image dimensions with Next.js

Grouparoo

Yesterday, I was writing some blog posts. In grand engineer tradition, I got distracted while blogging and spent a few hours writing tools to increase blogging efficiency. Specifically, I was having trouble knowing the correct width and height props to put on the screenshots I was making for the blog post. I would take the screenshot and then use image tools and even a spreadsheet to figure out the right ratio/dimensions for how I wanted it to show up in the UI.
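The ratio arithmetic described above can be sketched in a few lines. This is only an illustration in Python, not the tool from the post; the function name `scaled_dimensions` and the sample screenshot size are assumptions made for the example.

```python
def scaled_dimensions(src_width: int, src_height: int, target_width: int) -> tuple[int, int]:
    """Return (width, height) for an image scaled to target_width.

    Hypothetical helper: preserves the source aspect ratio and rounds the
    height to the nearest whole pixel.
    """
    if src_width <= 0 or src_height <= 0 or target_width <= 0:
        raise ValueError("dimensions must be positive")
    # height / width stays constant when the aspect ratio is preserved
    target_height = round(src_height * target_width / src_width)
    return target_width, target_height


# Assumed example: a 1600x900 screenshot displayed 800 pixels wide
width, height = scaled_dimensions(1600, 900, 800)
print(width, height)
```

The resulting pair is the kind of value one would pass as width and height props for the image.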


Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.


How ASEAN Retailers Can Become insight driven with a Hybrid Cloud data strategy

Cloudera

There has been an e-commerce explosion this year as consumers seek safety and convenience from the comfort of their own homes, using digital tools to purchase everything from durable hard goods to fashion accessories to daily living consumables like food perishables, cleaning products and even school supplies. According to a 2020 study by Facebook and Bain & Co, approximately 310 million customers in Southeast Asia (ASEAN) are expected to shop online with an average spend of US$172 this year, compared


A Day in the Life of a Customer Success Manager

Teradata

Data is the lifeblood of organizations, bringing information to various departments and functions. Keeping data healthy can be simple with good hygiene, but this is easier said than done.


How to Join Data in Elasticsearch vs Rockset

Rockset

Elasticsearch has long been used for a wide variety of real-time analytics use cases, including log storage and analysis and search applications. It is popular because of how it indexes data, which makes search efficient. However, this comes at a cost: joining documents is less efficient. There are ways to build relationships between Elasticsearch documents; the most common are nested objects, parent-child joins, and application-side joins.
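To make the last of these options concrete, here is a minimal sketch of an application-side join in Python. It does not call Elasticsearch: the two lists stand in for the results of two separate queries, and the field names (`customer_id`, `name`, `total`) and the function `application_side_join` are assumptions made for illustration only.

```python
def application_side_join(orders, customers):
    """Join order documents to customer documents in application code.

    Builds an in-memory lookup keyed by customer_id, then enriches each
    order with the matching customer's name (inner-join semantics: orders
    without a matching customer are dropped).
    """
    customers_by_id = {c["customer_id"]: c for c in customers}
    joined = []
    for order in orders:
        customer = customers_by_id.get(order["customer_id"])
        if customer is None:
            continue  # no matching customer document
        joined.append({**order, "customer_name": customer["name"]})
    return joined


# Hypothetical documents, as if returned by two separate searches
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 3, "total": 40.0},
    {"order_id": 12, "customer_id": 2, "total": 15.5},
]

result = application_side_join(orders, customers)
for row in result:
    print(row)
```

The trade-off in this approach is that the join work, and the memory for the lookup table, moves from the search engine into the application.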


Superset 0.38, CRUD Redesign, ECharts Improvements, and more.

Preset

What's new in Superset 0.38?


Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.


Coffee with Cloudera Partners: Intel

Cloudera

Meet Angela Gill and Gina Mcfarland, the powerhouse duo leading the Intel/Cloudera relationship. Angela brings technical expertise, while Gina drives the business strategy. By bringing together Intel, a world leader in computing innovation, and Cloudera, the leader in enterprise analytic data management, Cloudera and Intel are able to accelerate the pace of innovation and deliver customer value on an industry-leading platform.


Regulation as a Service: A Win-Win

Teradata

The banking industry has a regulatory problem. Both sides of the regulatory fence are now working on a more holistic, rules-based system that, if done well, could become a competitive advantage.
