December 2020

12 Days of Apache Kafka

Confluent

Before you say it: Yes, we are right now three days past Christmas, but technically the 12 days of Christmas refer to the days between Christmas and Epiphany, which is—I […].

Life of a Netflix Partner Engineer: The case of the extra 40 ms

Netflix Tech

By John Blair, Netflix Partner Engineering. The Netflix application runs on hundreds of smart TVs, streaming sticks, and pay TV set-top boxes. The role of a Partner Engineer at Netflix is to help device manufacturers launch the Netflix application on their devices. In this article we talk about one particularly difficult issue that blocked the launch of a device in Europe.

Off The Shelf Data Governance With Satori

Data Engineering Podcast

Summary: One of the core responsibilities of data engineers is to manage the security of the information that they process. The team at Satori has a background in cybersecurity, and they are using the lessons they learned in that field to address the challenge of access control and auditing for data governance. In this episode, co-founder and CTO Yoav Cohen explains how the Satori platform provides a proxy layer for your data, the challenges of managing security across disparate storage system

Is Data Engineering a must for Data Scientists?

Team Data Science

Organizations in industries such as banking, healthcare, and automotive now acknowledge the value of data science in their operations. An effective data science team is therefore expected to manage a large volume of tasks. Even so, building a team that can successfully manage AI tasks is essential to tackling the data challenges organizations face.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Books to level up your data skills!

Start Data Engineering

1.

Fostering inclusion with servant leadership

Cloudera

It is crucial for organizations to focus on supporting the new way of work, enhancing productivity, and improving cost efficiency to ensure business survival in the post-pandemic world. However, those that are overly focused on these short-term goals risk losing sight of what’s truly important. As shared in my previous post, diverse teams can help organizations unlock innovations that allow them to adapt to market changes quickly and drive business growth.

More Trending

Evolving Container Security With Linux User Namespaces

Netflix Tech

By Fabio Kung, Sargun Dhillon, Andrew Spyker, Kyle, Rob Gulewich, Nabil Schear, Andrew Leung, Daniel Muino, and Manas Alekar. As previously discussed on the Netflix Tech Blog, Titus is the Netflix container orchestration system. It runs a wide variety of workloads from various parts of the company: everything from the frontend API for netflix.com, to machine learning training workloads, to video encoders.

Low Friction Data Governance With Immuta

Data Engineering Podcast

Summary: Data governance is a term that encompasses a wide range of responsibilities, both technical and process-oriented. One of the more complex aspects is that of access control to the data assets that an organization is responsible for managing. The team at Immuta has built a platform that aims to tackle that problem in a flexible and maintainable fashion so that data teams can easily integrate authorization, data masking, and privacy-enhancing technologies into their data infrastructure.

DataKitchen’s 2020 Honors & Awards

DataKitchen

While 2020 has been a collectively difficult year, we want to take a moment to thank all of our employees for the hard work they put into continually developing our DataKitchen DataOps Platform for our customers. We also want to thank all of the data industry groups that have recognized our DataKitchen DataOps Platform and Transformation Advisory Services throughout the year.

A Few Things to Know When You’re Moving to the Cloud

Teradata

When considering your organization's move to the cloud, it's imperative to understand what the cloud can and cannot do, and how to best leverage its benefits.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

3x better performance with CDP Data Warehouse compared to EMR in TPC-DS benchmark

Cloudera

In a previous blog post on CDW performance, we compared Azure HDInsight to CDW. In this blog post, we compare Cloudera Data Warehouse (CDW) on Cloudera Data Platform (CDP) using Apache Hive-LLAP to EMR 6.0 (also powered by Apache Hive-LLAP) on Amazon using the TPC-DS 2.9 benchmark. Amazon recently announced their latest EMR version 6.1.0 with support for ACID transactions.

Transactional Machine Learning at Scale with MAADS-VIPER and Apache Kafka

Confluent

This blog post shows how transactional machine learning (TML) integrates data streams with automated machine learning (AutoML), using Apache Kafka® as the data backbone, to create a frictionless machine learning […].

How Netflix Scales its API with GraphQL Federation (Part 2)

Netflix Tech

In our previous post and QConPlus talk, we discussed GraphQL Federation as a solution for distributing our GraphQL schema and implementation. In this post, we shift our attention to what is needed to run a federated GraphQL platform successfully, from our journey implementing it to lessons learned. Our Journey so Far: Over the past year, we’ve implemented the core infrastructure pieces necessary for a federated GraphQL architecture as described in our previous post: Studio Edge Architecture The

Building A Self Service Data Platform For Alternative Data Analytics At YipitData

Data Engineering Podcast

Summary: As a data engineer, you’re familiar with the process of collecting data from databases, customer data platforms, APIs, etc. YipitData relies on a variety of alternative data sources to inform investment decisions by hedge funds and businesses. In this episode, Andrew Gross, Bobby Muldoon, and Anup Segu describe the self-service data platform that they have built to allow data analysts to own the end-to-end delivery of data projects and how that has allowed them to scale their o

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

DataKitchen’s Best of 2020 DataOps Resources

DataKitchen

We understand that many folks would like to say goodbye and good riddance to 2020. But before we shut the door on such a turbulent, transformative year, we at DataKitchen would like to share the creme de la creme of our DataOps content in hopes that it can help you as you learn about and implement DataOps. We hope you and your family have happy holidays and we look forward to continuing your DataOps journey with you in the new year.

Vantage Social Network Analysis Framework for Covid-19 Risk Metrics

Teradata

The Vantage Social Network Analysis (SNA) Framework provides useful insights for identifying various risk factors associated with the spread of Covid-19.

How does Apache Spark 3.0 increase the performance of your SQL workloads

Cloudera

Across nearly every sector working with complex data, Spark has quickly become the de facto distributed computing framework for teams across the data and analytics lifecycle. One of the most awaited features of Spark 3.0 is the new Adaptive Query Execution framework (AQE), which fixes the issues that have plagued a lot of Spark SQL workloads. Those were documented in early 2018 in this blog from a mixed Intel and Baidu team.

Announcing ksqlDB 0.14.0

Confluent

We’re pleased to announce ksqlDB 0.14, one of the most feature-packed releases of the year. This version includes expanded query support over materialized views, incremental schema alteration, variable substitution, additional […].

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Supporting content decision makers with machine learning

Netflix Tech

By Melody Dye*, Chaitanya Ekanadham*, Avneesh Saluja*, Ashish Rastogi* (*contributed equally). Netflix is pioneering content creation at an unprecedented scale. Our catalog of thousands of films and series caters to 195M+ members in over 190 countries who span a broad and diverse range of tastes. Content, marketing, and studio production executives make the key decisions that aspire to maximize each series’ or film’s potential to bring joy to our subscribers as it progresses from pitch to play o

Proven Patterns For Building Successful Data Teams

Data Engineering Podcast

Summary: Building data products is complicated by the fact that there are so many different stakeholders with competing goals and priorities. It is also challenging because of the number of roles and capabilities that are necessary to go from idea to delivery. Different organizations have tried a multitude of organizational strategies to improve the success rate of these data teams, with varying levels of success.

Improve Business Agility by Hiring a DataOps Engineer

DataKitchen

“It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is most adaptable to change.” – Leon C. Megginson on Charles Darwin’s “Origin of Species”. Adapt or face decline. The Agile Alliance defines “business agility” as the ability of an organization to sense changes internally or externally and respond accordingly in order to deliver value to its customers.

Top Tech Predictions for 2021

Teradata

From COVID-19 to AI in industry, our Teradata experts offer their best predictions for the state of technology and business in 2021 and beyond.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

2020 Data Impact Award Winner Spotlight: West Midlands Police

Cloudera

Our annual Data Impact Awards are all about celebrating organizations that are unlocking the maximum value from their data in order to drive the business forward. One category that highlighted some fantastic examples of customers doing just that was The Enterprise Data Cloud award. While data has become crucial in helping businesses weather the storm in the last few months, it’s also been more challenging to manage due to the speed and volume at which it’s produced.

Spring Your Microservices into Production with Kubernetes and GitOps

Confluent

Microservice architectures continue to grow within engineering organizations as teams strive to increase development velocity. Microservices promote the idea of modularity as a first-class citizen in a distributed architecture, enabling […].

Optimizing data warehouse storage

Netflix Tech

By Anupom Syam. Background: At Netflix, our current data warehouse contains hundreds of petabytes of data stored in AWS S3, and each day we ingest and create additional petabytes. At this scale, we can gain a significant amount of performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands in our warehouse.

Measuring usage for open source projects

Grouparoo

A necessary practice and skill for building a successful product is having accurate, accessible, and actionable data. I’ve had the privilege to work at some great companies that are extremely data-focused, and I’ve learned from some of the best along the way. Quantitative user data: At Zynga, we had millions of daily active users, and we tracked every click, action, and session across all of our games and apps.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

The Modern Data Stack, Metadata Architectures, and More: Top 10 Links From Across the Web

Data Council

Here's our December 2020 roundup of links from across the web that could be relevant to you: 1. The Modern Data Stack (Fishtown Analytics): This long-form post on the dbt blog is a must-read. Titled “The Modern Data Stack: Past, Present, and Future,” it answers the question that Tristan Handy has been asking himself for the past two years: “What happened to the massive innovation we saw from 2012-2016?

Teradata at AWS re:Invent

Teradata

Teradata is participating in AWS re:Invent 2020, demonstrating our cloud-first stance as a Gold sponsor.

Covid Data: An anomalous blip, or the new normal?

Cloudera

COVID-19 has forced virtually every industry to embrace an acceleration in digital capabilities. While it can be argued that digital transformation was already underway, it’s hard to dispute that it has accelerated in recent months. A recent McKinsey survey, cited in CRN, shows that worldwide, 58 percent of customer interactions were digital as of July 2020.

Getting Started with Scala and Apache Kafka

Confluent

If you’re getting started with Apache Kafka® and event streaming applications, you’ll be pleased to see the variety of languages available to start interacting with the event streaming platform. It […].

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.