12 Days of Apache Kafka
Confluent
DECEMBER 28, 2020
Before you say it: Yes, we are right now three days past Christmas, but technically the 12 days of Christmas refer to the days between Christmas and Epiphany, which is—I […].
Netflix Tech
DECEMBER 14, 2020
Life of a Netflix Partner Engineer: The case of the extra 40 ms. By John Blair, Netflix Partner Engineering. The Netflix application runs on hundreds of smart TVs, streaming sticks, and pay TV set-top boxes. The role of a Partner Engineer at Netflix is to help device manufacturers launch the Netflix application on their devices. In this article we talk about one particularly difficult issue that blocked the launch of a device in Europe.
Data Engineering Podcast
DECEMBER 28, 2020
Summary One of the core responsibilities of data engineers is to manage the security of the information that they process. The team at Satori has a background in cybersecurity and they are using the lessons that they learned in that field to address the challenge of access control and auditing for data governance. In this episode co-founder and CTO Yoav Cohen explains how the Satori platform provides a proxy layer for your data, the challenges of managing security across disparate storage system
Team Data Science
DECEMBER 17, 2020
Organizations in industries such as banking, healthcare, and automotive now acknowledge the value of data science in their operations. An effective data science team is therefore expected to manage a large volume of tasks. Developing a team that can successfully manage AI tasks is essential to tackling the data challenges these organizations face.
Cloudera
DECEMBER 15, 2020
It is crucial for organizations to focus on supporting the new way of work, enhancing productivity, and improving cost efficiency to ensure business survival in the post-pandemic world. However, those that are overly focused on these short-term goals risk losing sight of what’s truly important. As shared in my previous post, diverse teams can help organizations unlock innovations that allow them to adapt to market changes quickly and drive business growth.
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
Netflix Tech
DECEMBER 23, 2020
By Fabio Kung, Sargun Dhillon, Andrew Spyker, Kyle, Rob Gulewich, Nabil Schear, Andrew Leung, Daniel Muino, and Manas Alekar. As previously discussed on the Netflix Tech Blog, Titus is the Netflix container orchestration system. It runs a wide variety of workloads from various parts of the company: everything from the frontend API for netflix.com, to machine learning training workloads, to video encoders.
Data Engineering Podcast
DECEMBER 21, 2020
Summary Data governance is a term that encompasses a wide range of responsibilities, both technical and process oriented. One of the more complex aspects is that of access control to the data assets that an organization is responsible for managing. The team at Immuta has built a platform that aims to tackle that problem in a flexible and maintainable fashion so that data teams can easily integrate authorization, data masking, and privacy enhancing technologies into their data infrastructure.
DataKitchen
DECEMBER 30, 2020
While 2020 has been a collectively difficult year, we want to take a moment to thank all of our employees for the hard work they put into continually developing our DataKitchen DataOps Platform for our customers. We also want to thank all of the data industry groups that have recognized our DataKitchen DataOps Platform and Transformation Advisory Services throughout the year.
Teradata
DECEMBER 29, 2020
When considering your organization's move to the cloud, it's imperative to understand what the cloud can and cannot do, and how to best leverage its benefits.
Cloudera
DECEMBER 11, 2020
In a previous blog post on CDW performance, we compared Azure HDInsight to CDW. In this blog post, we compare Cloudera Data Warehouse (CDW) on Cloudera Data Platform (CDP) using Apache Hive-LLAP to EMR 6.0 (also powered by Apache Hive-LLAP) on Amazon using the TPC-DS 2.9 benchmark. Amazon recently announced their latest EMR version 6.1.0 with support for ACID transactions.
Confluent
DECEMBER 11, 2020
This blog post shows how transactional machine learning (TML) integrates data streams with automated machine learning (AutoML), using Apache Kafka® as the data backbone, to create a frictionless machine learning […].
Netflix Tech
DECEMBER 11, 2020
In our previous post and QConPlus talk, we discussed GraphQL Federation as a solution for distributing our GraphQL schema and implementation. In this post, we shift our attention to what is needed to run a federated GraphQL platform successfully, from our journey implementing it to lessons learned. Our Journey so Far Over the past year, we’ve implemented the core infrastructure pieces necessary for a federated GraphQL architecture as described in our previous post: Studio Edge Architecture The
Data Engineering Podcast
DECEMBER 14, 2020
Summary As a data engineer you’re familiar with the process of collecting data from databases, customer data platforms, APIs, etc. At YipitData they rely on a variety of alternative data sources to inform investment decisions by hedge funds and businesses. In this episode Andrew Gross, Bobby Muldoon, and Anup Segu describe the self service data platform that they have built to allow data analysts to own the end-to-end delivery of data projects and how that has allowed them to scale their o
DataKitchen
DECEMBER 21, 2020
We understand that many folks would like to say goodbye and good riddance to 2020. But before we shut the door on such a turbulent, transformative year, we at DataKitchen would like to share the creme de la creme of our DataOps content in hopes that it can help you as you learn about and implement DataOps. We hope you and your family have happy holidays and we look forward to continuing your DataOps journey with you in the new year.
Teradata
DECEMBER 16, 2020
Vantage Social Network Analysis (SNA) Framework helps us with useful insights to identify various risk factors associated with the spread of Covid-19. Read more.
Cloudera
DECEMBER 15, 2020
Across nearly every sector working with complex data, Spark has quickly become the de facto distributed computing framework for teams across the data and analytics lifecycle. One of the most awaited features of Spark 3.0 is the new Adaptive Query Execution (AQE) framework, which fixes issues that have plagued many Spark SQL workloads. Those issues were documented in early 2018 in this blog from a mixed Intel and Baidu team.
Confluent
DECEMBER 16, 2020
We’re pleased to announce ksqlDB 0.14, one of the most feature-packed releases of the year. This version includes expanded query support over materialized views, incremental schema alteration, variable substitution, additional […].
Netflix Tech
DECEMBER 10, 2020
By Melody Dye*, Chaitanya Ekanadham*, Avneesh Saluja*, Ashish Rastogi* (*contributed equally). Netflix is pioneering content creation at an unprecedented scale. Our catalog of thousands of films and series caters to 195M+ members in over 190 countries who span a broad and diverse range of tastes. Content, marketing, and studio production executives make the key decisions that aspire to maximize each series’ or film’s potential to bring joy to our subscribers as it progresses from pitch to play o
Data Engineering Podcast
DECEMBER 7, 2020
Summary Building data products is complicated by the fact that there are so many different stakeholders with competing goals and priorities. It is also challenging because of the number of roles and capabilities that are necessary to go from idea to delivery. Different organizations have tried a multitude of organizational strategies to improve the success rate of these data teams, with varying levels of success.
DataKitchen
DECEMBER 20, 2020
“It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is most adaptable to change.” (Leon C. Megginson on Charles Darwin’s “Origin of Species”) Adapt or face decline. The Agile Alliance defines “business agility” as the ability of an organization to sense changes internally or externally and respond accordingly in order to deliver value to its customers.
Teradata
DECEMBER 14, 2020
From COVID-19 to AI in industry, our Teradata experts offer their best predictions for the state of technology and business in 2021 and beyond. Read more.
Cloudera
DECEMBER 11, 2020
The Data Security and Governance category, at the annual Data Impact Awards, has never been so important. Consider for a moment, just how much 2020 brought about for businesses to deal with. The sudden rise in remote working, a huge influx in data as the world turned digital, not to mention the never-ending list of regulations businesses need to remain compliant with (how many acronyms can you name in full?
Confluent
DECEMBER 17, 2020
Microservice architectures continue to grow within engineering organizations as teams strive to increase development velocity. Microservices promote the idea of modularity as a first-class citizen in a distributed architecture, enabling […].
Netflix Tech
DECEMBER 21, 2020
By Anupom Syam. Background: At Netflix, our current data warehouse contains hundreds of Petabytes of data stored in AWS S3, and each day we ingest and create additional Petabytes. At this scale, we can gain a significant amount of performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands in our warehouse.
Grouparoo
DECEMBER 22, 2020
A necessary practice and skill in building a successful product is having accurate, accessible, and actionable data. I’ve had the privilege to work at some great companies that are extremely data-focused, and I’ve learned from some of the best along the way. Quantitative user data: At Zynga, we had millions of daily active users, and we tracked every click, action, and session across all of our games and apps.
Data Council
DECEMBER 21, 2020
Here's our December 2020 roundup of links from across the web that could be relevant to you: 1. The Modern Data Stack (Fishtown Analytics) This long-form post on the dbt blog is a must-read. Titled “The Modern Data Stack: Past, Present, and Future,” it answers the question that Tristan Handy has been asking himself for the past two years: “What happened to the massive innovation we saw from 2012-2016?
Teradata
DECEMBER 1, 2020
Teradata is participating in AWS re:Invent 2020, demonstrating our cloud-first stance as a Gold sponsor. Find out more.
Cloudera
DECEMBER 21, 2020
In this blog we will take you through a persona-based data adventure, with short demos attached, to show you the A-Z data worker workflow expedited and made easier through self-service, seamless integration, and cloud-native technologies. You will learn all the parts of Cloudera’s Data Platform that together will accelerate your everyday Data Worker tasks.
Confluent
DECEMBER 8, 2020
If you’re getting started with Apache Kafka® and event streaming applications, you’ll be pleased to see the variety of languages available to start interacting with the event streaming platform. It […].