12 Days of Apache Kafka
Confluent
DECEMBER 28, 2020
Before you say it: Yes, we are right now three days past Christmas, but technically the 12 days of Christmas refer to the days between Christmas and Epiphany, which is—I […].
Netflix Tech
DECEMBER 14, 2020
Life of a Netflix Partner Engineer: The case of the extra 40 ms. By John Blair, Netflix Partner Engineering. The Netflix application runs on hundreds of smart TVs, streaming sticks, and pay TV set-top boxes. The role of a Partner Engineer at Netflix is to help device manufacturers launch the Netflix application on their devices. In this article we talk about one particularly difficult issue that blocked the launch of a device in Europe.
Data Engineering Podcast
DECEMBER 28, 2020
Summary One of the core responsibilities of data engineers is to manage the security of the information that they process. The team at Satori has a background in cybersecurity and they are using the lessons that they learned in that field to address the challenge of access control and auditing for data governance. In this episode co-founder and CTO Yoav Cohen explains how the Satori platform provides a proxy layer for your data, the challenges of managing security across disparate storage system
Team Data Science
DECEMBER 17, 2020
Organizations in several industries, such as banking, healthcare, and automobiles, now acknowledge the value of data science in the way they operate. An effective data science team is therefore expected to manage a large volume of tasks. Even so, building a team that can successfully manage AI tasks is essential for tackling the data challenges that organizations face.
Cloudera
DECEMBER 15, 2020
It is crucial for organizations to focus on supporting the new way of work, enhancing productivity, and improving cost efficiency to ensure business survival in the post-pandemic world. However, those that are overly focused on these short-term goals risk losing sight of what’s truly important. As shared in my previous post, diverse teams can help organizations unlock innovations that allow them to adapt to market changes quickly and drive business growth.
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
Netflix Tech
DECEMBER 23, 2020
By Fabio Kung, Sargun Dhillon, Andrew Spyker, Kyle, Rob Gulewich, Nabil Schear, Andrew Leung, Daniel Muino, and Manas Alekar. As previously discussed on the Netflix Tech Blog, Titus is the Netflix container orchestration system. It runs a wide variety of workloads from various parts of the company: everything from the frontend API for netflix.com, to machine learning training workloads, to video encoders.
Data Engineering Podcast
DECEMBER 21, 2020
Summary Data governance is a term that encompasses a wide range of responsibilities, both technical and process oriented. One of the more complex aspects is that of access control to the data assets that an organization is responsible for managing. The team at Immuta has built a platform that aims to tackle that problem in a flexible and maintainable fashion so that data teams can easily integrate authorization, data masking, and privacy enhancing technologies into their data infrastructure.
DataKitchen
DECEMBER 30, 2020
While 2020 has been a collectively difficult year, we want to take a moment to thank all of our employees for the hard work they put into continually developing our DataKitchen DataOps Platform for our customers. We also want to thank all of the data industry groups that have recognized our DataKitchen DataOps Platform and Transformation Advisory Services throughout the year.
Teradata
DECEMBER 29, 2020
When considering your organization's move to the cloud, it's imperative to understand what the cloud can and cannot do, and how to best leverage its benefits.
Cloudera
DECEMBER 11, 2020
In a previous blog post on CDW performance, we compared Azure HDInsight to CDW. In this blog post, we compare Cloudera Data Warehouse (CDW) on Cloudera Data Platform (CDP) using Apache Hive-LLAP to EMR 6.0 (also powered by Apache Hive-LLAP) on Amazon using the TPC-DS 2.9 benchmark. Amazon recently announced their latest EMR version 6.1.0 with support for ACID transactions.
Confluent
DECEMBER 11, 2020
This blog post shows how transactional machine learning (TML) integrates data streams with automated machine learning (AutoML), using Apache Kafka® as the data backbone, to create a frictionless machine learning […].
Netflix Tech
DECEMBER 11, 2020
In our previous post and QConPlus talk, we discussed GraphQL Federation as a solution for distributing our GraphQL schema and implementation. In this post, we shift our attention to what is needed to run a federated GraphQL platform successfully, from our journey implementing it to lessons learned. Our Journey so Far: Over the past year, we’ve implemented the core infrastructure pieces necessary for a federated GraphQL architecture as described in our previous post: Studio Edge Architecture The
Data Engineering Podcast
DECEMBER 14, 2020
Summary As a data engineer you’re familiar with the process of collecting data from databases, customer data platforms, APIs, etc. At YipitData they rely on a variety of alternative data sources to inform investment decisions by hedge funds and businesses. In this episode Andrew Gross, Bobby Muldoon, and Anup Segu describe the self service data platform that they have built to allow data analysts to own the end-to-end delivery of data projects and how that has allowed them to scale their o
DataKitchen
DECEMBER 21, 2020
We understand that many folks would like to say goodbye and good riddance to 2020. But before we shut the door on such a turbulent, transformative year, we at DataKitchen would like to share the crème de la crème of our DataOps content in hopes that it can help you as you learn about and implement DataOps. We hope you and your family have happy holidays and we look forward to continuing your DataOps journey with you in the new year.
Teradata
DECEMBER 16, 2020
Vantage Social Network Analysis (SNA) Framework helps us with useful insights to identify various risk factors associated with the spread of Covid-19. Read more.
Cloudera
DECEMBER 15, 2020
Across nearly every sector working with complex data, Spark has quickly become the de facto distributed computing framework for teams across the data and analytics lifecycle. One of the most awaited features of Spark 3.0 is the new Adaptive Query Execution (AQE) framework, which fixes issues that have plagued many Spark SQL workloads. Those issues were documented in early 2018 in this blog from a mixed Intel and Baidu team.
Confluent
DECEMBER 16, 2020
We’re pleased to announce ksqlDB 0.14, one of the most feature-packed releases of the year. This version includes expanded query support over materialized views, incremental schema alteration, variable substitution, additional […].
Netflix Tech
DECEMBER 10, 2020
by Melody Dye*, Chaitanya Ekanadham*, Avneesh Saluja*, Ashish Rastogi* (*contributed equally). Netflix is pioneering content creation at an unprecedented scale. Our catalog of thousands of films and series caters to 195M+ members in over 190 countries who span a broad and diverse range of tastes. Content, marketing, and studio production executives make the key decisions that aspire to maximize each series’ or film’s potential to bring joy to our subscribers as it progresses from pitch to play o
Data Engineering Podcast
DECEMBER 7, 2020
Summary Building data products is complicated by the fact that there are so many different stakeholders with competing goals and priorities. It is also challenging because of the number of roles and capabilities that are necessary to go from idea to delivery. Different organizations have tried a multitude of organizational strategies to improve the success rate of these data teams, with varying results.
DataKitchen
DECEMBER 20, 2020
“It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is most adaptable to change.” (Leon C. Megginson, on Charles Darwin’s Origin of Species). Adapt or face decline. The Agile Alliance defines “business agility” as the ability of an organization to sense changes internally or externally and respond accordingly in order to deliver value to its customers.
Teradata
DECEMBER 14, 2020
From COVID-19 to AI in industry, our Teradata experts offer their best predictions for the state of technology and business in 2021 and beyond. Read more.
Cloudera
DECEMBER 21, 2020
Our annual Data Impact Awards are all about celebrating organizations that are unlocking the maximum value from their data in order to drive the business forward. One category that highlighted some fantastic examples of customers doing just that was the Enterprise Data Cloud award. While data has become crucial in helping businesses weather the storm in the last few months, it has also been more challenging to manage due to the speed and volume at which it’s produced.
Confluent
DECEMBER 17, 2020
Microservice architectures continue to grow within engineering organizations as teams strive to increase development velocity. Microservices promote the idea of modularity as a first-class citizen in a distributed architecture, enabling […].
Netflix Tech
DECEMBER 21, 2020
By Anupom Syam. Background: At Netflix, our current data warehouse contains hundreds of petabytes of data stored in AWS S3, and each day we ingest and create additional petabytes. At this scale, we can gain significant performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands in our warehouse.
Grouparoo
DECEMBER 22, 2020
A necessary practice and skill for building a successful product is having accurate, accessible, and actionable data. I’ve had the privilege to work at some great companies that are extremely data-focused, and I’ve learned from some of the best along the way. Quantitative user data: At Zynga, we had millions of daily active users, and we tracked every click, action, and session across all of our games and apps.
Data Council
DECEMBER 21, 2020
Here's our December 2020 roundup of links from across the web that could be relevant to you: 1. The Modern Data Stack (Fishtown Analytics) This long-form post on the dbt blog is a must-read. Titled “The Modern Data Stack: Past, Present, and Future,” it answers the question that Tristan Handy has been asking himself for the past two years: “What happened to the massive innovation we saw from 2012-2016?
Teradata
DECEMBER 1, 2020
Teradata is participating in AWS re:Invent 2020, demonstrating our cloud-first stance as a Gold sponsor. Find out more.
Cloudera
DECEMBER 11, 2020
COVID-19 has forced virtually every industry to embrace an acceleration in digital capabilities. While it can be argued that digital transformation was already underway, it’s hard to dispute that it has accelerated in recent months. A recent McKinsey survey, cited in CRN, shows that worldwide, 58 percent of customer interactions were digital as of July 2020.
Confluent
DECEMBER 8, 2020
If you’re getting started with Apache Kafka® and event streaming applications, you’ll be pleased to see the variety of languages available to start interacting with the event streaming platform. It […].