Sat.Dec 05, 2020 - Fri.Dec 11, 2020

article thumbnail

Transactional Machine Learning at Scale with MAADS-VIPER and Apache Kafka

Confluent

This blog post shows how transactional machine learning (TML) integrates data streams with automated machine learning (AutoML), using Apache Kafka® as the data backbone, to create a frictionless machine learning […].

article thumbnail

Books to level up your data skills!

Start Data Engineering

1.

SQL 130
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

3x better performance with CDP Data Warehouse compared to EMR in TPC-DS benchmark

Cloudera

In a previous blog post on CDW performance, we compared Azure HDInsight to CDW. In this blog post, we compare Cloudera Data Warehouse (CDW) on Cloudera Data Platform (CDP) using Apache Hive-LLAP to EMR 6.0 (also powered by Apache Hive-LLAP) on Amazon using the TPC-DS 2.9 benchmark. Amazon recently announced their latest EMR version 6.1.0 with support for ACID transactions.

article thumbnail

How Netflix Scales its API with GraphQL Federation (Part 2)

Netflix Tech

In our previous post and QConPlus talk , we discussed GraphQL Federation as a solution for distributing our GraphQL schema and implementation. In this post, we shift our attention to what is needed to run a federated GraphQL platform successfully?—?from our journey implementing it to lessons learned. Our Journey so Far Over the past year, we’ve implemented the core infrastructure pieces necessary for a federated GraphQL architecture as described in our previous post: Studio Edge Architecture The

IT 102
article thumbnail

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples to debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate

article thumbnail

Getting Started with Scala and Apache Kafka

Confluent

If you’re getting started with Apache Kafka® and event streaming applications, you’ll be pleased to see the variety of languages available to start interacting with the event streaming platform. It […].

Kafka 122
article thumbnail

Proven Patterns For Building Successful Data Teams

Data Engineering Podcast

Summary Building data products are complicated by the fact that there are so many different stakeholders with competing goals and priorities. It is also challenging because of the number of roles and capabilities that are necessary to go from idea to delivery. Different organizations have tried a multitude of organizational strategies to improve the success rate of these data teams with varying levels of success.

Building 100

More Trending

article thumbnail

Supporting content decision makers with machine learning

Netflix Tech

by Melody Dye *, Chaitanya Ekanadham *, Avneesh Saluja *, Ashish Rastogi * contributed equally Netflix is pioneering content creation at an unprecedented scale. Our catalog of thousands of films and series caters to 195M+ members in over 190 countries who span a broad and diverse range of tastes. Content, marketing, and studio production executives make the key decisions that aspire to maximize each series’ or film’s potential to bring joy to our subscribers as it progresses from pitch to play o

article thumbnail

How to Run Apache Kafka on Windows

Confluent

Is Windows your favorite development environment? Do you want to run Apache Kafka® on Windows? Thanks to the Windows Subsystem for Linux 2 (WSL 2), now you can, and with […].

Kafka 116
article thumbnail

Booking’s Journey with Brotli

Booking.com Engineering

Booking.com’s Journey with Brotli The challenges of improving performance in a complex environment The Transfăgărășan road in Romania is known for its jaw-dropping views. But you’re gonna have to work for it. Photo CC BY-SA 2.0 by Antony Stanley , from Flickr. Brotli is a lossless compression algorithm, designed and released by Google for use on the web.

Bytes 52
article thumbnail

Covid Data: An anomalous blip, or the new normal?

Cloudera

COVID-19 has forced virtually every industry to embrace an acceleration in digital capabilities. While it can be argued that digital transformation was already underway; it’s hard to dispute that it has accelerated in recent months. A recent McKinsey survey, cited in CRN , shows that worldwide, 58 percent of customer interactions were digital as of July 2020.

article thumbnail

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

article thumbnail

Toward a Better Quality Metric for the Video Community

Netflix Tech

by Zhi Li, Kyle Swanson, Christos Bampis, Lukáš Krasula and Anne Aaron Over the past few years, we have been striving to make VMAF a more usable tool not just for Netflix, but for the video community at large. This tech blog highlights our recent progress toward this goal. VMAF is a video quality metric that Netflix jointly developed with a number of university collaborators and open-sourced on Github.

article thumbnail

Apache Kafka Lag Monitoring at AppsFlyer

Confluent

This article covers one crucial piece of every distributed system: visibility. At AppsFlyer, we call ourselves metrics obsessed and truly believe that you cannot know what you cannot see. We […].

Kafka 111
article thumbnail

Data.What? What Can I Buy in a Data Marketplace?

Teradata

How does a Data Marketplace relate to Data Sharing? Here's a hint: enabling both internal and external users to access integrated data on demand to bring agility to business. Read more.

Data 52
article thumbnail

Federated Learning, Machine Learning, Decentralized Data

Cloudera

Two years ago we wrote a research report about Federated Learning. We’re pleased to make the report available to everyone, for free. You can read it online here: Federated Learning. Federated Learning is a paradigm in which machine learning models are trained on decentralized data. Instead of collecting data on a single server or data lake, it remains in place — on smartphones, industrial sensing equipment, and other edge devices — and models are trained on-device.

article thumbnail

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

article thumbnail

What I've Learned in 2020: A Technical Version

Rockset

I'm on paternity leave till the end of year since my daughter is on the way, and since I have some little time left before getting really busy, I want to reflect on how I've grown as an engineer in 2020. I left Facebook at the end of 2019 to join Rockset, and it has been a fun year. For those who don't know, Rockset is a real-time analytics database.

article thumbnail

Preset Getting Started Guide is Now Available

Preset

End-user documentation is focused on taking you step-by-step through the entire onboarding Preset Cloud experience, from connecting your data to building your very first chart and dashboard.

Cloud 40
article thumbnail

Medibank

Teradata

Teradata Vantage on AWS transforms private healthcare company to create “Better Health for Better Lives.

article thumbnail

2020 Data Impact Award Winner Spotlight: Globe Telecom

Cloudera

It’s been a few weeks since we celebrated the 2020 Data Impact Awards, and everyone at Cloudera is still on a high. It was a brilliant event, and we are so pleased we were able to celebrate our fantastic customers virtually. Thank you again to all those who tuned in! . The Connect the Data Lifecycle award was our fifth award at this year’s ceremony.

article thumbnail

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

article thumbnail

How Cloudera Supports Government Data Encryption Standards

Cloudera

As part of our ongoing commitment to supporting Government regulations and standards in our enterprise solutions, including data protection, Cloudera recently introduced a version of our Cloudera Data Platform, Private Cloud Base product (7.1.5 release) that can be configured to use FIPS compliant cryptography. We have accomplished this significant improvement through supporting the deployment of the Cloudera Data Platform (CDP) Private Cloud Base on FIPS mode enabled RedHat Enterprise Linux (RH

article thumbnail

Get to Know Your Retail Customer: Accelerating Customer Insight and Relevance

Cloudera

There are lessons to be learned from the brick and mortar or pure-play digital retailers that have been successful in the Covid-19 chaos. As the pandemic’s stress test of e-commerce, in-store insights, supply chain visibility, and fulfillment capabilities have revealed shortcomings, and long-lasting consumer experiences— it has also allowed many companies to pivot to very successful strategies built on enterprise data and the digitization efforts that accompany it.

Retail 62
article thumbnail

Global View Distributed File System with Mount Points

Cloudera

Apache Hadoop Distributed File System (HDFS) is the most popular file system in the big data world. The Apache Hadoop File System interface has provided integration to many other popular storage systems like Apache Ozone, S3, Azure Data Lake Storage etc. Some HDFS users want to extend the HDFS Namenode capacity by configuring Federation of Namenodes.

Systems 61
article thumbnail

How to configure clients to connect to Apache Kafka Clusters securely – Part 2: LDAP

Cloudera

In the previous post, we talked about Kerberos authentication and explained how to configure a Kafka client to authenticate using Kerberos credentials. In this post we will look into how to configure a Kafka client to authenticate using LDAP, instead of Kerberos. We will not cover the server-side configuration in this article but will add some references to it when required to make the examples clearer.

Kafka 52
article thumbnail

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

article thumbnail

Looking Forwards Not Backwards: New Ways of Working for the CFO

Teradata

The bold CFO that steps into the breach and takes ownership of the bank’s data asset can transform the way they work and add massive value. Learn more.

Data 52
article thumbnail

The Economic Value of Supply Chain Investments

Teradata

What is the impact of adjusting various supply chain levers on a company's stock price? How do they impact shareholder value? Find out more.

52
article thumbnail

How Much Security Is Too Much Security?

Teradata

In these budget conscious times, how much security is too much security? That depends on how much you value your data. Read more.

Data 52
article thumbnail

Cost Conscious Data Warehousing with Cloudera Data Platform

Cloudera

Why worry about costs with cloud-native data warehousing? Have you been burned by the unexpected costs of a cloud data warehouse? If so, you know about the failed economics of some cloud-native solutions on the market today. If not, before adopting a cloud data warehouse, consider the true costs of a cloud-native data warehouse. Data warehouses have been broadly adopted to provide timely reports and valuable insights.

article thumbnail

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!