Sat, Dec 05, 2020 - Fri, Dec 11, 2020

Transactional Machine Learning at Scale with MAADS-VIPER and Apache Kafka

Confluent

This blog post shows how transactional machine learning (TML) integrates data streams with automated machine learning (AutoML), using Apache Kafka® as the data backbone, to create a frictionless machine learning […].

Books to level up your data skills!

Start Data Engineering

3x better performance with CDP Data Warehouse compared to EMR in TPC-DS benchmark

Cloudera

In a previous blog post on CDW performance, we compared Azure HDInsight to CDW. In this blog post, we compare Cloudera Data Warehouse (CDW) on Cloudera Data Platform (CDP) using Apache Hive-LLAP to EMR 6.0 (also powered by Apache Hive-LLAP) on Amazon using the TPC-DS 2.9 benchmark. Amazon recently announced their latest EMR version 6.1.0 with support for ACID transactions.

How Netflix Scales its API with GraphQL Federation (Part 2)

Netflix Tech

In our previous post and QConPlus talk, we discussed GraphQL Federation as a solution for distributing our GraphQL schema and implementation. In this post, we shift our attention to what is needed to run a federated GraphQL platform successfully, from our journey implementing it to lessons learned. Our Journey so Far: Over the past year, we’ve implemented the core infrastructure pieces necessary for a federated GraphQL architecture as described in our previous post: Studio Edge Architecture The

Getting Started with Scala and Apache Kafka

Confluent

If you’re getting started with Apache Kafka® and event streaming applications, you’ll be pleased to see the variety of languages available to start interacting with the event streaming platform. It […].

Proven Patterns For Building Successful Data Teams

Data Engineering Podcast

Summary: Building data products is complicated by the fact that there are so many different stakeholders with competing goals and priorities. It is also challenging because of the number of roles and capabilities that are necessary to go from idea to delivery. Different organizations have tried a multitude of organizational strategies to improve the success rate of these data teams, with varying levels of success.

More Trending

Supporting content decision makers with machine learning

Netflix Tech

by Melody Dye*, Chaitanya Ekanadham*, Avneesh Saluja*, Ashish Rastogi* (*contributed equally). Netflix is pioneering content creation at an unprecedented scale. Our catalog of thousands of films and series caters to 195M+ members in over 190 countries who span a broad and diverse range of tastes. Content, marketing, and studio production executives make the key decisions that aspire to maximize each series’ or film’s potential to bring joy to our subscribers as it progresses from pitch to play o

How to Run Apache Kafka on Windows

Confluent

Is Windows your favorite development environment? Do you want to run Apache Kafka® on Windows? Thanks to the Windows Subsystem for Linux 2 (WSL 2), now you can, and with […].

Booking’s Journey with Brotli

Booking.com Engineering

Booking.com’s Journey with Brotli: The challenges of improving performance in a complex environment. The Transfăgărășan road in Romania is known for its jaw-dropping views. But you’re gonna have to work for it. Photo CC BY-SA 2.0 by Antony Stanley, from Flickr. Brotli is a lossless compression algorithm, designed and released by Google for use on the web.

2020 Data Impact Award Winner Spotlight: Merck KGaA

Cloudera

The Data Security and Governance category at the annual Data Impact Awards has never been so important. Consider for a moment just how much 2020 brought about for businesses to deal with: the sudden rise in remote working, a huge influx of data as the world turned digital, not to mention the never-ending list of regulations businesses need to remain compliant with (how many acronyms can you name in full?)

Toward a Better Quality Metric for the Video Community

Netflix Tech

by Zhi Li, Kyle Swanson, Christos Bampis, Lukáš Krasula and Anne Aaron. Over the past few years, we have been striving to make VMAF a more usable tool not just for Netflix, but for the video community at large. This tech blog highlights our recent progress toward this goal. VMAF is a video quality metric that Netflix jointly developed with a number of university collaborators and open-sourced on GitHub.

Apache Kafka Lag Monitoring at AppsFlyer

Confluent

This article covers one crucial piece of every distributed system: visibility. At AppsFlyer, we call ourselves metrics obsessed and truly believe that you cannot know what you cannot see. We […].
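
Consumer lag, the quantity that lag monitoring tracks, is usually defined per partition as the log-end offset minus the consumer group's committed offset; Kafka's `kafka-consumer-groups` tool reports it per partition with `--describe`. The following is a minimal Python sketch of that arithmetic, not AppsFlyer's implementation: the hardcoded offsets stand in for values a real monitor would fetch from the brokers, and the topic names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionOffsets:
    topic: str
    partition: int
    log_end_offset: int    # offset the broker will assign to the next record written
    committed_offset: int  # next offset the consumer group will read (its committed position)


def partition_lag(p: PartitionOffsets) -> int:
    """Records written to the partition but not yet consumed by the group."""
    # If the two offsets are sampled at slightly different times, the difference
    # can briefly go negative; clamp it to 0.
    return max(p.log_end_offset - p.committed_offset, 0)


def lag_by_topic(partitions: list[PartitionOffsets]) -> dict[str, int]:
    """Sum per-partition lag into one number per topic, a typical dashboard metric."""
    totals: dict[str, int] = {}
    for p in partitions:
        totals[p.topic] = totals.get(p.topic, 0) + partition_lag(p)
    return totals


if __name__ == "__main__":
    snapshot = [
        PartitionOffsets("events", 0, log_end_offset=1_500, committed_offset=1_450),
        PartitionOffsets("events", 1, log_end_offset=980, committed_offset=980),
        PartitionOffsets("clicks", 0, log_end_offset=300, committed_offset=120),
    ]
    print(lag_by_topic(snapshot))  # {'events': 50, 'clicks': 180}
```

Aggregating per topic hides which partition is stuck, so a real monitor would likely keep the per-partition numbers as well.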

Data.What? What Can I Buy in a Data Marketplace?

Teradata

How does a Data Marketplace relate to Data Sharing? Here's a hint: enabling both internal and external users to access integrated data on demand to bring agility to business. Read more.

Cost Conscious Data Warehousing with Cloudera Data Platform

Cloudera

Why worry about costs with cloud-native data warehousing? Have you been burned by the unexpected costs of a cloud data warehouse? If so, you know about the failed economics of some cloud-native solutions on the market today. If not, before adopting a cloud data warehouse, consider the true costs of a cloud-native data warehouse. Data warehouses have been broadly adopted to provide timely reports and valuable insights.

What I've Learned in 2020: A Technical Version

Rockset

I'm on paternity leave till the end of the year since my daughter is on the way, and since I have a little time left before getting really busy, I want to reflect on how I've grown as an engineer in 2020. I left Facebook at the end of 2019 to join Rockset, and it has been a fun year. For those who don't know, Rockset is a real-time analytics database.

Preset Getting Started Guide is Now Available

Preset

End-user documentation is focused on taking you step by step through the entire Preset Cloud onboarding experience, from connecting your data to building your very first chart and dashboard.

Medibank

Teradata

Teradata Vantage on AWS transforms a private healthcare company to create “Better Health for Better Lives.”

Federated Learning, Machine Learning, Decentralized Data

Cloudera

Two years ago we wrote a research report about Federated Learning. We’re pleased to make the report available to everyone, for free. You can read it online here: Federated Learning. Federated Learning is a paradigm in which machine learning models are trained on decentralized data. Instead of collecting data on a single server or data lake, it remains in place — on smartphones, industrial sensing equipment, and other edge devices — and models are trained on-device.
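
One widely used way to train on decentralized data is federated averaging (FedAvg): each device trains on its own data, and a server averages only the resulting model weights, weighted by how many examples each device holds. The sketch below is a toy, pure-Python illustration with a hypothetical linear model and made-up client data; it is not taken from the report, which may describe other schemes.

```python
Weights = tuple[float, float]  # (slope w, intercept b) of a toy linear model y = w*x + b


def local_update(weights: Weights, data: list[tuple[float, float]],
                 lr: float = 0.1, epochs: int = 5) -> Weights:
    """Client step: a few epochs of full-batch gradient descent on the client's own data.
    Only the resulting weights leave the device; the (x, y) pairs never do."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error, both computed before either weight changes.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def federated_average(results: list[tuple[Weights, int]]) -> Weights:
    """Server step: average client weights, weighted by each client's example count."""
    total = sum(count for _, count in results)
    w = sum(weights[0] * count for weights, count in results) / total
    b = sum(weights[1] * count for weights, count in results) / total
    return w, b


def train(clients: list[list[tuple[float, float]]], rounds: int = 300) -> Weights:
    global_weights: Weights = (0.0, 0.0)
    for _ in range(rounds):
        # Each round: broadcast the global model, train locally, then aggregate.
        results = [(local_update(global_weights, data), len(data)) for data in clients]
        global_weights = federated_average(results)
    return global_weights


if __name__ == "__main__":
    # Three hypothetical devices, each holding a different slice of points on y = 3x + 1.
    clients = [
        [(x, 3 * x + 1) for x in (0.0, 0.1, 0.2, 0.3)],
        [(x, 3 * x + 1) for x in (0.4, 0.5, 0.6)],
        [(x, 3 * x + 1) for x in (0.7, 0.8, 0.9, 1.0)],
    ]
    w, b = train(clients)
    print(f"w={w:.3f}, b={b:.3f}")  # close to w=3.000, b=1.000
```

No single client sees the whole input range, yet the averaged model should recover the shared line, which is the point of the paradigm: the data stays put and only parameters travel.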

2020 Data Impact Award Winner Spotlight: Globe Telecom

Cloudera

It’s been a few weeks since we celebrated the 2020 Data Impact Awards, and everyone at Cloudera is still on a high. It was a brilliant event, and we are so pleased we were able to celebrate our fantastic customers virtually. Thank you again to all those who tuned in! The Connect the Data Lifecycle award was our fifth award at this year’s ceremony.

How Cloudera Supports Government Data Encryption Standards

Cloudera

As part of our ongoing commitment to supporting government regulations and standards in our enterprise solutions, including data protection, Cloudera recently introduced a version of our Cloudera Data Platform, Private Cloud Base product (7.1.5 release) that can be configured to use FIPS-compliant cryptography. We have accomplished this significant improvement by supporting the deployment of the Cloudera Data Platform (CDP) Private Cloud Base on FIPS-mode-enabled Red Hat Enterprise Linux (RH

Get to Know Your Retail Customer: Accelerating Customer Insight and Relevance

Cloudera

There are lessons to be learned from the brick-and-mortar or pure-play digital retailers that have succeeded amid the Covid-19 chaos. While the pandemic’s stress test of e-commerce, in-store insights, supply chain visibility, and fulfillment capabilities has revealed shortcomings and left a lasting mark on consumer experiences, it has also allowed many companies to pivot to very successful strategies built on enterprise data and the digitization efforts that accompany it.

Global View Distributed File System with Mount Points

Cloudera

Apache Hadoop Distributed File System (HDFS) is the most popular file system in the big data world. The Apache Hadoop FileSystem interface has provided integration with many other popular storage systems, such as Apache Ozone, S3, and Azure Data Lake Storage. Some HDFS users want to extend HDFS NameNode capacity by configuring NameNode federation.
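
A global namespace with mount points comes down to a mount table that maps path prefixes onto target file systems. The sketch below shows a generic lookup under stated assumptions: a mount point matches only on whole path components, the most specific matching mount point wins, and the `hdfs://` and `s3a://` targets are hypothetical. It illustrates the idea only and is not Hadoop's ViewFS code or configuration.

```python
import posixpath


def resolve(mount_table: dict[str, str], path: str) -> str:
    """Translate a path in the global namespace into a URI on the backing file system."""
    path = posixpath.normpath(path)
    # Normalize mount points ("/data/" -> "/data") and strip trailing slashes from targets.
    table = {posixpath.normpath(mp): target.rstrip("/") for mp, target in mount_table.items()}
    best = None
    for mount_point in table:
        # Match whole path components only, so "/data" covers "/data/x" but not "/database".
        prefix = mount_point.rstrip("/") + "/"
        covers = path == mount_point or path.startswith(prefix)
        # Assumption of this sketch: the longest (most specific) matching mount point wins.
        if covers and (best is None or len(mount_point) > len(best)):
            best = mount_point
    if best is None:
        raise FileNotFoundError(f"no mount point covers {path}")
    remainder = path[len(best):].lstrip("/")
    return f"{table[best]}/{remainder}" if remainder else table[best]


if __name__ == "__main__":
    mounts = {
        "/user": "hdfs://namenode1.example.com:8020/user",
        "/data": "s3a://example-bucket/data",
    }
    print(resolve(mounts, "/user/alice/report.csv"))
    # hdfs://namenode1.example.com:8020/user/alice/report.csv
    print(resolve(mounts, "/data/2020/12/events.parquet"))
    # s3a://example-bucket/data/2020/12/events.parquet
```

Because clients only ever see paths like `/user/...` and `/data/...`, the backing store for a subtree can be changed by editing the mount table rather than every job that reads it.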

How to configure clients to connect to Apache Kafka Clusters securely – Part 2: LDAP

Cloudera

In the previous post, we talked about Kerberos authentication and explained how to configure a Kafka client to authenticate using Kerberos credentials. In this post we will look into how to configure a Kafka client to authenticate using LDAP, instead of Kerberos. We will not cover the server-side configuration in this article but will add some references to it when required to make the examples clearer.
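
Client-side LDAP authentication in Kafka is commonly set up as SASL/PLAIN over TLS: the client sends a username and password, and the broker checks them against LDAP. Assuming that setup, here is a minimal Python sketch that assembles the standard Java client properties (`security.protocol`, `sasl.mechanism`, `sasl.jaas.config`, and the truststore settings) and renders them in `.properties` syntax. The user name, passwords, and truststore path are hypothetical, and the post itself may use additional settings.

```python
def ldap_client_properties(username: str, password: str,
                           truststore_path: str, truststore_password: str) -> dict[str, str]:
    """Kafka client settings for SASL/PLAIN over TLS; the broker is assumed to verify
    the credentials against LDAP."""
    for value in (username, password):
        # This sketch does not escape JAAS values, so reject embedded double quotes.
        if '"' in value:
            raise ValueError("credentials containing double quotes are not supported here")
    jaas = ("org.apache.kafka.common.security.plain.PlainLoginModule required "
            f'username="{username}" password="{password}";')
    return {
        "security.protocol": "SASL_SSL",  # SASL authentication over a TLS connection
        "sasl.mechanism": "PLAIN",        # username/password sent to the broker
        "sasl.jaas.config": jaas,
        "ssl.truststore.location": truststore_path,
        "ssl.truststore.password": truststore_password,
    }


def to_properties(props: dict[str, str]) -> str:
    """Render settings in Java .properties syntax (values are not escaped)."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


if __name__ == "__main__":
    props = ldap_client_properties("alice", "example-password",
                                   "/etc/kafka/truststore.jks", "example-truststore-password")
    print(to_properties(props), end="")
```

Compared with the Kerberos setup from the previous post, no keytab or `krb5.conf` is involved on the client; the trade-off is that a password travels to the broker, which is why the sketch pairs PLAIN with TLS (`SASL_SSL`).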

Looking Forwards Not Backwards: New Ways of Working for the CFO

Teradata

The bold CFO who steps into the breach and takes ownership of the bank’s data asset can transform the way they work and add massive value. Learn more.

The Economic Value of Supply Chain Investments

Teradata

What is the impact of adjusting various supply chain levers on a company's stock price? How do they impact shareholder value? Find out more.

How Much Security Is Too Much Security?

Teradata

In these budget conscious times, how much security is too much security? That depends on how much you value your data. Read more.
