Sat. Dec 12, 2020 - Fri. Dec 18, 2020

Introducing the Confluent Parallel Message Processing Client

Confluent

Consuming messages in parallel is what Apache Kafka® is all about, so you may well wonder, why would we want anything else? It turns out that, in practice, there are […].

Process 144

Life of a Netflix Partner Engineer - The case of the extra 40 ms

Netflix Tech

By John Blair, Netflix Partner Engineering. The Netflix application runs on hundreds of smart TVs, streaming sticks, and pay TV set-top boxes. The role of a Partner Engineer at Netflix is to help device manufacturers launch the Netflix application on their devices. In this article we talk about one particularly difficult issue that blocked the launch of a device in Europe.

Bytes 143

Is Data Engineering a must for Data Scientists?

Team Data Science

Organizations in several industries, such as banking, healthcare, and automotive, are now acknowledging the value of data science in their operations. An effective data science team is therefore expected to manage a large volume of tasks. Even so, developing a team that can successfully manage AI tasks is essential to tackling the data challenges organizations face.

Fostering inclusion with servant leadership

Cloudera

It is crucial for organizations to focus on supporting the new way of work, enhancing productivity, and improving cost efficiency to ensure business survival in the post-pandemic world. However, those that are overly focused on these short-term goals risk losing sight of what’s truly important. As shared in my previous post, diverse teams can help organizations unlock innovations that allow them to adapt to market changes quickly and drive business growth.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Announcing ksqlDB 0.14.0

Confluent

We’re pleased to announce ksqlDB 0.14, one of the most feature-packed releases of the year. This version includes expanded query support over materialized views, incremental schema alteration, variable substitution, additional […].

Process 126

Building A Self Service Data Platform For Alternative Data Analytics At YipitData

Data Engineering Podcast

Summary: As a data engineer, you’re familiar with the process of collecting data from databases, customer data platforms, APIs, etc. At YipitData they rely on a variety of alternative data sources to inform investment decisions by hedge funds and businesses. In this episode Andrew Gross, Bobby Muldoon, and Anup Segu describe the self service data platform that they have built to allow data analysts to own the end-to-end delivery of data projects and how that has allowed them to scale their o

More Trending

How does Apache Spark 3.0 increase the performance of your SQL workloads

Cloudera

Across nearly every sector working with complex data, Spark has quickly become the de facto distributed computing framework for teams across the data and analytics lifecycle. One of the most awaited features of Spark 3.0 is the new Adaptive Query Execution (AQE) framework, which fixes issues that have plagued many Spark SQL workloads. Those issues were documented in early 2018 in this blog from a mixed Intel and Baidu team.

SQL 102

Spring Your Microservices into Production with Kubernetes and GitOps

Confluent

Microservice architectures continue to grow within engineering organizations as teams strive to increase development velocity. Microservices promote the idea of modularity as a first-class citizen in a distributed architecture, enabling […].

Mythbusting the Analytics Journey

Netflix Tech

Part of our series on who works in Analytics at Netflix and what the role entails. By Alex Diamond. This Q&A aims to bust some common misconceptions about succeeding in analytics at a big tech company. This isn’t your typical recruiting story. I wasn’t actively looking for a new job and Netflix was the only place I applied. I didn’t know anyone who worked there and just submitted my resume through the Jobs page.

Top Tech Predictions for 2021

Teradata

From COVID-19 to AI in industry, our Teradata experts offer their best predictions for the state of technology and business in 2021 and beyond. Read more.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Cloudera Replication Plugin enables x-platform replication for Apache HBase

Cloudera

The Cloudera Data Platform (CDP) is the latest Big Data offering from Cloudera. It includes Apache HBase and Phoenix as part of the platform. These two components are provided in 3 form-factors: For on-prem deployments, they are available in a manner similar to CDH & HDP (within the CDP Private Cloud offering). For customers that want to manage the database on their own in AWS & Azure, it is available as part of the CDP Public Cloud DataHub offering (with the Operational Database templa

AWS 84

4 Incredible ksqlDB Techniques (#2 Will Make You Cry)

Confluent

Building event streaming applications has never been simpler with ksqlDB. But what is it? ksqlDB is an event streaming database for building stream processing applications. Unlike Kafka Streams, ksqlDB programs […].

Kafka 107

Meet Magpie: The End-to-End Data Engineering Platform (VIDEO)

Silectis

If you’ve been following along with Silectis over the past couple of years, you are familiar with our data engineering platform, Magpie. You’re aware of the many outcomes it puts at the fingertips of data engineers, and teams of data practitioners more largely. If you’re new around here, not to worry. We can catch you up quickly. We are excited to share our brand new explainer video with you!

Avoid Making the Same Mistake Twice

Teradata

Data & analytics only have real value when they are used to improve performance by reducing costs, increasing customer satisfaction or driving new growth. Read more.

Data 52

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Enabling The Full ML Lifecycle For Scaling AI Use Cases

Cloudera

When it comes to machine learning (ML) in the enterprise, there are many misconceptions about what it actually takes to effectively employ machine learning models and scale AI use cases. When many businesses start their journey into ML and AI, it’s common to place a lot of energy and focus on the coding and data science algorithms themselves. While it’s important to have the in-house data science expertise and the ML experts on-hand to build and test models, the reality is that the actual data s

Build Internal Apps in Minutes with Retool and Rockset: A Customer 360 Example

Rockset

Rockset and Retool are teaming up to help you build internal apps in minutes. Rockset allows developers to turn complex analytics into data APIs simply, while Retool delivers the UI building blocks to quickly launch high-performance internal apps. Together, they empower developers to build performant internal tools, such as customer 360 and logistics monitoring apps, by solely using data APIs and pre-built UI components.

Path to Profitability with More Agile Pricing

Teradata

Faced with slow data processes, tedious rate filing reviews, and the difficulty of gaining consensus among stakeholders, R&D actuaries and data scientists require a new level of pricing agility to be competitive.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Top 4 Reasons Why You Should Upgrade Your Stream Processing Workloads To CDP

Cloudera

If there’s one thing enterprises have learned in 2020, it’s how to navigate through uncertain times, and in 2021, organizations will likely have to continue navigating through a shifting landscape. One trend that we’ve seen this year is that enterprises are leveraging streaming data to get through unplanned disruptions and to make the best business decisions for their stakeholders.

Process 75

Netflix at MIT CODE 2020

Netflix Tech

By Martin Tingley. In November, Netflix was a proud sponsor of the 2020 Conference on Digital Experimentation (CODE), hosted by the MIT Initiative on the Digital Economy. As well as providing sponsorship, Netflix data scientists were active participants, with three contributions. Eskil Forsell and colleagues presented a poster describing success stories from a democratized experimentation platform.

Coding 65

HBase Performance testing using YCSB

Cloudera

When running any performance benchmarking tool on your cluster, a critical decision is what data set size to use for the test. Here we demonstrate why it is important to select a “good fit” data set size when running an HBase performance test on your cluster. Both the HBase cluster configuration and the size of the data set can change the performance of your workload and the test results on the same cluster.

Improving your Customer Centric Merchandising with Location based in-Store Merchandising

Cloudera

With any transformation in industry or marketplace, there are leaders and losers. The winners know the fundamental pillars that are hidden to some and evident to others that drive and enable success. In 2020, where connected consumers and the turmoil with the pandemic driven supply chains are driving more and more of retail’s response, at Cloudera we believe that the underlying foundation to retail’s success is based upon real-time and streaming data from retail’s edge – the retail s

Retail 60

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

The role of data in COVID-19 vaccination record keeping

Cloudera

Now that the Pfizer vaccine has been approved by the FDA for use in the US, and the Moderna vaccine likely isn’t far behind, we are on the verge of being able to emerge from the social distancing world that began earlier in 2020. Recent news has talked about distributing a vaccination record card to everyone who gets a COVID-19 vaccine.

Bringing transaction support to Cloudera Operational Database

Cloudera

We’re excited to share that after adding ANSI SQL, secondary indices, star schema, and view capabilities to Cloudera’s Operational Database, we will be introducing distributed transaction support in the coming months. What is ACID? The ACID model of database design is one of the most important concepts in databases. ACID stands for atomicity, consistency, isolation, and durability.