September, 2020

article thumbnail

How Real-Time Stream Processing Works with ksqlDB, Animated

Confluent

ksqlDB, the event streaming database, is becoming one of the most popular ways to work with Apache Kafka®. Every day, we answer many questions about the project, but here’s a […].

Process 145
article thumbnail

Upgrade Journey: The Path from CDH to CDP Private Cloud

Cloudera

Cloudera delivers an enterprise data cloud that enables companies to build end-to-end data pipelines for hybrid cloud, spanning edge devices to public or private cloud, with integrated security and governance underpinning it to protect customers data. Cloudera has found that customers have spent many years investing in their big data assets and want to continue to build on that investment by moving towards a more modern architecture that helps leverage the multiple form factors.

Cloud 132
article thumbnail

Data Engineering Project: Stream Edition

Start Data Engineering

Table of Contents Table of Contents Introduction Project description and requirements Infrastructure overview Apache Flink Apache Kafka Design Detect fraudulent accounts Log account actions Prerequisites Code Defining dependencies Inheritance Server logs generator Defining data flow in Apache Flink Create a streaming environment Creating a consumer to read events from Apache Kafka Detecting fraud and generating alert events Writing server logs to a PostgreSQL DB Fraud detection logic Open proces

article thumbnail

Where to start if you want to become a Data Engineer

Team Data Science

"Where can I start if I want to become a Data Engineer?" This is a question I have heard many times before. My answer to it is actually always the same: Start doing a Data Engineering project! Choose a tool Your first step here should be to select a tool. Then start with that tool and then build the whole thing up. So you get some data and then start with a tool.

article thumbnail

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

article thumbnail

Edgar: Solving Mysteries Faster with Observability

Netflix Tech

Edgar helps Netflix teams troubleshoot distributed systems efficiently with the help of a summarized presentation of request tracing, logs, analysis, and metadata. by Elizabeth Carretto Everyone loves Unsolved Mysteries. There’s always someone who seems like the surefire culprit. There’s a clear motive, the perfect opportunity, and an incriminating footprint left behind.

Metadata 119
article thumbnail

Five Steps Towards Delivering Better Analytic Outcomes

Teradata

Get tips on how to cast a more critical eye on the seemingly endless amount of data-driven conclusions presented to us. Learn more.

Data 106

More Trending

article thumbnail

Cloudera Data Warehouse outperforms Azure HDInsight in TPC-DS benchmark

Cloudera

Performance is one of the key, if not the most important deciding criterion, in choosing a Cloud Data Warehouse service. In today’s fast changing world, enterprises have to make data driven decisions quickly and for that they rely heavily on their data warehouse service. . In this blog post, we compare Cloudera Data Warehouse (CDW) on Cloudera Data Platform (CDP) using Apache Hive-LLAP to Microsoft HDInsight (also powered by Apache Hive-LLAP) on Azure using the TPC-DS 2.9 benchmark.

article thumbnail

Speed Up And Simplify Your Streaming Data Workloads With Red Panda

Data Engineering Podcast

Summary Kafka has become a de facto standard interface for building decoupled systems and working with streaming data. Despite its widespread popularity, there are numerous accounts of the difficulty that operators face in keeping it reliable and performant, or trying to scale an installation. To make the benefits of the Kafka ecosystem more accessible and reduce the operational burden, Alexander Gallego and his team at Vectorized created the Red Panda engine.

Kafka 100
article thumbnail

Why you should not learn everything in Data Science

Team Data Science

"Since I started exploring Data Engineering, it has been overwhelming. In the end I have the feeling of giving up." This is a message that reached me from a viewer on YouTube. And that's exactly how I feel sometimes! Sometimes I feel a bit overwhelmed by the whole thing. Because there is so much going on. All the technology and Data Science hype. There is always something new on the horizon.

article thumbnail

Analytics at Netflix: Who we are and what we do

Netflix Tech

Analytics at Netflix: Who We Are and What We Do An Introduction to Analytics and Visualization Engineering at Netflix by Molly Jackman & Meghana Reddy Explained: Season 1 (Photo Credit: Netflix) Across nearly every industry, there is recognition that data analytics is key to driving informed business decision-making. But there is far less agreement on what that term “data analytics” actually means?

BI 97
article thumbnail

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

article thumbnail

Accelerate Your Path to a Modern Analytics Architecture

Teradata

A modern analytics architecture means something different to everyone. What does it mean for your organization? Find out more.

article thumbnail

Streaming Data from Apache Kafka into Azure Data Explorer with Kafka Connect

Confluent

Near-real-time insights have become a de facto requirement for Azure use cases involving scalable log analytics, time series analytics, and IoT/telemetry analytics. Azure Data Explorer (also called Kusto) is the […].

Kafka 139
article thumbnail

Fundamentals for Success in Cloud Data Management

Cloudera

Everybody needs more data and more analytics, with so many different and sometimes often conflicting needs. Data engineers need batch resources, while data scientists need to quickly onboard ephemeral users. Data architects deal with constantly evolving workloads and business analysts must balance the urgency and importance of a concurrent user population that continues to grow.

article thumbnail

Cutting Through The Noise And Focusing On The Fundamentals Of Data Engineering With The Data Janitor

Data Engineering Podcast

Summary Data engineering is a constantly growing and evolving discipline. There are always new tools, systems, and design patterns to learn, which leads to a great deal of confusion for newcomers. Daniel Molnar has dedicated his time to helping data professionals get back to basics through presentations at conferences and meetups, and with his most recent endeavor of building the Pipeline Data Engineering Academy.

article thumbnail

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

article thumbnail

Important countries and regions with Data Science demand

Team Data Science

In which regions or countries is there a boom in the field of Data Sciences and thus a large number of jobs? This is a very interesting question, which newcomers or graduates often ask themselves. Maybe you have already asked yourself this question? The USA as an advanced country Companies in the USA are obviously very, very advanced with Data Science.

article thumbnail

Seamlessly Swapping the API backend of the Netflix Android app

Netflix Tech

How we migrated our Android endpoints out of a monolith into a new microservice by Rohan Dhruva , Ed Ballot As Android developers, we usually have the luxury of treating our backends as magic boxes running in the cloud, faithfully returning us JSON. At Netflix, we have adopted the Backend for Frontend (BFF) pattern : instead of having one general purpose “backend API”, we have one backend per client (Android/iOS/TV/web).

Java 97
article thumbnail

The Cause and Effect of Supply Chain Fragility, and How to Fix It

Teradata

The fragility of your supply chain existed long before COVID-19 brought it into sharp relief. Discover the secret to true supply chain resilience.

IT 105
article thumbnail

Creating a Serverless Environment for Testing Your Apache Kafka Applications

Confluent

If you are taking your first steps with Apache Kafka®, looking at a test environment for your client application, or building a Kafka demo, there are two “easy button” paths […].

Kafka 133
article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

Introducing CDP Data Engineering: Purpose Built Tooling For Accelerating Data Pipelines

Cloudera

For enterprise organizations, managing and operationalizing increasingly complex data across the business has presented a significant challenge for staying competitive in analytic and data science driven markets. With growing disparate data across everything from edge devices to individual lines of business needing to be consolidated, curated, and delivered for downstream consumption, it’s no wonder that data engineering has become the most in-demand role across businesses — growing at an estima

article thumbnail

Distributed In Memory Processing And Streaming With Hazelcast

Data Engineering Podcast

Summary In memory computing provides significant performance benefits, but brings along challenges for managing failures and scaling up. Hazelcast is a platform for managing stateful in-memory storage and computation across a distributed cluster of commodity hardware. On top of this foundation, the Hazelcast team has also built a streaming platform for reliable high throughput data transmission.

Process 100
article thumbnail

Scala 3: Traits Quickly Explained

Rock the JVM

This article delves into Scala 3's advanced trait functionalities, building on our previous explorations of the language's new features

Scala 52
article thumbnail

How Our Paths Brought Us to Data and Netflix

Netflix Tech

Part of our series on who works in Analytics at Netflix?—?and what the role entails by Julie Beckley & Chris Pham This Q&A provides insights into the diverse set of skills, projects, and culture within Data Science and Engineering (DSE) at Netflix through the eyes of two team members: Chris Pham and Julie Beckley. Photo from a team curling offsite?

article thumbnail

Prepare Now: 2025s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

article thumbnail

The Power of Data and Analytic Processing Gravity

Teradata

Taking the definition of physical gravity & extending it to data analytics, we explore the opportunities to combine data gravity with analytic processing, at scale with Vantage.

Process 93
article thumbnail

ksqlDB 0.12.0 Introduces Real-Time Query Upgrades and Automatic Query Restarts

Confluent

The ksqlDB team is pleased to announce ksqlDB 0.12.0. This release continues to improve upon the usability of ksqlDB and aims to reduce administration time. Highlights include query upgrades, which […].

Process 98
article thumbnail

Data Champions: Balancing IT and Business Needs

Cloudera

Digital transformation has been on the agenda for a long time, but the sudden need to respond to the unprecedented challenges of 2020, has meant the buzzword has become an executable reality for many enterprises. I recently came across a KPMG report that revealed that 80% of executives are increasing investments on emerging technologies now, to drive higher realized value in the future.

IT 105
article thumbnail

Simplify Your Data Architecture With The Presto Distributed SQL Engine

Data Engineering Podcast

Summary Databases are limited in scope to the information that they directly contain. For analytical use cases you often want to combine data across multiple sources and storage locations. This frequently requires cumbersome and time-consuming data integration. To address this problem Martin Traverso and his colleagues at Facebook built the Presto distributed query engine.

article thumbnail

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

article thumbnail

Enums in Scala 3: Quickly Explained

Rock the JVM

Scala 3 Introduces Enums: A Major Update with Significant Implications

Scala 52
article thumbnail

HDMI?—?Scaling Netflix Certification

Netflix Tech

HDMI?—?Scaling Netflix Certification Scott Bolter , Matthew Lehman , Akshay Garg ¹ At Netflix, we take the task of preserving the creative vision of our content all the way to a subscriber TV screen very seriously. This significantly increases the scope of our application integration and certification processes for streaming devices like set-top-boxes (STBs) and TVs.

article thumbnail

The Game Has Changed for Retail – or Has it?

Teradata

The game had changed for the retail sector long ago – but it has taken the COVID-19 crisis for people to notice. A new appreciation for the role of data in retail has emerged.

Retail 93
article thumbnail

Using the Fully Managed MongoDB Atlas Connector in a Secure Environment

Confluent

Since the MongoDB Atlas source and sink became available in Confluent Cloud, we’ve received many questions around how to set up these connectors in a secure environment. By default, MongoDB […].

MongoDB 97
article thumbnail

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.