Data Architecture and Kafka - Data Engineering Digest

Simplifying Data Architecture and Security to Accelerate Value

Snowflake

NOVEMBER 11, 2024

Whether it’s unifying transactional and analytical data with Hybrid Tables, improving governance for an open lakehouse with Snowflake Open Catalog or enhancing threat detection and monitoring with Snowflake Horizon Catalog, Snowflake is reducing the number of moving parts to give customers a fully managed service that just works.

Data Architecture

Data Architecture Architecture Data Lake Kafka

They Handle 500B Events Daily. Here’s Their Data Engineering Architecture.

Monte Carlo

NOVEMBER 12, 2024

A data engineering architecture is the structural framework that determines how data flows through an organization – from collection and storage to processing and analysis. It’s the big blueprint we data engineers follow in order to transform raw data into valuable insights.

Architecture

Architecture Data Engineering Data Engineer Engineering

How Marriott Modernized Their Data Architecture with Snowflake

Snowflake

SEPTEMBER 14, 2023

More than 50% of data leaders recently surveyed by BCG said the complexity of their data architecture is a significant pain point in their enterprise. “As a result,” says BCG, “many companies find themselves at a tipping point, at risk of drowning in a deluge of data, overburdened with complexity and costs.”

Data Architecture

Data Architecture Architecture Hadoop Data Warehouse

How to Build a Scalable Data Architecture with Apache Kafka

KDnuggets

APRIL 5, 2023

Learn about Apache Kafka architecture and its implementation using a real-world use case of a taxi booking app.

Kafka

Kafka Architecture Data Architecture Building

Building Streaming Data Architectures with Qlik Replicate and Apache Kafka

Confluent

OCTOBER 30, 2020

A fundamental challenge with today’s “data explosion” is finding the best answer to the question, “So where do I put my data?” while avoiding the longer-term problem of data warehouses, […].

Data Architecture

Data Architecture Architecture Kafka Building

Getting started with the MongoDB Connector for Apache Kafka and MongoDB

Confluent

JULY 17, 2019

Together, MongoDB and Apache Kafka® make up the heart of many modern data architectures today. Integrating Kafka with external systems like MongoDB is best done through the use of Kafka Connect. The official MongoDB Connector for Apache Kafka is developed and supported by MongoDB engineers. Getting started.

MongoDB

MongoDB Kafka Database Medical

Apache Kafka Vs Apache Spark: Know the Differences

Knowledge Hut

MAY 3, 2024

A new breed of ‘Fast Data’ architectures has evolved to be stream-oriented, where data is processed as it arrives, providing businesses with a competitive advantage. Dean Wampler (renowned author of many big data technology-related books) makes an important point in one of his webinars.

Kafka

Kafka Scala Java Amazon Web Services

The Business Value of the DSP: Part 1 – From Apache Kafka® to a DSP

Confluent

MARCH 13, 2025

Discover how Confluent transformed from a self-managed Kafka solution into a fully managed data streaming platform and learn what this evolution means for modern data architecture.

Kafka

Kafka Data Architecture Architecture Management

Why I Can’t Wait for Kafka Summit San Francisco

Confluent

JULY 23, 2019

The Kafka Summit Program Committee recently published the schedule for the San Francisco event, and there’s quite a bit to look forward to. I remember two to three years back, I spent all my time listening to talks about various ETL architectures in the Pipelines track. Interests evolve over time too. What’s the Time?…and

Kafka

Kafka Hadoop Media Software Engineer

Stress Testing Kafka And Cassandra For Real-Time Anomaly Detection

Data Engineering Podcast

JULY 1, 2019

Scaling the volume of events that can be processed in real-time can be challenging, so Paul Brebner from Instaclustr set out to see how far he could push Kafka and Cassandra for this use case. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference.

Kafka

Kafka Finance Media Architecture

IBM Technology Chooses Cloudera as its Preferred Partner for Addressing Real Time Data Movement Using Kafka

Cloudera

SEPTEMBER 26, 2023

Learn more about how you can benefit from a well-supported data management platform and ecosystem of products, services and support by visiting the IBM and Cloudera partnership page.

Kafka

Kafka Technology IT Government

Data Engineering Weekly #209

Data Engineering Weekly

FEBRUARY 23, 2025

It allows different data platforms to access and share the same underlying data without copying, treating OTFs as a storage-layer abstraction. Sponsored: Webinar - The State of Airflow 2025. We asked 5,000+ data engineers how Airflow is shaping the modern DataOps landscape.

Data Engineering

Data Engineering Data Engineer Engineering Kafka

Upgrade Journey: The Path from CDH to CDP Private Cloud

Cloudera

SEPTEMBER 28, 2020

The customer also wanted to utilize the new features in CDP PvC Base like Apache Ranger for dynamic policies, Apache Atlas for lineage, comprehensive Kafka streaming services and Hive 3 features that are not available in legacy CDH versions. Lineage and chain of custody, advanced data discovery and business glossary. Kafka, SRM, SMM.

Cloud

Cloud Kafka Professional Services Metadata

What is an AI Data Engineer? 4 Important Skills, Responsibilities, & Tools

Monte Carlo

OCTOBER 31, 2024

Key Differences Between AI Data Engineers and Traditional Data Engineers: While traditional data engineers and AI data engineers have similar responsibilities, they ultimately differ in where they focus their efforts.

Data Engineering

Data Engineering Data Engineer Engineering Unstructured Data

KDnuggets News, April 12: Top 19 Skills for a Data Scientist in 2023 • 8 ChatGPT Open-Source Alternatives

KDnuggets

APRIL 12, 2023

Top 19 Skills You Need to Know in 2023 to Be a Data Scientist • 8 Open-Source Alternative to ChatGPT and Bard • Free eBook: 10 Practical Python Programming Tricks • DataLang: A New Programming Language for Data Scientists… Created by ChatGPT? • How to Build a Scalable Data Architecture with Apache Kafka

Programming Language

Programming Language Kafka Data Architecture Python

Building a Scalable Search Architecture

Confluent

JUNE 18, 2019

Distributed transactions are very hard to implement successfully, which is why we’ll introduce a log-inspired system such as Apache Kafka®. Building an indexing pipeline at scale with Kafka Connect. Moving data into Apache Kafka with the JDBC connector. Setting up the connector.

Architecture

Architecture Building Kafka Database-centric

Kafka vs Kinesis: How to Choose

Rockset

AUGUST 16, 2022

Streams for Everyone: If you have come this far, it means you have already considered or are considering using event streaming in your data architecture for the wide variety of benefits it can offer. Or perhaps you are looking for something to support a Data Mesh initiative because that’s all the rage right now.

Kafka

Kafka AWS Cloud Java

Better to Be Wrong Than Vague: Apache Kafka and Data Architecture Predictions for 2021

Confluent

JANUARY 19, 2021

On a recent episode of Streaming Audio, Gwen Shapira, Michael Noll, and Ben Stopford joined me to hold forth about the near future of Apache Kafka® and software architecture in […].

Kafka

Kafka Architecture Data Architecture Data

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data) then enterprise-wide data lakes versus smaller, typically BU-Specific, “data ponds”.

Architecture

Architecture Metadata Kafka Government

Data Engineering: A Formula 1-inspired Guide for Beginners

Towards Data Science

DECEMBER 4, 2023

Anyways, I wasn’t paying enough attention during university classes, and today I’ll walk you through data layers using, guess what, an example. Business Scenario & Data Architecture. Imagine this: next year, a new team on the grid, Red Thunder Racing, will call us (yes, me and you) to set up their new data infrastructure.

Data Engineering

Data Engineering Data Engineer Engineering Data Lake

Thoughts on Amazon Express One and its impact in Data Infrastructure

Data Engineering Weekly

DECEMBER 2, 2023

The Current State of the Data Architecture: S3 intelligent tiered storage provides a fine balance between cost and the duration of data retention. However, getting real-time insight into recent data remains a big challenge. Previously, we even tried to query Kafka directly using the Presto-Kafka Connector.

IT

IT BI AWS Kafka

Schemas, Contracts, and Compatibility

Confluent

MAY 21, 2019

The profile service will publish the changes in profiles, including address changes, to an Apache Kafka® topic. The quote service will subscribe to the updates from the profile changes topic, calculate a new quote if needed, and publish the new quote to a Kafka topic so other services can subscribe to the updated quote event.

Kafka

Kafka Insurance Architecture Database
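
The profile and quote flow summarized in the entry above can be sketched without a real cluster. The following is a minimal illustration only: `InMemoryBroker` is a toy stand-in for Kafka, and the topic names, event fields and the `price_for` rule are assumptions made up for this example, not taken from the original article.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryBroker:
    """Toy stand-in for a Kafka cluster: each topic is an append-only list."""

    def __init__(self) -> None:
        self.log: Dict[str, List[dict]] = defaultdict(list)
        self.subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Append to the topic log, then deliver to every subscriber.
        self.log[topic].append(event)
        for handler in list(self.subscribers[topic]):
            handler(event)


def price_for(address: str) -> float:
    # Hypothetical pricing rule, used only to make the example concrete.
    return 120.0 if "London" in address else 100.0


def make_quote_service(broker: InMemoryBroker) -> Callable[[dict], None]:
    """Quote service: reacts to profile changes, publishes updated quotes."""

    def on_profile_change(event: dict) -> None:
        if event.get("field") != "address":
            return  # only address changes require a new quote here
        quote = {
            "customer_id": event["customer_id"],
            "premium": price_for(event["new_value"]),
        }
        broker.publish("quote-updates", quote)

    return on_profile_change


broker = InMemoryBroker()
broker.subscribe("profile-changes", make_quote_service(broker))

received: List[dict] = []  # another service listening for updated quotes
broker.subscribe("quote-updates", received.append)

# The profile service publishes two changes; only the address change
# leads to a new quote event.
broker.publish("profile-changes",
               {"customer_id": 42, "field": "address", "new_value": "1 High St, London"})
broker.publish("profile-changes",
               {"customer_id": 42, "field": "email", "new_value": "a@example.com"})
print(received)
```

The two services never call each other directly. They share only the topic names and the shape of the events, which is why the schema of those events acts as the contract between them.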

Sovereign AI, Redpanda vs Apache Kafka, The Future of Data Streaming with Alex Gallego (CEO of Redpanda)

Striim

AUGUST 5, 2024

This episode promises invaluable insights into the shift from batch to real-time data processing, and the practical applications across multiple industries that make this transition not just beneficial but necessary. Explore the intricate challenges and groundbreaking innovations in data storage and streaming.

Kafka

Kafka Data Storage Architecture Data Architecture

Cloudera Named Strong Performer in New Forrester Wave for Streaming Platforms

Cloudera

FEBRUARY 2, 2024

Cloudera offers the most complete set of end-to-end capabilities to capture, process, and distribute any data anywhere across environments.

Kafka

Kafka Architecture Data Architecture Cloud

Creating an IoT-Based, Data-Driven Food Value Chain with Confluent Cloud

Confluent

APRIL 25, 2019

By capturing Internet of Things (IoT) event data from farm to fork with Apache Kafka® and Confluent Cloud, BAADER is increasing its value as part of this chain, creating new business opportunities and enabling its partners to optimize their operations. “With Confluent Cloud, we get more than just a Kafka service.

Food

Food Cloud Kafka Transportation

Snowflake’s AWS re:Invent Highlights for Fast-Tracking ML, Gen AI and Application Innovations

Snowflake

DECEMBER 5, 2023

This lets them leverage the familiar development interface of a notebook while directing complex data preparation and feature engineering steps to run in Snowflake (rather than having to copy and manage copies of data inside their notebook instance).

AWS

AWS Amazon Web Services Government Cloud Computing

Data Architect: Role Description, Skills, Certifications and When to Hire

AltexSoft

FEBRUARY 11, 2023

This specialist works closely with people on both business and IT sides of a company to understand the current needs of the stakeholders and help them unlock the full potential of data. To get a better understanding of a data architect’s role, let’s clear up what data architecture is.

Data Architect

Data Architect Certification Generalist Big Data

5 Key Takeaways from Flink Forward 2023

Cloudera

NOVEMBER 27, 2023

2: The majority of Flink shops are in earlier phases of maturity. We talked to numerous developer teams who had migrated workloads from legacy ETL tools, Kafka streams, Spark streaming, or other tools for the efficiency and speed of Flink. Organizations are moving beyond a Kafka-is-everything mentality when it comes to streaming.

Kafka

Kafka SQL ETL Tools Architecture

SnowflakeDB: The Data Warehouse Built For The Cloud

Data Engineering Podcast

DECEMBER 8, 2019

Contact Info: LinkedIn, Website, @KentGraziano on Twitter. Parting Question: From your perspective, what is the biggest gap in the tooling or technology for data management today?

Data Warehouse

Data Warehouse Cloud AWS Relational Database

Announcing the 2020 Data Impact Award Winners

Cloudera

NOVEMBER 18, 2020

The technological linchpin of its digital transformation has been its Enterprise Data Architecture & Governance platform. It hosts over 150 big data analytics sandboxes across the region with over 200 users utilizing the sandbox for data discovery.

Medical

Medical Banking Telecommunication Government

Straining Your Data Lake Through A Data Mesh

Data Engineering Podcast

JULY 22, 2019

We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Upcoming events include the O’Reilly AI Conference, the Strata Data Conference, and the combined events of the Data Architecture Summit and Graphorum.

Data Lake

Data Lake Hadoop Data Architecture

Navigating Boundless Data Streams With The Swim Kernel

Data Engineering Podcast

SEPTEMBER 18, 2019

We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, and Data Council. Upcoming events include the O’Reilly AI conference, the Strata Data conference, the combined events of the Data Architecture Summit and Graphorum, and Data Council in Barcelona.

Hadoop

Hadoop Data Lake BI Kafka

Fast Analytics On Semi-Structured And Structured Data In The Cloud

Data Engineering Podcast

OCTOBER 7, 2019

We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, Alluxio, and Data Council. Upcoming events include the combined events of the Data Architecture Summit and Graphorum, the Data Orchestration Summit, and Data Council in NYC.

Structured Data

Structured Data Cloud SQL Programming Language

Reflections on Event Streaming as Confluent Turns Five – Part 2

Confluent

SEPTEMBER 19, 2019

When people ask me the very top-level question “why do people use Kafka,” I usually lead with the story in my last post, where I talked about how Apache Kafka® is helping us deliver on the promises the cloud made to us a decade ago. But I follow it up quickly with a second and potentially unrelated pattern: real-time data pipelines.

Kafka

Kafka Data Pipeline Bytes Data Architect

Keeping Your Data Warehouse In Order With DataForm

Data Engineering Podcast

OCTOBER 14, 2019

We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, Alluxio, and Data Council. Upcoming events include the combined events of the Data Architecture Summit and Graphorum, the Data Orchestration Summit, and Data Council in NYC.

Data Warehouse

Data Warehouse PostgreSQL AWS Programming Language

Build Maintainable And Testable Data Applications With Dagster

Data Engineering Podcast

OCTOBER 28, 2019

We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, Alluxio, and Data Council. Upcoming events include the combined events of the Data Architecture Summit and Graphorum, the Data Orchestration Summit, and Data Council in NYC.

Building

Building Data Pipeline Programming Language Kafka

Automating Your Production Dataflows On Spark

Data Engineering Podcast

NOVEMBER 4, 2019

We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, Alluxio, and Data Council. Upcoming events include the combined events of the Data Architecture Summit and Graphorum, the Data Orchestration Summit, and Data Council in NYC.

Programming Language

Programming Language Kafka Data Engineering Data Engineer

Bringing Financial Services Business Use Cases to Life: Leveraging Data Analytics, ML/AI, and Gen AI

Cloudera

MAY 30, 2024

They deployed a proof-of-concept version of CDP Private Cloud and CDP Public Cloud, facilitating the client’s exploration of Cloudera’s hybrid cloud functionalities and a new data model. What are some of the business use cases financial services customers are focused on to use AI?

Data Analytics

Data Analytics Banking Insurance Finance

Building Real-Time Data Architectures to Foster Innovation

Rockset

JULY 21, 2020

90% of the apps in the world can be built on real-time data services. 90% of the features in your app can be built on real-time data services. Highly consistent services are highly expensive. Embrace real-time services.

Data Architecture

Data Architecture Architecture Building Data

Data Engineering Weekly #160

Data Engineering Weekly

FEBRUARY 25, 2024

My challenge with Samza during my time at Slack was the decision to co-locate Samza's state in Kafka. At that time, operating Kafka came with its own challenges. Samza’s stream-stream join relies on Kafka’s key partitioning to shift the streaming operation burden to Kafka.

Data Engineering

Data Engineering Data Engineer Engineering Kafka
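
The idea behind a join that leans on key partitioning, mentioned in the entry above, can be sketched as follows. This is an illustration of the general co-partitioning idea only, not Samza's implementation: when two streams are partitioned by the same key with the same partition count, matching records land in the same partition number, so each partition can be joined on its own. The CRC32 hash is an assumed stand-in (Kafka's Java client hashes keyed records with murmur2), and the stream contents are made up.

```python
import zlib
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (key, value)


def partition_for(key: str, num_partitions: int) -> int:
    """Stable hash partitioner; CRC32 is a stand-in for the real hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_stream(events: List[Record], num_partitions: int) -> Dict[int, List[Record]]:
    """Route every record to the partition chosen by its key."""
    parts: Dict[int, List[Record]] = {p: [] for p in range(num_partitions)}
    for key, value in events:
        parts[partition_for(key, num_partitions)].append((key, value))
    return parts


def local_join(left: List[Record], right: List[Record]) -> List[Tuple[str, str, str]]:
    """Join two partitions on key without looking at any other partition."""
    right_by_key: Dict[str, List[str]] = {}
    for key, value in right:
        right_by_key.setdefault(key, []).append(value)
    return [(key, lv, rv) for key, lv in left for rv in right_by_key.get(key, [])]


# Hypothetical input streams keyed by user id.
clicks = [("user-1", "click:home"), ("user-2", "click:cart"), ("user-1", "click:pay")]
orders = [("user-1", "order:A"), ("user-2", "order:B")]

N = 4  # both streams must use the same partition count
click_parts = partition_stream(clicks, N)
order_parts = partition_stream(orders, N)

joined: List[Tuple[str, str, str]] = []
for p in range(N):
    joined.extend(local_join(click_parts[p], order_parts[p]))

print(sorted(joined))
```

Because the partitioner is deterministic, no task needs records from another partition to complete its share of the join. The trade-off is that the state and the ordering guarantees now depend on how the broker is operated.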

1. Streamlining Membership Data Engineering at Netflix with Psyberg

Netflix Tech

NOVEMBER 14, 2023

At Netflix, our backend microservices continuously generate real-time event data that gets streamed into Kafka. These raw events are the source of various data processing workflows within our team. We ingest this diverse event data and transform it into standardized fact tables.

Data Engineering

Data Engineering Data Engineer Engineering Metadata
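
As a rough sketch of what turning diverse raw events into a standardized fact table can look like, consider the example below. Everything in it is an assumption for illustration: the `FactRow` columns, the two raw event shapes and their field names are hypothetical and do not describe Netflix's actual schemas or the Psyberg framework.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass(frozen=True)
class FactRow:
    """One row of a hypothetical standardized fact table."""
    account_id: str
    event_type: str
    event_time: datetime  # always stored in UTC


def parse_time(raw: Dict[str, Any]) -> datetime:
    """Accept epoch milliseconds or an ISO-8601 string with a UTC offset."""
    if "ts_ms" in raw:
        return datetime.fromtimestamp(raw["ts_ms"] / 1000, tz=timezone.utc)
    if "timestamp" in raw:
        return datetime.fromisoformat(raw["timestamp"]).astimezone(timezone.utc)
    raise ValueError(f"event has no recognizable time field: {raw!r}")


def to_fact_row(raw: Dict[str, Any]) -> FactRow:
    """Map either of the two assumed raw shapes onto the common schema."""
    account_id = raw.get("accountId") or raw.get("account", {}).get("id")
    event_type = raw.get("type") or raw.get("event")
    if not account_id or not event_type:
        raise ValueError(f"unrecognized event shape: {raw!r}")
    return FactRow(str(account_id), str(event_type), parse_time(raw))


# Two made-up raw events, as different services might emit them.
raw_events: List[Dict[str, Any]] = [
    {"type": "signup", "accountId": "a1", "ts_ms": 1700000000000},
    {"event": "plan_change", "account": {"id": "a2"},
     "timestamp": "2023-11-14T00:00:00+00:00"},
]

fact_table = [to_fact_row(e) for e in raw_events]
for row in fact_table:
    print(row)
```

Rejecting unrecognized shapes with an error, rather than guessing, keeps malformed events out of the table and makes them visible for later handling.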

Data Engineering Weekly #140

Data Engineering Weekly

JULY 30, 2023

High Scalability: Lessons Learned Running Presto At Meta Scale. Presto, potentially ranking as one of the most influential open-source initiatives of the past ten years, stands shoulder to shoulder with the likes of Apache Kafka. DuckDB brings an exciting data architecture challenge to the industry.

Data Engineering

Data Engineering Data Engineer Engineering Kafka

The Advantages Of Live Data-Streaming In The Competitive Financial Services Sector (Part I)

Cloudera

AUGUST 21, 2020

With hundreds of thousands of data points, endpoints or inputs, companies today face a deluge of data. To handle that data and distribute it in real time to the other applications that need it, a solution like Apache Kafka can help.

Banking

Banking Kafka Cloud Storage Government

Scala For Big Data Engineering – Why should you care?

Advancing Analytics: Data Engineering

APRIL 23, 2020

Other popular software/frameworks written in Scala include Kafka, Akka and Play. A great quote I read, though somewhat dramatic, articulates this nicely: “Scala has taken over the world of ‘Fast’ Data”. An example of how popular Scala-based software can be used within your data architecture is illustrated below.

Scala

Scala Big Data Data Engineering Data Engineer
