February 2019


Spring for Apache Kafka Deep Dive – Part 1: Error Handling, Message Conversion and Transaction Support

Confluent

Following on from How to Work with Apache Kafka in Your Spring Boot Application, which shows how to get started with Spring Boot and Apache Kafka®, here we’ll dig a little deeper into some of the additional features that the Spring for Apache Kafka project provides. Spring for Apache Kafka brings the familiar Spring programming model to Kafka. It provides the KafkaTemplate for publishing records and a listener container for asynchronous execution of POJO listeners.
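To make those two building blocks concrete, here is a minimal sketch of a Spring Boot application that publishes records with a KafkaTemplate and consumes them with a @KafkaListener-backed POJO listener. The topic name, group id, and class names are placeholders invented for this example, not taken from the article; broker settings are assumed to come from the usual spring.kafka.* application properties.

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;

@SpringBootApplication
public class DemoApplication {
    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }
}

@Component
class GreetingProducer {
    private final KafkaTemplate<String, String> template;

    GreetingProducer(KafkaTemplate<String, String> template) {
        this.template = template;
    }

    // Publish a record; KafkaTemplate wraps the underlying Kafka producer.
    void send(String message) {
        template.send("greetings", message);   // "greetings" is a placeholder topic
    }
}

@Component
class GreetingListener {
    // The listener container invokes this POJO method asynchronously for each record.
    @KafkaListener(topics = "greetings", groupId = "greetings-group")
    public void listen(String message) {
        System.out.println("Received: " + message);
    }
}
```

With spring-kafka on the classpath, Spring Boot auto-configures both the KafkaTemplate and the listener container factory, so the listener method runs on container-managed threads rather than the caller’s thread.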


Protecting a Story’s Future with History and Science

Netflix Tech

By Kylee Peña, Chris Clark, and Mike Whipple. Kylee’s parents after their wedding in 1978. I, Kylee, have two photos from my parents’ wedding. Just two. This year they celebrated 40 years of marriage, so both photos were shot on film. Both capture a joy and awkwardness that come with young weddings. They’re fresh and full of life, candid captures from another era.


Deep Learning For Data Engineers

Data Engineering Podcast

Summary Deep learning is the latest class of technology that is gaining widespread interest. As data engineers, we are responsible for building and managing the platforms that power these models. To help us understand what is involved, we are joined this week by Thomas Henson. In this episode he shares his experiences experimenting with deep learning, what data engineers need to know about the infrastructure and data requirements to power the models that your team is building, and how it can be used…


Managing Uber’s Data Workflows at Scale

Uber Engineering

At Uber’s scale, thousands of microservices serve millions of rides and deliveries a day, generating more than a hundred petabytes of raw data. Internally, engineering and data teams across the company leverage this data to improve the Uber experience.


Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: an overview of ETL vs. ELT…


Cash Is Still King – Make Sure Your Business Is Prepared for the Next Recession

Teradata

If your organization understands customer profitability in detail, it can navigate a recession far more easily.


Introducing Cloudera DataFlow (CDF)

Cloudera

Late last year, the news of the merger between Hortonworks and Cloudera shook the industry and gave birth to the new Cloudera – the combined company with a focus on being an Enterprise Data Cloud leader and a product offering that spans from edge to AI. One of the most promising technology areas in this merger, one that already had high growth potential and is poised for even more, is the Data-in-Motion platform called Hortonworks DataFlow (HDF).


Building a Cross-platform In-app Messaging Orchestration Service

Netflix Tech

George Abraham, Devika Chawla, Chris Beaumont, and Daniel Huang. Thoughtful, relevant, and timely messaging is an integral part of a customer’s Netflix experience. The Netflix Messaging Engineering team builds the platform and the messages to communicate with Netflix customers. In-app messages at Netflix fall broadly into two channels…


Speed Up Your Analytics With The Alluxio Distributed Storage System

Data Engineering Podcast

Summary Distributed storage systems are the foundational layer of any big data stack. There are a variety of implementations which support different specialized use cases and come with associated tradeoffs. Alluxio is a distributed virtual filesystem which integrates with multiple persistent storage systems to provide a scalable, in-memory storage layer for scaling computational workloads independent of the size of your data.


How to Run SQL on PDF Files

Rockset

PDFs are the de facto standard for distributing and sharing fixed-layout documents today. A quick survey of my laptop folders reveals account statements, receipts, technical papers, book chapters, and presentation slides—all PDFs. Lots of valuable information finds its way into all manner of PDF files. That is a great reason for Rockset to support SQL queries on PDF files, in keeping with our mission to make data more usable to everyone.


The First Mistake of a CDO: Proposing Business Value

Teradata

Kevin Lewis explains the role of the chief data officer.


Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!


How ATB Financial is Utilizing Hybrid Cloud to Reduce the Time to Value for Big Data Analytics by 90 Percent

Cloudera

ATB Financial is Alberta’s largest home-grown financial institution and prides itself on its customer obsession, putting the more than 750,000 Albertans it serves at the centre of all that it does. As a result, ATB is constantly transforming in order to ensure it can continue to deliver unparalleled value to Albertans. A key pillar in the transformation journey is focused on robust data operations that can help ATB deliver timely, relevant and delightful service.


Machine Learning with Python, Jupyter, KSQL and TensorFlow

Confluent

Building a scalable, reliable and performant machine learning (ML) infrastructure is not easy. It takes much more effort than just building an analytic model with Python and your favorite machine learning framework. After all, machine learning with Python requires the use of algorithms that allow computer programs to constantly learn, but building that infrastructure is several levels higher in complexity.


Extending Vector with eBPF to inspect host and container performance

Netflix Tech

By Jason Koch, with Martin Spier, Brendan Gregg, and Ed Hunter. Improving the tools available to our engineers to help them diagnose, triage, and work through software performance challenges in the cloud is a key goal for the cloud performance engineering team at Netflix. Today we are excited to announce latency heatmaps and improved container support for our on-host monitoring solution…


Machine Learning In The Enterprise

Data Engineering Podcast

Summary Machine learning is a class of technologies that promise to revolutionize business. Unfortunately, it can be difficult to identify and execute on ways that it can be used in large companies. Kevin Dewalt founded Prolego to help Fortune 500 companies build, launch, and maintain their first machine learning projects so that they can remain competitive in our landscape of constant change.


Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.


What Is Readable Code?

Pandora Engineering

Code creates interfaces. But code itself is also an interface.


What Lessons Can Apollo 13 Teach Us About Analytics?

Teradata

Tom Casey explains lessons from the Apollo 13 program and how they can be applied to day-to-day dealings in the analytics world.


Cloudera announces support for Azure’s next-generation Data Lake Store

Cloudera

Today we are proud to announce our support for ADLS Gen2 as it enters general availability on Microsoft Azure. CDH 6.1 already includes support for MapReduce and Spark jobs, Hive and Impala queries, and Oozie workflows on ADLS Gen2. The Cloudera platform delivers a one-stop shop that allows you to store any kind of data, process and analyze it in many different ways in a single environment, and integrate with the rest of your data infrastructure.


Journey to Event Driven – Part 2: Programming Models for the Event-Driven Architecture

Confluent

Part 1 of this series discussed why you need to embrace event-first thinking, while this article builds a rationale for different styles of event-driven architectures and compares and contrasts their scaling, persistence and runtime models. Once we have settled on the event streaming approach, I’ll provide a high-level dataflow showing how we design systems for payment processing at scale using this approach.


15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?


Engineering to Improve Marketing Effectiveness (Part 3): Scaling Paid Media campaigns

Netflix Tech

This is the third blog of the series on Marketing Technology at Netflix. This blog focuses on the marketing tech systems that are responsible for campaign setup and delivery of our paid media campaigns. The first blog focused on solving for creative development and localization at scale.


Cleaning And Curating Open Data For Archaeology

Data Engineering Podcast

Summary Archaeologists collect and create a variety of data as part of their research and exploration. Open Context is a platform for cleaning, curating, and sharing this data. In this episode Eric Kansa describes how they process, clean, and normalize the data that they host, the challenges that they face with scaling ETL processes which require domain specific knowledge, and how the information contained in connections that they expose is being used for interesting projects.


How to Build a Facebook Messenger Chatbot Powered by Fast SQL on CSV

Rockset

A chatbot, like any human customer service rep, needs data about your business and products in order to respond to customers with the correct information. What is an efficient way to hook up your data to a chat application without significant data engineering? In this blog, I will demonstrate how you can build a Facebook Messenger chatbot to help users find vacation rentals using CSV data on Airbnb rentals.


Is There Such a Thing as Too Much Parallelism?

Teradata

In her blog, Carrie Ballinger discusses parallelism and how you can fashion it to specific needs by using the new sparse map capability.


Prepare Now: 2025’s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.


Governing for digital transformation and growth

Cloudera

Ask a CIO where their focus lies and ‘digital transformation’ as well as ‘growth’ will come into the conversation quite quickly. The former sees growing investment in data analytics to become data-driven (45% of organizations expect to increase their spending in this area) while the latter is fueled by disruptive technology and the adoption of AI (41% of organizations name it as their game changer).


Kafka Connect Deep Dive – JDBC Source Connector

Confluent

One of the most common integrations that people want to do with Apache Kafka® is getting data in from a database. That is because relational databases are a rich source of events. The existing data in a database, and any changes to that data, can be streamed into a Kafka topic. From there, these events can be used to drive applications, streamed to other data stores such as search replicas or caches, and streamed to storage for analytics.
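As a rough sketch of what such an integration looks like in practice, the snippet below registers a Confluent JDBC source connector with a Kafka Connect worker through its REST API, using only the JDK’s built-in HTTP client. The database URL, credentials, table name, topic prefix, and the localhost:8083 Connect endpoint are assumptions for illustration; the config keys follow the connector’s documented options.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterJdbcSource {
    public static void main(String[] args) throws Exception {
        // Connector definition: poll the "orders" table and stream new rows
        // (detected via the incrementing "id" column) into the topic "jdbc-orders".
        String connector = """
            {
              "name": "jdbc-orders-source",
              "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                "connection.url": "jdbc:postgresql://db.example.com:5432/shop",
                "connection.user": "kafka",
                "connection.password": "secret",
                "table.whitelist": "orders",
                "mode": "incrementing",
                "incrementing.column.name": "id",
                "topic.prefix": "jdbc-",
                "tasks.max": "1"
              }
            }
            """;

        // Submit the connector to a Kafka Connect worker's REST API (assumed at localhost:8083).
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(connector))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

In incrementing mode the connector only picks up newly inserted rows; timestamp (or timestamp+incrementing) mode is what you would reach for to also capture updates to existing rows, which is the "any changes to that data" case mentioned above.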


How to Make Space for Research & Innovation?

Zalando Engineering

Redesigning research and product development so that the explorative nature of data science becomes a driver for innovation. Zalando leverages cutting-edge machine learning technologies to be Europe’s leading online platform for fashion and lifestyle. In order to develop these products, data scientists and product roles have to work together closely.


All About the Kafka Connect Neo4j Sink Plugin

Confluent

Only a little more than one month after the first release, we are happy to announce another milestone for our Kafka integration. Today, you can grab the Kafka Connect Neo4j Sink from Confluent Hub. As a refresher: we’ve built on the work we did for the Kafka sink Neo4j extension and have made it available via remote connections over our binary Bolt protocol.


How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.


Sysmon Security Event Processing in Real Time with KSQL and HELK

Confluent

During a recent talk titled Hunters ATT&CKing with the Right Data, which I presented with my brother Jose Luis Rodriguez at ATT&CKcon, we talked about the importance of documenting and modeling security event logs before developing any data analytics while preparing for a threat hunting engagement. Defining relationships among Windows security event logs such as Sysmon, for example, helped us to appreciate the extra context that two or more events together can provide for a hunt.


Improving Stream Data Quality with Protobuf Schema Validation

Confluent

The requirements for fast and reliable data pipelines are growing quickly at Deliveroo as the business continues to grow and innovate. We have delivered an event streaming platform which gives strong guarantees on data quality, using Apache Kafka® and Protocol Buffers. Just some of the ways in which we make use of data at Deliveroo include computing optimal rider assignments to in-flight orders, making live operational decisions, personalising restaurant recommendations to users, and prioritising…


Kafka Summit 2019: 3 Big Things!

Confluent

How many Kafka Summits should there be in a year? Experts disagree. Some say there should be one giant event where everybody gathers at once. Some say there should be one once a month in different regions of the world. Others say you should live every day like it’s Kafka Summit. As you may know, we have adopted a happy medium: three Summits in 2019.


Cloudera’s and Hortonworks’ data platform in the cloud named among Leaders in new Forrester Wave

Cloudera

When Cloudera was formed about 10 years ago, the founders believed that companies would jump at the chance to store, manage, and analyze their data in the cloud. Thus, they came up with the name Cloudera, which was a play on “era of cloud.” But, much to their surprise, companies weren’t ready for the cloud; they were more focused on staying on-prem. So, Cloudera focused on helping companies store, manage, and analyze data on-prem.


Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.