February 2019

Deep Learning For Data Engineers

Data Engineering Podcast

Summary: Deep learning is the latest class of technology to gain widespread interest. As data engineers, we are responsible for building and managing the platforms that power these models. To help us understand what is involved, we are joined this week by Thomas Henson. In this episode he shares his experiences experimenting with deep learning, what data engineers need to know about the infrastructure and data requirements to power the models that your team is building, and how it can be u

Spring for Apache Kafka Deep Dive – Part 1: Error Handling, Message Conversion and Transaction Support

Confluent

Following on from How to Work with Apache Kafka in Your Spring Boot Application, which shows how to get started with Spring Boot and Apache Kafka®, here we’ll dig a little deeper into some of the additional features that the Spring for Apache Kafka project provides. Spring for Apache Kafka brings the familiar Spring programming model to Kafka. It provides the KafkaTemplate for publishing records and a listener container for asynchronous execution of POJO listeners.

Protecting a Story’s Future with History and Science

Netflix Tech

By Kylee Peña, Chris Clark, and Mike Whipple. (Photo: Kylee’s parents after their wedding in 1978.) I, Kylee, have two photos from my parents’ wedding. Just two. This year they celebrated 40 years of marriage, so both photos were shot on film. Both capture the joy and awkwardness that come with young weddings. They’re fresh and full of life, candid captures from another era.

Managing Uber’s Data Workflows at Scale

Uber Engineering

At Uber’s scale, thousands of microservices serve millions of rides and deliveries a day, generating more than a hundred petabytes of raw data. Internally, engineering and data teams across the company leverage this data to improve the Uber experience. … The post Managing Uber’s Data Workflows at Scale appeared first on Uber Engineering Blog.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Introducing Cloudera DataFlow (CDF)

Cloudera

Late last year, the news of the merger between Hortonworks and Cloudera shook the industry and gave birth to the new Cloudera: the combined company, focused on being an Enterprise Data Cloud leader with a product offering that spans from edge to AI. One of the most promising technology areas in this merger, one that already had high growth potential and is poised for even more, is the Data-in-Motion platform called Hortonworks DataFlow (HDF).

Cash Is Still King – Make Sure Your Business Is Prepared for the Next Recession

Teradata

If your organization understands customer profitability in detail, then your organization can easily navigate through a recession.

Journey to Event Driven – Part 3: The Affinity Between Events, Streams and Serverless

Confluent

Serverless is all the rage, and it brings a tidal change of innovation. Because it is still at a relatively early stage, developers are still trying to grok the best approach for each cloud vendor and often face the following question: should I go cloud native with AWS Lambda, Google Cloud Functions, etc., or invest in a vendor-agnostic layer like the Serverless Framework?

Building a Cross-platform In-app Messaging Orchestration Service

Netflix Tech

By George Abraham, Devika Chawla, Chris Beaumont, and Daniel Huang. Thoughtful, relevant, and timely messaging is an integral part of a customer’s Netflix experience. The Netflix Messaging Engineering team builds the platform and the messages to communicate with Netflix customers. Messages in the Netflix App: In-app messages at Netflix fall broadly into two channels…

How to Run SQL on PDF Files

Rockset

PDFs are the de facto standard for distributing and sharing fixed-layout documents today. A quick survey of my laptop folders reveals account statements, receipts, technical papers, book chapters, and presentation slides—all PDFs. Lots of valuable information finds its way into all manner of PDF files. Which is a great reason for Rockset to support SQL queries on PDF files, in our mission to make data more usable to everyone.

How ATB Financial is Utilizing Hybrid Cloud to Reduce the Time to Value for Big Data Analytics by 90 Percent

Cloudera

ATB Financial is Alberta’s largest homegrown financial institution and prides itself on its customer obsession, putting the more than 750,000 Albertans it serves at the centre of everything it does. As a result, ATB is constantly transforming to ensure it can continue to deliver unparalleled value to Albertans. A key pillar of this transformation journey is robust data operations that help ATB deliver timely, relevant, and delightful service.

Prepare Now: 2025’s Must-Know Trends for Product and Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

It's the Relationship - Not Just the Data - That is Critical to Success

Teradata

Rob Armstrong explains that while data is important, the real key to insight and successful business outcomes is preserving the relationships across data models.

Machine Learning In The Enterprise

Data Engineering Podcast

Summary: Machine learning is a class of technologies that promise to revolutionize business. Unfortunately, it can be difficult to identify and execute on ways that it can be used in large companies. Kevin Dewalt founded Prolego to help Fortune 500 companies build, launch, and maintain their first machine learning projects so that they can remain competitive in our landscape of constant change.

Machine Learning with Python, Jupyter, KSQL and TensorFlow

Confluent

Building a scalable, reliable and performant machine learning (ML) infrastructure is not easy. It takes much more effort than just building an analytic model with Python and your favorite machine learning framework. After all, machine learning with Python requires the use of algorithms that allow computer programs to constantly learn, but building that infrastructure is several levels higher in complexity.

Extending Vector with eBPF to inspect host and container performance

Netflix Tech

By Jason Koch, with Martin Spier, Brendan Gregg, and Ed Hunter. Improving the tools available to our engineers to help them diagnose, triage, and work through software performance challenges in the cloud is a key goal for the cloud performance engineering team at Netflix. Today we are excited to announce latency heatmaps and improved container support for our on-host monitoring solution…

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

How to Build a Facebook Messenger Chatbot Powered by Fast SQL on CSV

Rockset

A chatbot, like any human customer service rep, needs data about your business and products in order to respond to customers with the correct information. What is an efficient way to hook up your data to a chat application without significant data engineering? In this blog, I will demonstrate how you can build a Facebook Messenger chatbot to help users find vacation rentals using CSV data on Airbnb rentals.

Cloudera announces support for Azure’s next-generation Data Lake Store

Cloudera

Today we are proud to announce our support for ADLS Gen2 as it enters general availability on Microsoft Azure. CDH 6.1 already includes support for MapReduce and Spark jobs, Hive and Impala queries, and Oozie workflows on ADLS Gen2. The Cloudera platform delivers a one-stop shop that allows you to store any kind of data, process and analyze it in many different ways in a single environment, and integrate with the rest of your data infrastructure.

How to Make Space for Research & Innovation?

Zalando Engineering

Redesigning research and product development so that the explorative nature of data science becomes a driver for innovation. Zalando leverages cutting-edge machine learning technologies to be Europe’s leading online platform for fashion and lifestyle. To develop these products, data scientists and product roles have to work together closely.

Cleaning And Curating Open Data For Archaeology

Data Engineering Podcast

Summary: Archaeologists collect and create a variety of data as part of their research and exploration. Open Context is a platform for cleaning, curating, and sharing this data. In this episode Eric Kansa describes how they process, clean, and normalize the data that they host, the challenges that they face with scaling ETL processes which require domain-specific knowledge, and how the information contained in connections that they expose is being used for interesting projects.

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus and as the complexity of the task increases, because the number of use cases and corner cases the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

Journey to Event Driven – Part 2: Programming Models for the Event-Driven Architecture

Confluent

Part 1 of this series discussed why you need to embrace event-first thinking, while this article builds a rationale for different styles of event-driven architectures and compares and contrasts scaling, persistence and runtime models. Once settled on the event streaming approach, I’ll provide a high-level dataflow of how we design systems for payment processing at scale using this approach.

Engineering to Improve Marketing Effectiveness (Part 3): Scaling Paid Media Campaigns

Netflix Tech

This is the third post in the series on Marketing Technology at Netflix. It focuses on the marketing tech systems responsible for campaign setup and delivery of our paid media campaigns. The first post focused on solving for creative development and localization at scale.

Using Smart Schema to Accelerate Insights from Nested JSON

Rockset

Developers often need to work with datasets without a fixed schema, like heavily nested JSON data with several deeply nested arrays and objects, mixed data types, null values, and missing fields. In addition, the shape of the data is prone to change when continuously syncing new data. Understanding the shape of a dataset is crucial to constructing complex queries for building applications or performing data science investigations.

Governing for digital transformation and growth

Cloudera

Ask a CIO where their focus lies and ‘digital transformation’ as well as ‘growth’ will come into the conversation quite quickly. The former sees growing investment in data analytics to become data-driven (45% of organizations expect to increase their spending in this area) while the latter is fueled by disruptive technology and the adoption of AI (41% of organizations name it as their game changer).

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

A Journey On End To End Testing A Microservices Architecture

Zalando Engineering

End-to-end testing is a technique used to test the flow of an application through a business transaction. In a microservices architecture, different components work together to enable a business capability, so testing all of them can get tricky. In this article, you can read about our team’s journey: what our system looks like and what you get from e2e testing.

Is There Such a Thing as Too Much Parallelism?

Teradata

In her blog post, Carrie Ballinger discusses parallelism and how you can tailor it to specific needs by using the new sparse map capability.

Kafka Connect Deep Dive – JDBC Source Connector

Confluent

One of the most common integrations that people want to do with Apache Kafka® is getting data in from a database. That is because relational databases are a rich source of events. The existing data in a database, and any changes to that data, can be streamed into a Kafka topic. From there, these events can be used to drive applications, be streamed to other data stores such as search replicas or caches, and be streamed to storage for analytics.

What Is Readable Code?

Pandora Engineering

Code creates interfaces. But code itself is also an interface.

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.

Distributed Aggregation Queries - A Rockset Intern Story

Rockset

I first met with the Rockset team when they were just four people in a small office in San Francisco. I was taken aback by their experience and friendliness, but most importantly, their willingness to spend a lot of time mentoring me. I knew very little about Rockset's technologies and didn’t know what to expect from such an agile early-stage startup, but decided to join the team for a summer internship anyway.

Cloudera’s and Hortonworks’ data platform in the cloud named among Leaders in new Forrester Wave

Cloudera

When Cloudera was formed about 10 years ago, the founders believed that companies would jump at the chance to store, manage, and analyze their data in the cloud. Thus, they came up with the name Cloudera, a play on “era of cloud.” But, much to their surprise, companies weren’t ready for cloud; they were more focused on on-prem. So, Cloudera focused on helping companies store, manage, and analyze data on-prem.

TypeScript Best Practices

Zalando Engineering

TypeScript is becoming more and more popular. As with everything, there are good and bad sides, and how good it is depends on how you use it in your application. This article will not discuss the good and bad sides of TypeScript, but rather some best practices that can help you get the best out of it. 1. Strict configuration: Strict configuration should be mandatory and enabled by default, as there is not much value in using TypeScript without these settings.

What Lessons Can Apollo 13 Teach Us About Analytics?

Teradata

Tom Casey explains lessons from the Apollo 13 mission and how they can be applied to day-to-day work in analytics.

What Is Entity Resolution? How It Works & Why It Matters

Sometimes referred to as data matching or fuzzy matching, entity resolution is critical for data quality, analytics, graph visualization, and AI. Learn what entity resolution is, why it matters, how it works, and its benefits. Advanced entity resolution using AI is crucial because it efficiently solves many of today’s data quality and analytics problems.