December, 2018

Simplifying Continuous Data Processing Using Stream Native Storage In Pravega with Tom Kaitchuck - Episode 63

Data Engineering Podcast

Summary: As more companies and organizations work to gain a real-time view of their business, they are increasingly turning to stream processing technologies to fulfill that need. However, the storage requirements for continuous, unbounded streams of data are markedly different from those of batch-oriented workloads. To address this shortcoming, the team at Dell EMC has created the open source Pravega project.

Our learnings from adopting GraphQL

Netflix Tech

A Marketing Tech Campaign. By Artem Shtatnov and Ravi Srinivas Ranganathan. In an earlier blog post, we provided a high-level overview of some of the applications that the Marketing Technology team builds to enable scale and intelligence in driving our global advertising, which reaches users on sites like The New York Times, YouTube, and thousands of others.

The Billion Data Point Challenge: Building a Query Engine for High Cardinality Time Series Data

Uber Engineering

Uber, like most large technology companies, relies extensively on metrics to effectively monitor its entire stack. From low-level system metrics, such as memory utilization of a host, to high-level business metrics, including the number of Uber Eats orders in a …

Creating Multi-language NLP Pipelines with Apache Spark

Domino Data Lab: Data Engineering

In this guest post, Holden Karau, Apache Spark Committer, provides insights on how to create multi-language pipelines with Apache Spark and avoid rewriting spaCy into Java. She has already written a complementary blog post on using spaCy to process text data for Domino. Karau is a Developer Advocate at Google as well as a co-author of High Performance Spark and Learning Spark.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Live Dashboards on Streaming Data - A Tutorial Using Amazon Kinesis and Rockset

Rockset

We live in a world where diverse systems—social networks, monitoring, stock exchanges, websites, IoT devices—all continuously generate volumes of data in the form of events, captured in systems like Apache Kafka and Amazon Kinesis. One can perform a wide variety of analyses, like aggregations, filtering, or sampling, on these event streams, either at the record level or over sliding time windows.

One Audio Sequencer to Rule Them All

Pandora Engineering

Photo credit: Carol Yepes. Last month Pandora announced a public podcast beta in conjunction with the Podcast Genome Project. This rollout introduced many exciting features to our current mobile application offerings, including fully integrated and native podcast support. Ironically, one of the most interesting features and perhaps our biggest engineering win with this iteration is something that’s transparent to our end users: the inclusion of a new audio playback sequencer used exclusively for

More Trending

Netflix OSS and Spring Boot: Coming Full Circle

Netflix Tech

By Taylor Wicksell, Tom Cellucci, Howard Yuan, Asi Bross, Noel Yap, and David Liu. In 2007, Netflix started on a long road towards fully operating in the cloud. Many of Netflix’s backend and mid-tier applications are built using Java, and as part of this effort Netflix engineering built several cloud infrastructure libraries and systems.

Open Source: November Review - Maintainer training, new releases and more

Zalando Engineering

Project Highlights: ExternalDNS version 0.5.9 is ready for testing. This project allows you to control DNS records dynamically via Kubernetes resources in a DNS provider-agnostic way. ExternalDNS also successfully made its way to the Kubernetes Incubator. Check out the list of changes in this new release. Zalando-Incubator welcomed two brand-new open source projects: 1) Darty, a data dependency manager for data science projects.

Data Science vs Engineering: Tension Points

Domino Data Lab: Data Engineering

This blog post provides highlights and a full written transcript from the panel, “Data Science Versus Engineering: Does It Really Have To Be This Way?” with Amy Heineike, Paco Nathan, and Pete Warden at Domino HQ. Topics discussed include the current state of collaboration around building and deploying models, tension points that potentially arise, as well as practical advice on how to address these tension points.

Announcing my session at #SQLBits - Azure Databricks

Advancing Analytics: Data Engineering

Simon Whiteley and I will be back at #SQLBits 2019 talking about #DataEngineering and #DataScience in Databricks. We will look at #ApacheSpark #Python #Engineering & #MachineLearning in this full-day training session. Have you looked at Azure Databricks yet? No? Then you need to. Why, you ask? There are many reasons. Number one: knowing how to use Apache Spark will earn you more money.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Three Trends for Modernizing Analytics and Data Warehousing in 2019

Cloudera

Data analytics priorities have shifted this year. Growth factors and business priorities are ever-changing. Don’t blink or you might miss what leading organizations are doing to modernize their analytic and data warehousing environments. Business intelligence (BI), an umbrella term coined in 1989 by Howard Dresner, Chief Research Officer at Dresner Advisory Services, refers to the ability of end-users to access and analyze enterprise data.

Advice On Scaling Your Data Pipeline Alongside Your Business with Christian Heinzmann - Episode 61

Data Engineering Podcast

Summary: Every business needs a pipeline for its critical data, even if it is just pasting into a spreadsheet. As the organization grows and gains more customers, the requirements for that pipeline will change. In this episode Christian Heinzmann, Head of Data Warehousing at Grubhub, discusses the various requirements for data pipelines and how the overall system architecture evolves as more data is being processed.

Implementing the Netflix Media Database

Netflix Tech

In the previous blog posts in this series, we introduced the Netflix Media DataBase (NMDB) and its salient “Media Document” data model. In this post we will provide details of the NMDB system architecture, beginning with the system requirements; these will serve as the necessary motivation for the architectural choices we made. A fundamental requirement for any lasting data system is that it should scale along with the growth of the business applications it wishes to serve.

Cloud Nine: All Your Analytics, Wherever You Want Them. Really!

Teradata

Brian Wood explains how Teradata Vantage in the cloud has your back when it comes to analytic simplicity, control, effectiveness, and results.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Running SQL on Nested JSON

Rockset

When we surveyed the market, we saw the need for a solution that could perform fast SQL queries on fluid JSON data, including arrays and nested objects: Best architecture to convert JSON to SQL? What are the ways to run SQL on JSON data without predefining schemas? I need database to take JSON and execute SQL. What are my options? The Challenge of SQL on JSON: Some form of ETL to transform JSON to tables in SQL databases may be workable for basic JSON data with fixed fields that are known up fro

Front-End Micro Services

Zalando Engineering

The “micro frontends” idea has been around for a while now, with great resources such as this Tom Söderlund article, which includes a list of existing implementations. In this article, I would like to take an in-depth look at the reference implementation using fragments: explain what it tries to achieve, where it falls short, and possible solutions to those limitations.

IoT: Make Mine a Standard

Cloudera

Standards and open source are closely linked. Open source allows you to stay on the cutting edge, to have the latest and most innovative technologies at your disposal at all times. No one company is going to outpace the rate at which an open source community produces innovative new software. In spirit and by definition, open source excludes all things proprietary.

Putting Apache Spark Into Action with Jean Georges Perrin - Episode 60

Data Engineering Podcast

Summary: Apache Spark is a popular and widely used tool for a variety of data-oriented projects. With the large array of capabilities, and the complexity of the underlying system, it can be difficult to understand how to get started using it. Jean Georges Perrin has been so impressed by the versatility of Spark that he is writing a book for data engineers to hit the ground running.

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases, the number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

Modernizing the Web Playback UI

Netflix Tech

By Corey Grunewald & Matt Jaquish. Since 2013, the user experience of playing videos on the Netflix website has changed very little. During this period, teams at Netflix have rolled out amazing video playback features, but the visual design and user controls of the playback UI have remained the same.

Ensuring Actionable Answers from Analytic Models

Teradata

Monica Woolmer provides insight into determining the right analytic models to provide practical, actionable answers for business.

Apache Zookeeper As A Building Block For Distributed Systems with Patrick Hunt - Episode 59

Data Engineering Podcast

Summary: Distributed systems are complex to build and operate, and there are certain primitives that are common to a majority of them. Rather than re-implementing the same capabilities every time, many projects build on top of Apache Zookeeper. In this episode Patrick Hunt explains how the Apache Zookeeper project was started, how it functions, and how it is used as a building block for other distributed systems.

Performance comparison of video coding standards: an adaptive streaming perspective

Netflix Tech

By Joel Sole, Liwei Guo, Andrey Norkin, Mariana Afonso, Kyle Swanson, and Anne Aaron. “This is my advice to people: Learn how to cook, try new recipes, learn from your mistakes, be fearless, and above all have fun” - Julia Child (American chef, author, and television personality). At Netflix, we are continually refining the recipes we use to serve your favorite shows and movies at the best possible quality.

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

Cache warming: Agility for a stateful service

Netflix Tech

By Deva Jayaraman, Shashi Madappa, Sridhar Enugula, and Ioannis Papapanagiotou. EVCache has been a fundamental part of the Netflix platform (we call it Tier-1), holding petabytes of data. Our caching layer serves multiple use cases, including signup, personalization, searching, playback, and more. It comprises thousands of nodes in production and hundreds of clusters, all of which must routinely scale up due to the increasing growth of our members.

Q&A with Greg Rahn – The changing Data Warehouse market

Cloudera

Hi Greg, thank you for joining us today. I would like to start off by asking you to tell us about your background and what kicked off your 20-year career in relational database technology? Greg Rahn: I first got introduced to SQL relational database systems while I was in undergrad. I was a student system administrator for the campus computing group and at that time they were migrating the campus phone book to a new tool, new to me, known as Oracle.

Cloudera and PUE Partner to Address Big Data Talent Shortage in Spain

Cloudera

Data may be the world’s most valuable resource, but the global big data talent shortage can hinder the ability of organizations to capitalize on that potential. Talent will be the key factor in linking innovation, competitiveness, and growth in the 21st century. Governments around the globe, grappling with high rates of unemployment, are eying programs to deliver big data skills training and certification to citizens that address both problematic unemployment and entice organizations to maintai