April, 2019

Running Your Database On Kubernetes With KubeDB

Data Engineering Podcast

Summary: Kubernetes is a driving force in the renaissance around deploying and running applications. However, managing the database layer is still a separate concern. The KubeDB project was created as a way of providing a simple mechanism for running your storage system in the same platform as your application. In this episode Tamal Saha explains how the KubeDB project got started, why you might want to run your database with Kubernetes, and how to get started.

Database 100

Python at Netflix

Netflix Tech

By Pythonistas at Netflix, coordinated by Amjith Ramanujam and edited by Ellen Livengood. As many of us prepare to go to PyCon, we wanted to share a sampling of how Python is used at Netflix. We use Python through the full content lifecycle, from deciding which content to fund all the way to operating the CDN that serves the final video to 148 million members.

Python 112

From Apache Kafka to Amazon S3: Exactly Once

Confluent

At Confluent, we see many of our customers are on AWS, and we’ve noticed that Amazon S3 plays a particularly significant role in AWS-based architectures. Unless a use case actively requires a specific database, companies use S3 for storage and process the data with Amazon Elastic MapReduce (EMR) or Amazon Athena. But even if a use case requires a specific database such as Amazon Redshift, data will still land to S3 first and only then load to Redshift.

Kafka 110

3 Ways New As-a-Service Offerings Bring Choice and Flexibility to Teradata Vantage

Teradata

At Teradata, we think a lot about our customers in the cloud, and we continue to deliver on our promise of choice and flexibility by adding new as-a-service options for Teradata Vantage.

Cloud 83

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Breaking Down Data Silos in Financial Services with a Centralized Data Management Platform

Cloudera

Organizations in the financial services industry rely on data to make strategic decisions, drive their businesses, and maintain a competitive edge. The Bank of England was discovering that legacy tools were no longer sufficient to satisfy the growing demands of analysts and economists. The Bank of England, formed in 1694, is the central bank of the United Kingdom.

Analytics on DynamoDB: Comparing Elasticsearch, Athena and Spark

Rockset

In this blog post I compare options for real-time analytics on DynamoDB - Elasticsearch, Athena, and Spark - in terms of ease of setup, maintenance, query capability, and latency. There is limited support for SQL analytics with some of these options. I also evaluate which use cases each of them is best suited for. Developers often have a need to serve fast analytical queries over data in Amazon DynamoDB.

NoSQL 52

More Trending

Introducing SVT-AV1: a scalable open-source AV1 framework

Netflix Tech

By Andrey Norkin, Joel Sole, Kyle Swanson, Mariana Afonso, Anush Moorthy, and Anne Aaron. Netflix Headquarters, Winchester Circle. Netflix headquarters circa 2014. It’s a nice building with good architecture! This was the primary home of Netflix for a number of years during the company’s growth, but at some point Netflix had outgrown its home and needed more space.

Coding 66

Putting Events in Their Place with Dynamic Routing

Confluent

Event-driven architecture means just that: It’s all about the events. In a microservices architecture, events drive microservice actions. No event, no shoes, no service. In the most basic scenario, microservices that need to take action on a common stream of events all listen to that stream. In the Apache Kafka® world, this means that each of those microservice client applications subscribes to a common Kafka topic.
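
The basic scenario described in this teaser, where every service listens to one common stream, can be modeled without a broker. The sketch below is an in-memory stand-in, not the Kafka client API: `InMemoryTopic`, `Service`, and the event type names are all hypothetical and exist only for illustration. It shows that each subscriber receives every event and has to discard the ones it does not care about.

```python
from typing import Callable

Event = dict[str, str]
Handler = Callable[[Event], None]


class InMemoryTopic:
    """Toy stand-in for one shared topic: every subscriber sees every event."""

    def __init__(self) -> None:
        self._subscribers: list[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        self._subscribers.append(handler)

    def publish(self, event: Event) -> None:
        # Fan the event out to all subscribers, regardless of its type.
        for handler in self._subscribers:
            handler(event)


class Service:
    """Hypothetical microservice that only acts on one event type."""

    def __init__(self, name: str, wanted_type: str) -> None:
        self.name = name
        self.wanted_type = wanted_type
        self.seen = 0                      # events received from the topic
        self.handled: list[Event] = []     # events the service acted on

    def on_event(self, event: Event) -> None:
        self.seen += 1
        if event["type"] == self.wanted_type:
            self.handled.append(event)


def demo() -> tuple[Service, Service]:
    topic = InMemoryTopic()
    billing = Service("billing", "order_paid")
    shipping = Service("shipping", "order_shipped")
    topic.subscribe(billing.on_event)
    topic.subscribe(shipping.on_event)
    for event_type in ("order_paid", "order_shipped", "order_paid"):
        topic.publish({"type": event_type})
    return billing, shipping


if __name__ == "__main__":
    for service in demo():
        print(service.name, "saw", service.seen, "handled", len(service.handled))
```

Both services receive all three events even though each acts on only a subset of them, which is the overhead that routing events to separate destinations is meant to avoid.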

Kafka 108

Why Smart Cities Need Intelligent Data

Teradata

In his blog, Bob McQueen defines smart cities, their challenges and opportunities, and the use of smart data management.

Data 86

Open Source: March Updates - A new Kubernetes operator & more Cloud Native Apps

Zalando Engineering

Project Highlights: A new operator has been added to Zalando’s list of Cloud Native Applications. Elasticsearch Operator - an operator for running Elasticsearch in Kubernetes with focus on operational aspects, like safe draining and offering auto-scaling capabilities for Elasticsearch data nodes, rather than just abstracting manifest definitions. To make things even simpler for developers, we also released a new framework that helps to build Kubernetes operators in Python.

Cloud 52

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Machine Learning in Production: Software Architecture

Domino Data Lab: Data Engineering

Special thanks to Addison-Wesley Professional for permission to excerpt the following "Software Architecture" chapter from the book, Machine Learning in Production. This chapter excerpt provides data scientists with insights and tradeoffs to consider when moving machine learning models to production. Also, if you’re interested in learning about how Domino provides an API endpoint for your model, check out this video tutorial on the Domino Support site.

Index Your Big Data With Pilosa For Faster Analytics

Data Engineering Podcast

Summary: Database indexes are critical to ensure fast lookups of your data, but they are inherently tied to the database engine. Pilosa is rewriting that equation by providing a flexible, scalable, performant engine for building an index of your data to enable high-speed aggregate analysis. In this episode Seebs explains how Pilosa fits in the broader data landscape, how it is architected, and how you can start using it for your own analysis.

Big Data 100

Why adopt a hybrid, multi-cloud strategy?

Cloudera

Enterprises are moving to the cloud. In 2016, 60.9% of application workloads were still on-premises in enterprise data centers; by the end of 2017, less than half (47.2%) were on-premises. Enterprises plan to implement new apps primarily in the cloud while migrating 20.7% of existing apps to public cloud. Despite this trend to move to cloud, it will be rare for enterprises to deploy 100% of their apps in the cloud, let alone deploy all apps to a single cloud.

Cloud 54

12 Programming Languages Walk into a Kafka Cluster…

Confluent

When it was first created, Apache Kafka® had a client API for just Scala and Java. Since then, the Kafka client API has been developed for many other programming languages, which enables you to pick the language you want. This freedom of choice ultimately allows you to build an event streaming platform with the language best suited to your business needs.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

How to Analyze Data at Speed and Scale Using Pervasive Data Intelligence

Teradata

Chris Twogood explains why large companies that use data need Pervasive Data Intelligence in order to leverage all of their data, all of the time.

How to set an ideal thread pool size

Zalando Engineering

We all know that thread creation in Java is not free. The actual overhead varies across platforms, but thread creation takes time, introducing latency into request processing, and requires some processing activity by the JVM and OS. This is where the Thread Pool comes to the rescue. The thread pool reuses previously created threads to execute current tasks and offers a solution to the problem of thread cycle overhead and resource thrashing.
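
The reuse described above can be observed directly. The article is about Java; the sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor` as a stand-in, and the helper names `handle_request` and `run_with_pool`, the pool size of 4, and the simulated 10 ms task are assumptions chosen for illustration. It runs many short tasks on a fixed-size pool and collects the names of the worker threads that executed them.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id: int) -> str:
    """Simulate a short I/O-bound task and report which worker thread ran it."""
    time.sleep(0.01)
    return threading.current_thread().name


def run_with_pool(num_tasks: int, pool_size: int) -> set[str]:
    """Run num_tasks tasks on a fixed-size pool and return the distinct worker names."""
    with ThreadPoolExecutor(max_workers=pool_size, thread_name_prefix="worker") as pool:
        return set(pool.map(handle_request, range(num_tasks)))


if __name__ == "__main__":
    workers = run_with_pool(num_tasks=100, pool_size=4)
    print(f"100 tasks ran on {len(workers)} distinct threads")
```

However many tasks are submitted, the number of distinct worker threads never exceeds the pool size, so the cost of creating a thread is paid at most `pool_size` times rather than once per task.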

Java 42

How We Structure our dbt Projects

dbt Developer Hub

As the maintainers of dbt and analytics consultants at Fishtown Analytics (now dbt Labs), we build a lot of dbt projects. Over time, we’ve developed internal conventions on how we structure them. This article does not seek to instruct you on how to design a final model for your stakeholders; it won’t cover whether you should denormalize everything into one wide master table, or have many tables that need to be joined together in the BI layer.

Project 40

Serverless Data Pipelines On DataCoral

Data Engineering Podcast

Summary: How much time do you spend maintaining your data pipeline? How much end user value does that provide? Raghu Murthy founded DataCoral as a way to abstract the low level details of ETL so that you can focus on the actual problem that you are trying to solve. In this episode he explains his motivation for building the DataCoral platform, how it is leveraging serverless computing, the challenges of delivering software as a service to customer environments, and the architecture that he has de

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

What customer centric corporate culture really means and why it is so important

Cloudera

All organizations, big or small, have a unique corporate culture that has been nurtured and mastered over the years. A company’s culture is its basic personality and the essence of how employees interact and work. It is the sum of company beliefs, ethics, expectations, goals, value and mission. The company culture is normally where brand promises are either kept or broken.

Kafka Summit New York 2019 Session Videos

Confluent

It seems like there’s a Kafka Summit every other month. Of course there’s not—it’s every fourth month—but hey, close enough. We now have the Kafka Summit New York in the books, and the session videos are available in record time. As I usually do, let me break the event down for you. We planned the New York event to be a bit smaller than last fall’s flagship San Francisco Summit.

Kafka 102

How U.S. Bank Uses A.I. and Machine Learning to Deeply Personalize Your Banking Experience

Teradata

Katherine Knowles-Marchione explains how U.S. Bank is using AI to improve and personalize the banking experience.

Banking 85

Secondary Indexes For Analytics On DynamoDB

Rockset

In this post I explore how to support analytical queries without encountering prohibitive scan costs, by leveraging secondary indexes in DynamoDB. I also evaluate the pros and cons of this approach in contrast to extracting data to another system like Athena, Spark or Elastic. Rockset recently added support for DynamoDB - which basically means you can run fast SQL on DynamoDB tables without any ETL.

NoSQL 40

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

End-to-end load testing Zalando’s production website

Zalando Engineering

Black Friday is the busiest day of the year for us, with over 4,200 orders per minute during the event in 2018. We need to make sure we’re technically able to handle the huge influx of customers. As a part of our preparations we ask all of our teams to perform load tests to ensure their individual components will handle the expected load. In addition, and due to the distributed nature of our system's architecture, we also need to ensure it will handle the expected load once all components have

Python 40

Announcing Confluent Cloud for Apache Kafka as a Native Service on Google Cloud Platform

Confluent

I’m excited to announce that we’re partnering with Google Cloud to make Confluent Cloud, our fully managed offering of Apache Kafka®, available as a native offering on Google Cloud Platform (GCP). This means you will have the ability to use Confluent Cloud’s managed Apache Kafka service with familiar Google tools and processes, including integration into the Google Cloud Console and GCP Marketplace to provide a seamless sign-up experience, and integrated billing and first-line support provided

KSQL: What’s New in 5.2

Confluent

KSQL enables you to write streaming applications expressed purely in SQL. There’s a ton of great new features in 5.2, many of which are a result of requests and support from the community—we use GitHub to track these, and I’ve indicated in each point below the corresponding issue. If you have suggestions for new features, please do be sure to search our GitHub issues page and upvote, or create a new issue as appropriate.

Food 96

Optimizing Kafka Streams Applications

Confluent

With the release of Apache Kafka® 2.1.0, Kafka Streams introduced the processor topology optimization framework at the Kafka Streams DSL layer. This framework opens the door for various optimization techniques from the existing data stream management system (DSMS) and data stream processing literature. In what follows, we provide some context around how a processor topology was generated inside Kafka Streams before 2.1, with a focus on stateful operations like aggregations and joins.

Kafka 91

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.

Monitoring Data Replication in Multi-Datacenter Apache Kafka Deployments

Confluent

Enterprises run modern data systems and services across multiple cloud providers, private clouds and on-prem multi-datacenter deployments. Instead of having many point-to-point connections between sites, the Confluent Platform provides an integrated event streaming architecture with frictionless data replication between sites. Applications can publish streams of data to a self-hosted on-prem cluster, replicate them to another on-prem cluster or to different cloud providers, load them into data s

Kafka 87

Reshaping Entire Industries with IoT and Confluent Cloud

Confluent

While the current hype around the Internet of Things (IoT) focuses on smart “things”—smart homes, smart cars, smart watches—the first known IoT device was a simple Coca-Cola vending machine at Carnegie Mellon University in Pittsburgh. Students in the 1980s, tired of long walks to an empty machine, installed a board that tracked the machine’s sensors to determine whether the machine was stocked and the bottles were cold.

Food 85

Creating an IoT-Based, Data-Driven Food Value Chain with Confluent Cloud

Confluent

Industries are oftentimes more complex than we think. For example, the dinner you order at a restaurant or the ingredients you buy (or have delivered) to cook dinner at home encounter a number of parties along the supply chain before making it to your table or kitchen. Think about it. You have farmers. You have transportation companies. You have food processors.

Food 80

Dawn of DevOps: Managing and Evolving Schemas with Confluent Control Center

Confluent

As we announced in Introducing Confluent Platform 5.2, the latest release introduces many new features that enable you to build contextual event-driven applications. In particular, the management and monitoring capabilities that we added to Confluent Control Center have evolved it into an indispensable tool for anyone working with Apache Kafka®. With the Developer License, all of the Confluent Platform features are free of charge for an indefinite duration in a non-production environment with

What Is Entity Resolution? How It Works & Why It Matters

Sometimes referred to as data matching or fuzzy matching, entity resolution is critical for data quality, analytics, graph visualization and AI. Learn what entity resolution is, why it matters, how it works and its benefits. Advanced entity resolution using AI is crucial because it efficiently and easily solves many of today’s data quality and analytics problems.