June, 2019


The Workflow Engine For Data Engineers And Data Scientists

Data Engineering Podcast

Summary: Building a data platform that works equally well for data engineering and data science requires familiarity with the needs of both roles. Data engineering platforms focus on stateful execution and on tasks that are strictly ordered by dependency graphs, while data science platforms provide an environment conducive to rapid experimentation and iteration, with data flowing directly between stages.
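As a minimal sketch of the "tasks strictly ordered by dependency graphs" model the summary describes (not code from the episode), the Python standard library's graphlib can order tasks so each runs only after its dependencies. The task names are invented for illustration.

```python
# Minimal sketch of dependency-ordered task execution; task names are made up.
from graphlib import TopologicalSorter  # Python 3.9+

# task -> set of tasks it depends on
dag = {
    "extract_orders": set(),
    "extract_users": set(),
    "transform": {"extract_orders", "extract_users"},
    "load_warehouse": {"transform"},
}

def run(task: str) -> None:
    print(f"running {task}")  # stand-in for real, stateful work

for task in TopologicalSorter(dag).static_order():
    run(task)  # predecessors are guaranteed to have run already
```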


What’s New in Apache Kafka 2.3

Confluent

It’s official: Apache Kafka® 2.3 has been released! Here is a selection of some of the most interesting and important features added in the new release. Core Kafka: KIP-351 and KIP-427 bring improved monitoring for partitions that have lost replicas. To keep your data safe, Kafka creates several replicas of it on different brokers, and it will not allow writes to proceed unless the partition has a minimum number of in-sync replicas.
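As a hedged sketch of the durability settings this blurb alludes to (not code from the release notes), here is how a topic with several replicas and a minimum in-sync replica count might be set up with the confluent-kafka Python client; the broker address and topic name are assumptions.

```python
# Hedged sketch: replication and min.insync.replicas, via confluent-kafka.
from confluent_kafka import Producer
from confluent_kafka.admin import AdminClient, NewTopic

BROKERS = "localhost:9092"  # assumed broker address

admin = AdminClient({"bootstrap.servers": BROKERS})
admin.create_topics([
    NewTopic(
        "orders",                              # hypothetical topic
        num_partitions=3,
        replication_factor=3,                  # several replicas on different brokers
        config={"min.insync.replicas": "2"},   # writes need 2 in-sync replicas
    )
])

producer = Producer({
    "bootstrap.servers": BROKERS,
    "acks": "all",  # wait for all in-sync replicas before acknowledging
})
producer.produce("orders", key=b"order-1", value=b'{"total": 42}')
producer.flush()
```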



Predictive CPU isolation of containers at Netflix

Netflix Tech

By Benoit Rostykus and Gabriel Hartmann. Noisy Neighbors: We’ve all had noisy neighbors at one point in our lives. Whether it’s at a cafe or through the wall of an apartment, it is always disruptive. The need for good manners in shared spaces turns out to be important not just for people, but for your Docker containers too. When you’re running in the cloud, your containers are in a shared space; in particular, they share the CPU’s memory hierarchy of the host instance.
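For contrast with Netflix's predictive approach, here is a purely illustrative sketch (not their system) of the crude, static alternative: pinning containers to disjoint CPU sets with the Docker SDK for Python so they don't compete for the same cores and caches. The image names and CPU ranges are invented.

```python
# Illustrative only: static CPU pinning via the Docker SDK for Python.
# The article's point is that Netflix predicts placements instead of
# hand-pinning like this; image names and core ranges are made up.
import docker

client = docker.from_env()

client.containers.run(
    "batch-job:latest",          # hypothetical image
    detach=True,
    cpuset_cpus="0-3",           # restrict to cores 0-3
)
client.containers.run(
    "latency-sensitive:latest",  # hypothetical image
    detach=True,
    cpuset_cpus="4-7",           # restrict to cores 4-7
)
```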


What Working “at Scale” Really Means

Teradata

Rob Armstrong discusses the challenges of moving from a departmental solution to operational and production systems working at scale, and how Teradata Vantage can solve for them.


15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?


Cloudera Provides First Look at Cloudera Data Platform, the Industry’s First Enterprise Data Cloud

Cloudera

Cloudera Unveils Industry’s First Enterprise Data Cloud in Webinar. How do you take a mission-critical on-premises workload and rapidly burst it to the cloud? Can you instantly auto-scale resources as demand requires and just as easily pause your work so you don’t run up your cloud bill? On June 18th, Cloudera provided an exclusive preview of these capabilities, and more, with the introduction of Cloudera Data Platform (CDP), the industry’s first enterprise data cloud.


Should you have an ETL window in your Modern Data Warehouse?

Advancing Analytics: Data Engineering

Ah, the ETL (Extract-Transform-Load) window: the schedule by which the Business Intelligence developer sets their clock, the nail-biting nightly period during which the on-call support hopes their phone won’t ring. It’s a cornerstone of the data warehousing approach… and we shouldn’t have one. There, I said it. Hear me out – back in the on-premises days we had data loading processes that connected directly to our source system databases and performed huge data extract queries as the start of one long


Microservices, Apache Kafka, and Domain-Driven Design

Confluent

Microservices have a symbiotic relationship with domain-driven design (DDD)—a design approach where the business domain is carefully modeled in software and evolved over time, independently of the plumbing that makes the system work. I see this pattern coming up more and more in the field in conjunction with Apache Kafka®. In these projects, microservice architectures use Kafka as an event streaming platform.
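As a hedged sketch of the pattern described here (not code from the article), a bounded context might publish its domain events to a Kafka topic so other microservices can react without direct coupling. The event fields and topic name are invented for illustration.

```python
# Hedged sketch: a bounded context publishing a domain event to Kafka.
# Event shape and topic name are invented.
import json
from dataclasses import dataclass, asdict
from confluent_kafka import Producer


@dataclass
class OrderPlaced:
    """Domain event owned by the hypothetical 'orders' bounded context."""
    order_id: str
    customer_id: str
    total_cents: int


producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumed

event = OrderPlaced(order_id="o-123", customer_id="c-42", total_cents=1999)
producer.produce(
    "orders.order-placed",                       # hypothetical topic per event type
    key=event.order_id.encode(),
    value=json.dumps(asdict(event)).encode(),
)
producer.flush()
```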


Building a SQL Development Environment for Messy, Semi-Structured Data

Rockset

Why build a new SQL development environment? We love SQL — our mission is to bring fast, real-time queries to messy, semi-structured real-world data and SQL is a core part of our effort. A SQL API allows our product to fit neatly into the stacks of our users without any workflow re-architecting. Our users can easily integrate Rockset with a multitude of existing tools for SQL development (e.g.


Why Hadoop Failed and Where We Go from Here

Teradata

Chad Meley delves into the demise of Hadoop distribution vendors and how they got there.


Improving Multi-tenancy with Virtual Private Clusters

Cloudera

Noisy Neighbors in Large, Multi-Tenant Clusters. The typical Cloudera Enterprise Data Hub Cluster starts with a few dozen nodes in the customer’s datacenter hosting a variety of distributed services. Once configured and secured, the cluster administrator (admin) gives access to a few individuals to onboard their workloads. Over time, workloads start processing more data, tenants start onboarding more workloads, and administrators (admins) start onboarding more tenants.


Prepare Now: 2025’s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.


Unlock the Value of Data Faster Through Modern Data Warehousing

Advancing Analytics: Data Engineering

Data has value – I think we’ve finally got to the point where most people agree on this. The problem we face is how long it takes to unlock that value, and it’s a frustration that most businesses I speak to are having. Let’s think about why this is. After the horror that was the “data silo” days, with clumps of data living in Access databases, Excel spreadsheets and isolated data stores, we’ve had a pretty good run with the classic Kimball data warehouse.


Managing The Machine Learning Lifecycle

Data Engineering Podcast

Summary: Building a machine learning model can be difficult, but that is only half of the battle. Having a perfect model is only useful if you are able to get it into production. In this episode, Stepan Pushkarev, founder of Hydrosphere, explains why deploying and maintaining machine learning projects in production is different from regular software projects, and the challenges that they bring.


Designing the .NET API for Apache Kafka

Confluent

Confluent’s clients for Apache Kafka® recently passed a major milestone—the release of version 1.0. This has been a long time in the making. Magnus Edenhill first started developing librdkafka about seven years ago, later joining Confluent in the very early days to help foster the community of Kafka users outside the Java ecosystem. Since then, the clients team has been on a mission to build a set of high-quality librdkafka bindings for different languages (initially Python, Go, and .NET


How We Use RocksDB at Rockset

Rockset

In this blog post, I'll describe how we use RocksDB at Rockset and how we tuned it to get the most performance out of it. I assume that the reader is generally familiar with how Log-Structured Merge tree based storage engines like RocksDB work. At Rockset, we want our users to be able to continuously ingest their data into Rockset with sub-second write latency and query it in 10s of milliseconds.
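As a hedged sketch of the kind of tuning the post is about (not Rockset's actual settings, which are described in C++ terms), here is how a RocksDB database might be opened with a few common write-path knobs using the python-rocksdb bindings; the values and paths are placeholders.

```python
# Hedged sketch with python-rocksdb: open a DB with a few write-path
# tuning knobs, then do a put/get. Values are placeholders, not
# Rockset's configuration.
import rocksdb

opts = rocksdb.Options()
opts.create_if_missing = True
opts.write_buffer_size = 64 * 1024 * 1024    # larger memtables before flushing
opts.max_write_buffer_number = 4             # allow more memtables in flight
opts.target_file_size_base = 64 * 1024 * 1024

db = rocksdb.DB("example.db", opts)          # hypothetical local path
db.put(b"doc:1", b'{"city": "SF"}')
print(db.get(b"doc:1"))
```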


How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.


How Teradata and Oxford Saïd are Modernizing Analytics for Academic Research

Teradata

Oxford and Teradata partner to modernize analytics for academic research, shape new bodies of research and find answers to pressing business challenges.


Netflix Studio Hack Day – May 2019

Netflix Tech

Netflix Studio Hack Day – May 2019. By Tom Richards, Carenina Garcia Motion, and Marlee Tart. Hack Days are a big deal at Netflix. They’re a chance to bring together employees from all our different disciplines to explore new ideas and experiment with emerging technologies. For the most recent hack day, we channeled our creative energy towards our studio efforts.


Modern Data Warehousing with Azure Databricks at the #PASSSummit in Seattle

Advancing Analytics: Data Engineering

Hey everyone, Advancing Analytics are heading to Seattle in November for the PASS Summit. We will be delivering a full-day training day on Azure Databricks – Practical Azure Databricks: Engineering & Warehousing at Scale. The session will focus on using Azure Databricks for Modern Data Warehousing. Not sure if the day is for you? Well, take a look at the video we recorded.


Evolving An ETL Pipeline For Better Productivity

Data Engineering Podcast

Summary: Building an ETL pipeline can be a significant undertaking, and sometimes it needs to be rebuilt when a better option becomes available. In this episode, Aaron Gibralter, director of engineering at Greenhouse, joins Raghu Murthy, founder and CEO of DataCoral, to discuss the journey that he and his team took from an in-house ETL pipeline built out of open source components to a paid service.


Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.


Streaming Data from the Universe with Apache Kafka

Confluent

You might think that data collection in astronomy consists of a lone astronomer pointing a telescope at a single object in a static sky. While that may be true in some cases (I collected the data for my Ph.D. thesis this way), the field of astronomy is rapidly changing into a data-intensive science with real-time needs. Each night, large-scale astronomical telescope surveys detect millions of changing objects in the sky and need to stream results to scientists for time-sensitive, complementary f
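As a hedged sketch of the streaming pattern the blurb describes (not the survey's actual pipeline), a downstream science team might consume nightly alert events from a Kafka topic as they arrive; the topic name, group id, and alert fields are invented.

```python
# Hedged sketch: consuming a stream of alert events from Kafka.
# Topic, group id, and alert fields are invented for illustration.
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # assumed
    "group.id": "alert-science-team",       # hypothetical consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["sky-survey.alerts"])   # hypothetical topic

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        alert = json.loads(msg.value())
        # Hand time-sensitive alerts to follow-up tooling here.
        print(alert.get("object_id"), alert.get("magnitude"))
finally:
    consumer.close()
```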


IValue: efficient representation of dynamic types in C++

Rockset

Introduction: In traditional SQL systems, a column's type is determined when the table is created, and never changes while executing a query. If you create a table with an integer-valued column, the values in that column will always be integers (or possibly NULL). Rockset, however, is dynamically typed, which means that we often don't know the type of a value until we actually execute the query.
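As a purely conceptual sketch of the idea (not the actual C++ IValue implementation), a dynamically typed engine carries a type tag alongside each value and decides behavior only at query execution time; the tags and operation below are invented for illustration.

```python
# Conceptual sketch of a tagged dynamic value; not Rockset's IValue.
from dataclasses import dataclass
from typing import Any


@dataclass
class TaggedValue:
    tag: str    # "int", "float", "string", "null", ...
    value: Any


def plus(a: TaggedValue, b: TaggedValue) -> TaggedValue:
    """Add two dynamically typed values, deciding behavior at runtime."""
    if a.tag == "null" or b.tag == "null":
        return TaggedValue("null", None)
    if a.tag in ("int", "float") and b.tag in ("int", "float"):
        tag = "float" if "float" in (a.tag, b.tag) else "int"
        return TaggedValue(tag, a.value + b.value)
    raise TypeError(f"cannot add {a.tag} and {b.tag}")


print(plus(TaggedValue("int", 2), TaggedValue("float", 0.5)))  # TaggedValue('float', 2.5)
```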


Why Vantage Is Our Most Popular Release Ever

Teradata

Teradata Vantage is busting through analytic silos and raising the bar. Find out what drove these innovations and led to Vantage becoming our most popular release yet.


Building a Scalable Search Architecture

Confluent

Software projects of all sizes and complexities have a common challenge: building a scalable solution for search. Who has never seen an application use RDBMS SQL statements to run searches? You might be wondering, is this a good solution? As the databases professor at my university used to say, it depends. Using SQL to run your search might be enough for your use case, but as your project requirements grow and more advanced features are needed—for example, enabling synonyms, multilingual search,
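As a hedged sketch of the pattern the post questions (not code from it), here is search implemented as a plain SQL LIKE filter, using sqlite3 from the Python standard library just to keep it runnable; it works for small cases but offers no relevance ranking, synonyms, or multilingual analysis.

```python
# Hedged sketch: "search" as a SQL LIKE filter, the baseline the post
# questions. Table and data are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [("streaming search with kafka",), ("batch reports",)],
)

term = "search"
rows = conn.execute(
    "SELECT id, body FROM docs WHERE body LIKE ?", (f"%{term}%",)
).fetchall()
print(rows)  # [(1, 'streaming search with kafka')]
```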


The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.


How to Connect KSQL to Confluent Cloud using Kubernetes with Helm

Confluent

Confluent Cloud, a fully managed, cloud-native event streaming service that extends the value of Apache Kafka®, is simple, resilient, secure, and performant, allowing you to focus on what is important—building contextual event-driven applications, not infrastructure. If you are using Confluent Cloud as your managed Apache Kafka cluster, you probably also want to start using other Confluent Platform components like the Confluent Schema Registry, Kafka Connect, KSQL, and Confluent REST Proxy.


Reliable, Fast Access to On-Chain Data Insights

Confluent

At TokenAnalyst, we are building the core infrastructure to integrate, clean, and analyze blockchain data. Data on a blockchain is also known as on-chain data. We offer both historical and low-latency data streams of on-chain data across multiple blockchains. How we use Apache Kafka and the Confluent Platform: Apache Kafka® is the central data hub of our company.


Spring for Apache Kafka Deep Dive – Part 4: Continuous Delivery of Event Streaming Pipelines

Confluent

For event streaming application developers, it is important to continuously update the streaming pipeline based on the need for changes in the individual applications in the pipeline. It is also important to understand some of the common streaming topologies that streaming developers use to build an event streaming pipeline. Here in part 4 of the Spring for Apache Kafka Deep Dive blog series, we will cover: Common event streaming topology patterns supported in Spring Cloud Data Flow.


Four Reasons Why Upgrading to Vantage is Worth It

Teradata

Older Teradata analytics software versions may not support the latest innovations in Vantage and could cost you more than upgrading. Learn more.


Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.


Swedbank Delivers Superior Customer Experience by Illuminating the Customer Journey

Teradata

Find out how Swedbank has partnered with Teradata to illuminate the customer journey, delivering answers to the business and a superior customer experience.


New As-a-Service Offers on Vantage Bring Simplicity, Modernization

Teradata

Analytics as a service lets you offload IT infrastructure tasks so you can focus on solving your toughest business problems. Learn more about options for Teradata Vantage.


The Data Lake is Dead; Long Live the Data Lake!

Teradata

Martin Wilcox examines the failure of data lakes.


How Moving to the Cloud Helped Craft the Ideal Fan Experience for Ticketmaster

Teradata

Learn how moving to the cloud in 10 weeks enabled Ticketmaster to gain greater visibility into their data and respond to business needs quicker.


What Is Entity Resolution? How It Works & Why It Matters

Sometimes referred to as data matching or fuzzy matching, entity resolution is critical for data quality, analytics, graph visualization, and AI. Learn what entity resolution is, why it matters, how it works, and its benefits. Advanced entity resolution using AI is crucial because it efficiently and easily solves many of today’s data quality and analytics problems.