June, 2019

What’s New in Apache Kafka 2.3

Confluent

It’s official: Apache Kafka® 2.3 has been released! Here is a selection of the most interesting and important features added in the new release. Core Kafka: KIP-351 and KIP-427 improve monitoring for partitions that have lost replicas. To keep your data safe, Kafka creates several replicas of each partition on different brokers, and it will not allow writes to proceed unless the partition has a minimum number of in-sync replicas.
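The in-sync-replica gate described above can be sketched in a few lines of Python. This is a simulation of the rule, not broker code; the names mirror the broker setting min.insync.replicas and the acks=all producer mode.

```python
# Simulation of Kafka's in-sync-replica (ISR) write gate -- not broker
# code. With acks=all, a produce request succeeds only when the number
# of in-sync replicas is at least min.insync.replicas.

def write_allowed(in_sync_replicas: int, min_insync_replicas: int) -> bool:
    """Return True if a produce request with acks=all may proceed."""
    return in_sync_replicas >= min_insync_replicas

# Topic with 3 replicas and min.insync.replicas=2: losing one replica
# still allows writes; losing two does not (the real broker rejects the
# write with a NotEnoughReplicas error in that case).
print(write_allowed(3, 2))  # True
print(write_allowed(2, 2))  # True
print(write_allowed(1, 2))  # False
```

Setting min.insync.replicas to 2 on a 3-replica topic is the common trade-off: one broker can fail without blocking producers, while no acknowledged write ever lives on a single machine.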

Why Hadoop Failed and Where We Go from Here

Teradata

Chad Meley delves into the demise of Hadoop distribution vendors and how they got there.

Predictive CPU isolation of containers at Netflix

Netflix Tech

By Benoit Rostykus and Gabriel Hartmann. Noisy Neighbors: we’ve all had noisy neighbors at some point in our lives. Whether it’s at a cafe or through the wall of an apartment, it is always disruptive. The need for good manners in shared spaces turns out to be important not just for people, but for your Docker containers too. When you’re running in the cloud, your containers are in a shared space; in particular, they share the CPU memory hierarchy of the host instance.
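The bluntest form of the isolation the article discusses is pinning a process to a subset of cores so it cannot contend for the caches used by its neighbors. A minimal, Linux-only sketch using os.sched_setaffinity; Netflix’s actual approach (predictive placement) is far more sophisticated than this:

```python
# Pin the current process to a single CPU, then restore the original
# affinity mask. Linux-only sketch -- the crudest form of CPU isolation,
# not the predictive method the article describes.
import os

all_cpus = os.sched_getaffinity(0)         # CPUs this process may run on
one_cpu = {min(all_cpus)}

os.sched_setaffinity(0, one_cpu)           # pin to one core
print(os.sched_getaffinity(0) == one_cpu)  # True

os.sched_setaffinity(0, all_cpus)          # restore the original mask
```

Static pinning like this trades utilization for predictability, which is exactly the tension the Netflix post sets out to resolve with smarter placement.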

The Workflow Engine For Data Engineers And Data Scientists

Data Engineering Podcast

Summary Building a data platform that works equally well for data engineering and data science is a task that requires familiarity with the needs of both roles. Data engineering platforms have a strong focus on stateful execution and tasks that are strictly ordered based on dependency graphs. Data science platforms provide an environment that is conducive to rapid experimentation and iteration, with data flowing directly between stages.
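The “strictly ordered based on dependency graphs” execution model can be sketched with the standard library’s graphlib; the task names here are illustrative, not from any particular platform.

```python
# Dependency-ordered task execution, the core of a data engineering
# workflow engine: each task maps to the set of tasks it depends on,
# and a topological sort yields a valid run order.
from graphlib import TopologicalSorter

deps = {
    "load": {"transform"},     # load runs after transform
    "transform": {"extract"},  # transform runs after extract
    "extract": set(),          # extract has no dependencies
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # ['extract', 'transform', 'load']
```

A cycle in deps would raise graphlib.CycleError, which is precisely why these platforms insist on *acyclic* dependency graphs.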

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This comprehensive guide offers best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized debugging process to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; and distinguish between Airflow-related…
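One piece of a standardized debugging process can be shown in plain Python: wrap each task callable so every failure is logged with its context before re-raising. This is an illustrative pattern, not Airflow API; in Airflow itself the equivalent information comes from task logs and task parameters such as retries and on_failure_callback.

```python
# Hypothetical helper: log the failing task's id and arguments before
# re-raising, so every error gets diagnosed the same way.
import functools
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def debuggable(task_id: str):
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception:
                log.exception("task %s failed with args=%r", task_id, args)
                raise
        return inner
    return wrap

@debuggable("transform")
def transform(row):
    return {"value": row["value"] * 2}

print(transform({"value": 21}))  # {'value': 42}
```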

Cloudera Provides First Look at Cloudera Data Platform, the Industry’s First Enterprise Data Cloud

Cloudera

Cloudera Unveils Industry’s First Enterprise Data Cloud in Webinar. How do you take a mission-critical on-premises workload and rapidly burst it to the cloud? Can you instantly auto-scale resources as demand requires and just as easily pause your work so you don’t run up your cloud bill? On June 18th, Cloudera provided an exclusive preview of these capabilities, and more, with the introduction of Cloudera Data Platform (CDP), the industry’s first enterprise data cloud.

Should you have an ETL window in your Modern Data Warehouse?

Advancing Analytics: Data Engineering

Ah, the ETL (Extract-Transform-Load) window: the schedule by which the Business Intelligence developer sets their clock, the nail-biting nightly period during which the on-call support hopes their phone won’t ring. It’s a cornerstone of the data warehousing approach… and we shouldn’t have one. There, I said it. Hear me out: back in the on-premises days we had data loading processes that connected directly to our source system databases and performed huge data extract queries as the start of one long…

What Working “at Scale” Really Means

Teradata

Rob Armstrong discusses the challenges of moving from a departmental solution to operational and production systems working at scale, and how Teradata Vantage can solve for them.

Netflix Studio Hack Day – May 2019

Netflix Tech

By Tom Richards, Carenina Garcia Motion, and Marlee Tart. Hack Days are a big deal at Netflix. They’re a chance to bring together employees from all our different disciplines to explore new ideas and experiment with emerging technologies. For the most recent hack day, we channeled our creative energy towards our studio efforts.

Maintaining Your Data Lake At Scale With Spark

Data Engineering Podcast

Summary Building and maintaining a data lake is a choose-your-own-adventure of tools, services, and evolving best practices. The flexibility and freedom that data lakes provide allows for generating significant value, but it can also lead to anti-patterns and inconsistent quality in your analytics. Delta Lake is an open source, opinionated framework built on top of Spark for interacting with and maintaining data lake platforms, incorporating the lessons learned at Databricks from countless cu…
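The core idea Delta Lake layers on top of a data lake can be sketched as an append-only commit log whose replay defines the current table state. This is only a toy illustration of the concept; real Delta Lake stores JSON commit files alongside Parquet data files and handles much more (schema enforcement, concurrency, time travel). The file names here are invented.

```python
# Toy transaction log: the table is whatever the ordered commits say.
commits = []  # append-only list of committed changes

def commit(action: str, files: list):
    commits.append({"action": action, "files": files})

def current_files():
    """Replay the log to compute the live set of data files."""
    live = set()
    for c in commits:
        if c["action"] == "add":
            live |= set(c["files"])
        elif c["action"] == "remove":
            live -= set(c["files"])
    return sorted(live)

commit("add", ["part-0001.parquet", "part-0002.parquet"])
commit("remove", ["part-0001.parquet"])  # e.g. a delete or compaction
print(current_files())  # ['part-0002.parquet']
```

Because readers replay an ordered log rather than listing raw files, they never observe a half-finished write, which is what gives the lake its transactional guarantees.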

Improving Multi-tenancy with Virtual Private Clusters

Cloudera

Noisy Neighbors in Large, Multi-Tenant Clusters. The typical Cloudera Enterprise Data Hub cluster starts with a few dozen nodes in the customer’s datacenter hosting a variety of distributed services. Once the cluster is configured and secured, the administrator gives access to a few individuals to onboard their workloads. Over time, workloads start processing more data, tenants onboard more workloads, and admins onboard more tenants.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Building a SQL Development Environment for Messy, Semi-Structured Data

Rockset

Why build a new SQL development environment? We love SQL — our mission is to bring fast, real-time queries to messy, semi-structured real-world data, and SQL is a core part of our effort. A SQL API allows our product to fit neatly into the stacks of our users without any workflow re-architecting. Our users can easily integrate Rockset with a multitude of existing tools for SQL development (e.g. …).
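The idea of a SQL API over semi-structured data can be sketched with SQLite’s JSON functions as a stand-in. This is not Rockset’s engine or API, just an illustration of querying raw JSON documents, missing fields and all, with plain SQL.

```python
# Query semi-structured JSON with SQL. SQLite's json_extract returns
# NULL for missing paths, so ragged documents are handled gracefully.
import json
import sqlite3

rows = [
    {"name": "alice", "meta": {"city": "Oslo"}},
    {"name": "bob"},  # no nested meta.city field at all
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (body TEXT)")
con.executemany("INSERT INTO docs VALUES (?)",
                [(json.dumps(r),) for r in rows])

result = con.execute(
    "SELECT json_extract(body, '$.name'), json_extract(body, '$.meta.city') "
    "FROM docs ORDER BY rowid"
).fetchall()
print(result)  # [('alice', 'Oslo'), ('bob', None)]
```

The point is the ergonomics: no upfront schema, yet downstream tools that already speak SQL can consume the data unchanged.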

Designing the .NET API for Apache Kafka

Confluent

Confluent’s clients for Apache Kafka® recently passed a major milestone: the release of version 1.0. This has been a long time in the making. Magnus Edenhill first started developing librdkafka about seven years ago, later joining Confluent in the very early days to help foster the community of Kafka users outside the Java ecosystem. Since then, the clients team has been on a mission to build a set of high-quality librdkafka bindings for different languages (initially Python, Go, and .NET).

How Teradata and Oxford Saïd are Modernizing Analytics for Academic Research

Teradata

Oxford and Teradata partner to modernize analytics for academic research, shape new bodies of research and find answers to pressing business challenges.

Unlock the Value of Data Faster Through Modern Data Warehousing

Advancing Analytics: Data Engineering

Data has value – I think we’ve finally got to the point where most people agree on this. The problem we face is how long it takes to unlock that value, and it’s a frustration that most businesses I speak to are having. Let’s think about why this is. After the horror that was the “data silo” days, with clumps of data living in Access databases, Excel spreadsheets and isolated data stores, we’ve had a pretty good run with the classic Kimball data warehouse.

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Managing The Machine Learning Lifecycle

Data Engineering Podcast

Summary Building a machine learning model can be difficult, but that is only half of the battle. Having a perfect model is only useful if you are able to get it into production. In this episode Stepan Pushkarev, founder of Hydrosphere, explains why deploying and maintaining machine learning projects in production is different from regular software projects and the challenges that they bring.

Streaming Data from the Universe with Apache Kafka

Confluent

You might think that data collection in astronomy consists of a lone astronomer pointing a telescope at a single object in a static sky. While that may be true in some cases (I collected the data for my Ph.D. thesis this way), the field of astronomy is rapidly changing into a data-intensive science with real-time needs. Each night, large-scale astronomical telescope surveys detect millions of changing objects in the sky and need to stream results to scientists for time-sensitive, complementary follow-up…

Evolving An ETL Pipeline For Better Productivity

Data Engineering Podcast

Summary Building an ETL pipeline can be a significant undertaking, and sometimes it needs to be rebuilt when a better option becomes available. In this episode Aaron Gibralter, director of engineering at Greenhouse, joins Raghu Murthy, founder and CEO of DataCoral, to discuss the journey that he and his team took from an in-house ETL pipeline built out of open source components onto a paid service.

Building a Scalable Search Architecture

Confluent

Software projects of all sizes and complexities have a common challenge: building a scalable solution for search. Who has never seen an application use RDBMS SQL statements to run searches? You might be wondering: is this a good solution? As the databases professor at my university used to say, it depends. Using SQL to run your search might be enough for your use case, but as your project requirements grow and more advanced features are needed (for example, synonyms or multilingual search)…
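The limitation can be made concrete: a SQL LIKE '%term%' scan matches substrings, while search engines build an inverted index mapping terms to documents, which is what makes features like synonym expansion cheap. A toy sketch (the documents and the synonym table are invented):

```python
# Tiny inverted index plus synonym expansion -- the kind of feature
# that outgrows SQL LIKE. Not a real search engine, just the idea.
from collections import defaultdict

docs = {
    1: "fast scalable search",
    2: "sql full table scan",
    3: "scalable sql",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

synonyms = {"quick": "fast"}  # map query terms onto indexed terms

def search(term: str):
    return sorted(index[synonyms.get(term, term)])

print(search("scalable"))  # [1, 3]
print(search("quick"))     # [1] -- matched via the synonym map
```

Lookups here are a single dictionary access per term instead of a full-table scan, which is why dedicated search systems scale where LIKE does not.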

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower, not replace, your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-driven decisions.

How to Connect KSQL to Confluent Cloud using Kubernetes with Helm

Confluent

Confluent Cloud, a fully managed, cloud-native event streaming service that extends the value of Apache Kafka®, is simple, resilient, secure, and performant, allowing you to focus on what is important: building contextual event-driven applications, not infrastructure. If you are using Confluent Cloud as your managed Apache Kafka cluster, you probably also want to start using other Confluent Platform components like the Confluent Schema Registry, Kafka Connect, KSQL, and Confluent REST Proxy.

Reliable, Fast Access to On-Chain Data Insights

Confluent

At TokenAnalyst , we are building the core infrastructure to integrate, clean, and analyze blockchain data. Data on a blockchain is also known as on-chain data. We offer both historical and low-latency data streams of on-chain data across multiple blockchains. How we use Apache Kafka and the Confluent Platform. Apache Kafka ® is the central data hub of our company.

Spring for Apache Kafka Deep Dive – Part 4: Continuous Delivery of Event Streaming Pipelines

Confluent

For event streaming application developers, it is important to continuously update the streaming pipeline based on the need for changes in the individual applications in the pipeline. It is also important to understand some of the common streaming topologies that streaming developers use to build an event streaming pipeline. Here in part 4 of the Spring for Apache Kafka Deep Dive blog series, we will cover: Common event streaming topology patterns supported in Spring Cloud Data Flow.

AI for Industrials: Why is it different?

Teradata

Cheryl Wiebe examines the challenges of using AI in industrial situations.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Four Reasons Why Upgrading to Vantage is Worth It

Teradata

Running older Teradata analytics software versions may not support the latest innovations of Vantage and could cost you more than upgrading. Learn more.

Why Vantage Is Our Most Popular Release Ever

Teradata

Teradata Vantage is busting through analytic silos and raising the bar. Find out what drove these innovations and led to Vantage becoming our most popular release yet.

Swedbank Delivers Superior Customer Experience by Illuminating the Customer Journey

Teradata

Find out how Swedbank has partnered with Teradata to illuminate the customer journey, delivering answers to the business and a superior customer experience.

What Tableau Customers Should Expect Post-Salesforce Acquisition

Teradata

Chad Meley examines how Salesforce's acquisition of Tableau will impact customer choice and flexibility.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

New As-a-Service Offers on Vantage Bring Simplicity, Modernization

Teradata

Analytics as a service lets you offload IT infrastructure tasks so you can focus on solving your toughest business problems. Learn more about options for Teradata Vantage.

How Moving to the Cloud Helped Craft the Ideal Fan Experience for Ticketmaster

Teradata

Learn how moving to the cloud in 10 weeks enabled Ticketmaster to gain greater visibility into their data and respond to business needs quicker.

Modern Data Warehousing with Azure Databricks at the #PASSSummit in Seattle

Advancing Analytics: Data Engineering

Hey everyone, Advancing Analytics are heading to Seattle in November for the PASS Summit. We will be delivering a full training day on Azure Databricks: “Practical Azure Databricks: Engineering & Warehousing at Scale”. The session will focus on using Azure Databricks for modern data warehousing. Not sure if the day is for you? Well, take a look at the video we recorded.

How We Use RocksDB at Rockset

Rockset

In this blog post, I’ll describe how we use RocksDB at Rockset and how we tuned it to get the most performance out of it. I assume the reader is generally familiar with how log-structured merge-tree (LSM) based storage engines like RocksDB work. At Rockset, we want our users to be able to continuously ingest their data into Rockset with sub-second write latency and query it in tens of milliseconds.
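For readers less familiar with LSM engines, the write path can be sketched in a few lines: writes land in an in-memory memtable, full memtables are flushed as immutable sorted runs, and reads check the memtable first and then the runs newest-first. Toy code, nothing like RocksDB’s actual implementation:

```python
# Minimal log-structured merge sketch: memtable + immutable sorted runs.
class TinyLSM:
    def __init__(self, memtable_limit: int = 2):
        self.memtable = {}
        self.runs = []           # flushed runs, newest last
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flush: freeze the memtable as a sorted, immutable run.
            self.runs.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:          # newest data wins
            return self.memtable[key]
        for run in reversed(self.runs):   # then newest run first
            if key in run:
                return run[key]
        return None

db = TinyLSM()
db.put("a", 1)
db.put("b", 2)   # triggers a flush of {'a': 1, 'b': 2}
db.put("a", 3)   # newer value lives in the memtable
print(db.get("a"))  # 3 -- memtable shadows the flushed run
```

Keeping writes sequential (append a run, never update in place) is what gives LSM engines like RocksDB their high ingest throughput; the cost is the multi-level read path shown in get().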

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG-writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; and scale your…