Sat. Oct 10, 2020 - Fri. Oct 16, 2020

Top 5 Things Every Kafka Developer Should Know

Confluent

Apache Kafka® is an event streaming platform used by more than 30% of the Fortune 500 today. There are numerous features of Kafka that make it the de-facto standard for […].

Kafka 145

Cloudera acquires Eventador to accelerate Stream Processing in Public & Hybrid Clouds

Cloudera

We are thrilled to announce that Cloudera has acquired Eventador, a provider of cloud-native services for enterprise-grade stream processing. Eventador, based in Austin, TX, was founded by Erik Beebe and Kenny Gorman in 2016 to address a fundamental business problem – making it simpler to build streaming applications on real-time data. This typically involved a lot of coding with Java, Scala or similar technologies.

Cloud 132

Trending Sources

How to submit Spark jobs to EMR cluster from Airflow

Start Data Engineering

Table of Contents: Introduction, Design, Setup, Prerequisites, Clone repository, Get data, Code, Move data and script to the cloud, Create an EMR cluster, Add steps and wait to complete, Terminate EMR cluster, Run the DAG, Conclusion, Further reading. Introduction: I have been asked, and have seen others ask, the questions "How are others automating Apache Spark jobs on EMR?" and "How do I submit Spark jobs to an EMR cluster from Airflow?"

Cloud 130

Data: The Crumbling Foundation of Finance, Our Once Trusted Advisor

Teradata

The most frequently asked question of Finance departments today is, ‘whose data do we trust’? Here’s how to ensure Finance always has the correct answer.

Finance 117

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

How Real-Time Materialized Views Work with ksqlDB, Animated

Confluent

All around the world, companies are asking the same question: What is happening right now? We are inundated with pieces of data that have a fragment of the answer. But […].

Data 124

Apache Spark on Kubernetes: How Apache YuniKorn (Incubating) helps

Cloudera

Background: Why choose K8s for Apache Spark. Apache Spark unifies batch processing, real-time processing, stream analytics, machine learning, and interactive query in one platform. While Apache Spark provides a lot of capabilities to support diversified use cases, it comes with additional complexity and high maintenance costs for cluster administrators.

Big Data 118

More Trending

Why the Single Source of Truth Paradigm in Data Warehousing is Outdated

Teradata

The old paradigm of the data warehouse serving as the single source of truth in today's ever evolving data landscape can no longer be sustained. Find out why.

Cloud-Like Flexibility and Infinite Storage with Confluent Tiered Storage and FlashBlade from Pure Storage

Confluent

With the release of Confluent Platform 6.0, we officially made Tiered Storage generally available. At launch, we supported two major cloud-specific object stores: Amazon S3 and Google Cloud Storage. Today, […].

Cloud 68

What you need to know to begin your journey to CDP

Cloudera

Recently, my colleague published a blog, "Build on your investment by Migrating or Upgrading to CDP Data Center", which articulates great CDP Private Cloud Base features. Existing CDH and HDP customers can immediately benefit from this new functionality. This blog focuses on the process to accelerate your journey to CDP Private Cloud Base for both professional services engagements and self-service upgrades.

Fullstack Typescript - create an API

Grouparoo

Two of the major components of the @grouparoo/core application are a Node.js API server and a React frontend. We use Actionhero as the API server, and Next.js for our React site generator. As we develop the Grouparoo application, we are constantly adding new API endpoints and changing existing ones. One of the great features of TypeScript is that it can help not only to share type definitions within a codebase, but also across multiple codebases or services.
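As a rough illustration of sharing a type definition between an API server and a frontend, here is a minimal sketch. It does not use Actionhero or Next.js, and every name in it (`UserResponse`, `getUserHandler`, `formatUser`) is hypothetical rather than taken from the Grouparoo codebase. In a real project the interface would presumably live in a module that both sides import.

```typescript
// Shared type definition. Assumption: in practice this would sit in a
// package or file imported by both the server and the client.
interface UserResponse {
  id: number;
  email: string;
}

// Hypothetical server-side handler: its return type is the shared interface,
// so changing UserResponse forces this function to be updated.
function getUserHandler(id: number): UserResponse {
  return { id, email: `user${id}@example.com` };
}

// Hypothetical client-side consumer: it relies on the same interface,
// so a renamed or removed field becomes a compile error here too.
function formatUser(res: UserResponse): string {
  return `#${res.id} <${res.email}>`;
}
```

Because both functions refer to one interface, a change to an endpoint's response shape surfaces at compile time on both sides instead of at runtime.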

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

How to Prioritize "Self" in Today's World: A Summary on Mental Health

Teradata

In honor of World Mental Health Day this past weekend, Shehzeen Rehman writes on the importance of de-stigmatizing mental health and learning how to seek help.

Broadcast Joins in Apache Spark: An Optimization Technique

Rock the JVM

Broadcast joins in Apache Spark are a highly effective technique for boosting performance and avoiding memory issues, offering great value for optimization

Using Cloudera Machine Learning to Build a Predictive Maintenance Model for Jet Engines

Cloudera

Introduction. Running a large commercial airline requires the complex management of critical components, including fuel futures contracts, aircraft maintenance and customer expectations. Airlines in the U.S. alone average about 45,000 daily flights and transport over 10 million passengers a year (source: FAA). Airlines typically operate on very thin margins, and any schedule delay immediately angers or frustrates customers.

How TypeScript's `any` creates bugs

Grouparoo

What is `any`? If you're working with TypeScript, chances are you'll work with the `any` type. `any` essentially turns off typechecking, and allows the corresponding variable to be used for anything. You can call any methods on an `any` variable, and they'll all return `any` as well. It's great when you can't write types for everything in your codebase. Given `let obj: any = { x: 0 };`, none of these lines of code are errors: `const foo: any = obj.foo();` `obj();` `obj.bar = 100;`
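To make the risk concrete, here is a minimal sketch of how `any` can let a typo through to runtime. The function names and the misspelled `prot` property are hypothetical, and the `unknown`-based variant is shown only as one common alternative, not as the approach the article itself recommends.

```typescript
// With `any`, property access is unchecked: the typo `prot` compiles,
// and the function returns undefined at runtime despite the `number` type.
function readPortUnsafe(raw: string): number {
  const parsed: any = JSON.parse(raw);
  return parsed.prot; // typo for `port`, not caught by the compiler
}

// With `unknown`, the value must be narrowed before use, so the shape
// is verified explicitly and bad input fails loudly.
function readPortSafe(raw: string): number {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed === "object" && parsed !== null && "port" in parsed) {
    const port = (parsed as { port: unknown }).port;
    if (typeof port === "number") {
      return port;
    }
  }
  throw new Error("invalid config: missing numeric port");
}
```

Calling `readPortUnsafe` with valid JSON still yields `undefined` because of the typo, while `readPortSafe` either returns a real number or throws.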

Coding 40

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Watch Out for Gotchas in Cloud Data Warehouse Pricing

Teradata

Successful companies need to squeeze maximum value from all of their data & do it at the lowest possible cost. But they often get hit with unexpected budget overruns. Teradata can help.

ALL the Joins in Spark DataFrames

Rock the JVM

Spark supports more types of table joins than you might expect: discover the different join options in this article

Happy Birthday, CDP Public Cloud

Cloudera

On September 24, 2019, Cloudera launched CDP Public Cloud (CDP-PC) as the first step in delivering the industry’s first Enterprise Data Cloud. That Was Then. In the beginning, CDP ran only on AWS with a set of services that supported a handful of use cases and workload types: CDP Data Warehouse: a Kubernetes-based service that allows business analysts to deploy data warehouses with secure, self-service access to enterprise data.

Cloud 97

DHL Express

Teradata

Data analytics allow DHL Express to better understand critical business insights like logistics, revenue, profit, and yield management and optimize revenues.

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

How-to: Index Data from S3 via NiFi Using CDP Data Hubs

Cloudera

About this Blog. Data Discovery and Exploration (DDE) was recently released in tech preview in Cloudera Data Platform in public cloud. In this blog we will go through the process of indexing data from S3 into Solr in DDE with the help of NiFi in Data Flow. The scenario is the same as it was in the previous blog but the ingest pipeline differs. Spark as the ingest pipeline tool for Search (i.e.

AWS 118