Sat.Oct 10, 2020 - Fri.Oct 16, 2020

article thumbnail

Top 5 Things Every Kafka Developer Should Know

Confluent

Apache Kafka® is an event streaming platform used by more than 30% of the Fortune 500 today. There are numerous features of Kafka that make it the de-facto standard for […].

Kafka 145
article thumbnail

How to submit Spark jobs to EMR cluster from Airflow

Start Data Engineering

Table of Contents Table of Contents Introduction Design Setup Prerequisites Clone repository Get data Code Move data and script to the cloud create an EMR cluster add steps and wait to complete terminate EMR cluster Run the DAG Conclusion Further reading Introduction I have been asked and seen the questions how others are automating apache spark jobs on EMR how to submit spark jobs to an EMR cluster from Airflow ?

Cloud 130
article thumbnail

How-to: Index Data from S3 via NiFi Using CDP Data Hubs

Cloudera

About this Blog. Data Discovery and Exploration (DDE) was recently released in tech preview in Cloudera Data Platform in public cloud. In this blog we will go through the process of indexing data from S3 into Solr in DDE with the help of NiFi in Data Flow. The scenario is the same as it was in the previous blog but the ingest pipeline differs. Spark as the ingest pipeline tool for Search (i.e.

AWS 121
article thumbnail

Data: The Crumbling Foundation of Finance, Our Once Trusted Advisor

Teradata

The most frequently asked question of Finance departments today is, ‘whose data do we trust’? Here’s how to ensure Finance always has the correct answer.

Finance 117
article thumbnail

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

article thumbnail

How Real-Time Materialized Views Work with ksqlDB, Animated

Confluent

All around the world, companies are asking the same question: What is happening right now? We are inundated with pieces of data that have a fragment of the answer. But […].

Data 124
article thumbnail

Rapid Delivery Of Business Intelligence Using Power BI

Data Engineering Podcast

Summary Business intelligence efforts are only as useful as the outcomes that they inform. Power BI aims to reduce the time and effort required to go from information to action by providing an interface that encourages rapid iteration. In this episode Rob Collie shares his enthusiasm for the Power BI platform and how it stands out from other options.

More Trending

article thumbnail

Why the Single Source of Truth Paradigm in Data Warehousing is Outdated

Teradata

The old paradigm of the data warehouse serving as the single source of truth in today's ever evolving data landscape can no longer be sustained. Find out why.

article thumbnail

Cloud-Like Flexibility and Infinite Storage with Confluent Tiered Storage and FlashBlade from Pure Storage

Confluent

With the release of Confluent Platform 6.0, we officially made Tiered Storage generally available. At launch, we supported two major cloud-specific object stores: Amazon S3 and Google Cloud Storage. Today, […].

Cloud 71
article thumbnail

Fullstack Typescript - create an API

Grouparoo

Two of the major components of the @grouparoo/core application are a Node.js API server and a React frontend. We use Actionhero as the API server, and Next.JS for our React site generator. As we develop the Grouparoo application, we are constantly adding new API endpoints and changing existing ones. One of the great features of Typescript is that it can help not only to share type definitions within a codebase, but also across multiple codebases or services.

article thumbnail

What you need to know to begin your journey to CDP

Cloudera

Recently, my colleague published a blog build on your investment by Migrating or Upgrading to CDP Data Center , which articulates great CDP Private Cloud Base features. Existing CDH and HDP customers can immediately benefit from this new functionality. This blog focuses on the process to accelerate your CDP journey to CDP Private Cloud Base for both professional services engagements and self-service upgrades.

article thumbnail

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

article thumbnail

How to Prioritize "Self" in Today's World: A Summary on Mental Health

Teradata

In honor of World Mental Health Day this past weekend, Shehzeen Rehman writes on the importance of de-stigmatizing mental health and learning how to seek help.

98
article thumbnail

Broadcast Joins in Apache Spark: An Optimization Technique

Rock the JVM

Broadcast joins in Apache Spark are a highly effective technique for boosting performance and avoiding memory issues, offering great value for optimization

52
article thumbnail

How TypeScript's `any` creates bugs

Grouparoo

What is any ? If you're working with TypeScript, chances are you'll work with the any type. any essentially turns off typechecking, and allows the corresponding variable to be used for anything. You can call any methods on an any variable, and they'll all return any as well. It's great when you can't write types for everything in your codebase. let obj : any = { x : 0 } ; // None of these lines of code are errors const foo : any = obj. foo ( ) ; obj ( ) ; obj. bar = 100

Coding 40
article thumbnail

Happy Birthday, CDP Public Cloud

Cloudera

On September 24, 2019, Cloudera launched CDP Public Cloud (CDP-PC) as the first step in delivering the industry’s first Enterprise Data Cloud. That Was Then. In the beginning, CDP ran only on AWS with a set of services that supported a handful of use cases and workload types: CDP Data Warehouse: a kubernetes-based service that allows business analysts to deploy data warehouses with secure, self-service access to enterprise data.

Cloud 103
article thumbnail

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

article thumbnail

Watch Out for Gotchas in Cloud Data Warehouse Pricing

Teradata

Successful companies need to squeeze maximum value from all of their data & do it at the lowest possible cost. But they often get hit with unexpected budget overruns. Teradata can help.

article thumbnail

ALL the Joins in Spark DataFrames

Rock the JVM

Spark supports more types of table joins than you might expect: discover the different join options in this article

52
article thumbnail

Using Cloudera Machine Learning to Build a Predictive Maintenance Model for Jet Engines

Cloudera

Introduction. Running a large commercial airline requires the complex management of critical components, including fuel futures contracts, aircraft maintenance and customer expectations. Airlines, in just the U.S. alone, average about 45,000 daily flights and transporting over 10 million passengers a year (source: FAA ). Airlines typically operate on very thin margins, and any schedule delay immediately angers or frustrates customers.

article thumbnail

DHL Express

Teradata

Data analytics allow DHL Express to better understand critical business insights like logistics, revenue, profit, and yield management and optimize revenues.

article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

Cloudera acquires Eventador to accelerate Stream Processing in Public & Hybrid Clouds

Cloudera

We are thrilled to announce that Cloudera has acquired Eventador , a provider of cloud-native services for enterprise-grade stream processing. Eventador, based in Austin, TX, was founded by Erik Beebe and Kenny Gorman in 2016 to address a fundamental business problem – make it simpler to build streaming applications built on real-time data. This typically involved a lot of coding with Java, Scala or similar technologies.

Cloud 132