Sat. May 01, 2021 - Fri. May 07, 2021

Spark on Kubernetes – Gang Scheduling with YuniKorn

Cloudera

Apache YuniKorn (Incubating) has just released 0.10.0 (release announcement). As part of this release, a new feature called Gang Scheduling has become available. With Gang Scheduling, scheduling Spark jobs on Kubernetes becomes more efficient. What is Apache YuniKorn (Incubating)? Apache YuniKorn (Incubating) is a new Apache incubator project that offers rich scheduling capabilities on Kubernetes.

Metadata 136

Making Spark Cloud Native At Data Mechanics

Data Engineering Podcast

Summary Spark is one of the most well-known frameworks for data processing, whether for batch or streaming, ETL or ML, and at any scale. Because of its popularity it has been deployed on every kind of platform you can think of. In this episode Jean-Yves Stephan shares the work that he is doing at Data Mechanics to make it sing on Kubernetes. He explains how operating in a cloud-native context simplifies some aspects of running the system while complicating others, how it simplifies the developme

Cloud 100

What’s the Secret Recipe for DataOps?

DataKitchen

Catalog & Cocktails podcast hosts Tim Gasper & Juan Sequeda of data.world interview DataKitchen CEO Chris Bergh on how to create the right DataOps culture & measuring the value of your DataOps strategy. The post What’s the Secret Recipe for DataOps? first appeared on DataKitchen.

98

Confluent Update Regarding Codecov Incident

Confluent

Our team was recently notified of unauthorized read-only access to Confluent’s Github account stemming from the recent Codecov incident (more information here). The security of our customers and their data […].

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Quantifying the value of multi-cloud deployment strategies with CDP Public Cloud

Cloudera

In the introductory article of this series, I presented the overarching framework for quantifying the value of the Cloudera Data Platform (CDP). In this article, I will focus on the contribution that a multi-cloud strategy makes to these value drivers, and address a question that I regularly get from clients: Is there a quantifiable benefit to a multi-cloud deployment?

Cloud 92

The Grand Vision And Present Reality of DataOps

Data Engineering Podcast

Summary The data industry is changing rapidly, and one of the most active areas of growth is automation of data workflows. Taking cues from the DevOps movement of the past decade, data professionals are orienting around the concept of DataOps. More than just a collection of tools, there are a number of organizational and conceptual changes that a proper DataOps approach depends on.

More Trending

How DataOps Enables a Data Fabric

DataKitchen

The post How DataOps Enables a Data Fabric first appeared on DataKitchen.

Data 66

Streaming Market Data with Flink SQL Part I: Streaming VWAP

Cloudera

This article is the first of a multipart series to showcase the power and expressibility of FlinkSQL applied to market data. Code and data for this series are available on github. It was co-authored by Krishnen Vytelingum, Head of Quantitative Modeling, Simudyne. Speed matters in financial markets. Whether the goal is to maximize alpha or minimize exposure, financial technologists invest heavily in having the most up-to-date insights on the state of the market and where it is going.

SQL 85

What Isaac Newton Did in Lockdown – And What it Tells Us About Data Science

Teradata

The end of the pandemic may well be in sight, but it’s highlighted the incredible power of data science to transform economies, industries & people’s lives for the better.

Streaming ETL with Confluent: Routing and Fan-Out of Apache Kafka Messages with ksqlDB

Confluent

In the world of data engineering, data routing decisions are crucial to successful distributed system design. Some organizations choose to route data from within application code. Other teams hand off […].

Kafka 59

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Functional Collections in Scala

Rock the JVM

Discover a powerful Scala feature that many developers overlook: a concise guide to functional collections that could revolutionize your Scala programming.

Scala 52

Hardening Palantir’s Kubernetes Infrastructure with Cilium

Palantir

Containerized infrastructure has become an industry-wide trend as engineering teams lean on the likes of Docker or Kubernetes to manage, deploy, and scale their environments; here, Palantir is no exception. We built Rubix, Palantir’s Kubernetes infrastructure, with two primary goals in mind: streamlining and scaling the deployment of our software platforms and strengthening our security posture.

Bytes 52

Welcome, Teal!

Grouparoo

We are excited to have Teal Larson come aboard Grouparoo as an engineer. Teal has already started working on our www site, building out pages that help communicate what we are building and for whom. We have doubled our Pacific Northwest cohort. I think that means that we will have to plan a trip up there for a hiking offsite. The first thing I noticed about Teal was her time outside of tech as a language arts teacher.

Announcing the MongoDB Atlas Sink and Source Connectors in Confluent Cloud

Confluent

Today, Confluent is announcing the general availability (GA) of the fully managed MongoDB Atlas Source and MongoDB Atlas Sink Connectors within Confluent Cloud. Now, with just a few simple clicks, […].

MongoDB 52

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Working with Mixed Data Types within a Field Using Rockset

Rockset

So, you think all the data in a particular field is of string type, but when you try to run your query, you get some errors. On further investigation, it looks like you have some int and undefined types as well. Bummer. Despair not! We can actually work around this (without data prep). To recap, in our first blog, we created an integration with MongoDB on Rockset, so Rockset can read and [update] the data coming in MongoDB.

MongoDB 52

Using Sync Modes in Grouparoo

Grouparoo

We've improved the Getting Started Experience! Check out our UI Configuration method. The steps utilizing grouparoo generate will not be replicable as the command will be fully deprecated in v0.8.1. A few weeks ago we wrote about Sync Modes and why they may be useful when it comes to syncing data to a destination. In short, Sync Modes allow you to have more control over what operations are performed and how Grouparoo interacts with contacts that may already exist in the destination system.

Cloud Migration Series (Step 2 of 5): Start Planning

Cloud Academy

This is part 2 of a 5-part series on best practices for enterprise cloud migration. Released weekly from the end of April to the end of May 2021, each article will cover a new phase of a business’s transition to the cloud, what to be on the lookout for, and how to ensure the journey is a success. Be sure to subscribe to our blog to be notified when new content goes live!

Cloud 40

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

The Race to Transform

Teradata

For banks, the essential elements of survival include not only a comprehensive data strategy that drives real return, but also cultural and organizational changes.

Banking 52

How to Conduct Data Incident Management for Data Teams

Monte Carlo

As data systems become increasingly distributed and companies ingest more and more data, the opportunity for error (and incidents) only increases. For decades, software engineering teams have relied on a multi-step process to identify, triage, resolve, and prevent issues from taking down their applications. As data operations mature, it’s time we treat data downtime, in other words, periods of time when data is missing, inaccurate, or otherwise erroneous, with the same diligence, particularly w

Which Open Source Data Integration Tool is Best?

Preset

Airbyte and Meltano are the two leading open-source data integration platforms. In this post, we'll showcase the strengths of both platforms.

Driving Agility and Scalability through Smart Data

Cloudera

Last year presented business and organizational challenges that hadn’t been seen in a century and the troubling fact is that the challenges applied pains and gains unequally across industry segments. While brick-and-mortar retail was crushed a year ago with mandated store closures, digital commerce retailers realized ten years of digital sales penetration in only three months.

Scala 105

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.