Apache YuniKorn (Incubating) has just released 0.10.0 (release announcement). As part of this release, a new feature called Gang Scheduling has become available. By leveraging the Gang Scheduling feature, Spark job scheduling on Kubernetes becomes more efficient. What is Apache YuniKorn (Incubating)? Apache YuniKorn (Incubating) is a new Apache incubator project that offers rich scheduling capabilities on Kubernetes.
Summary: Spark is one of the most well-known frameworks for data processing, whether for batch or streaming, ETL or ML, and at any scale. Because of its popularity, it has been deployed on every kind of platform you can think of. In this episode Jean-Yves Stephan shares the work that he is doing at Data Mechanics to make it sing on Kubernetes. He explains how operating in a cloud-native context simplifies some aspects of running the system while complicating others, how it simplifies the developme
Catalog & Cocktails podcast hosts Tim Gasper & Juan Sequeda of data.world interview DataKitchen CEO Chris Bergh on how to create the right DataOps culture & measuring the value of your DataOps strategy. The post What’s the Secret Recipe for DataOps? first appeared on DataKitchen.
Our team was recently notified of unauthorized read-only access to Confluent’s GitHub account stemming from the recent Codecov incident (more information here). The security of our customers and their data […].
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
Last year presented business and organizational challenges that hadn’t been seen in a century, and the troubling fact is that those challenges distributed pains and gains unequally across industry segments. While brick-and-mortar retail was crushed a year ago by mandated store closures, digital commerce retailers realized ten years of digital sales penetration in only three months.
Summary: The data industry is changing rapidly, and one of the most active areas of growth is automation of data workflows. Taking cues from the DevOps movement of the past decade, data professionals are orienting around the concept of DataOps. More than just a collection of tools, a proper DataOps approach depends on a number of organizational and conceptual changes.
A file and folder interface for Netflix Cloud Services. Written by Vikram Krishnamurthy, Kishore Kasi, Abhishek Kapatkar, and Tejas Chopra. In this post, we are introducing Netflix Drive, a cloud drive for media assets, and providing a high-level overview of some of its features and interfaces. We intend this to be the first post in a series of posts covering Netflix Drive.
In the introductory article of this series, I presented the overarching framework for quantifying the value of the Cloudera Data Platform (CDP). In this article, I will focus on the contribution that a multi-cloud strategy makes to these value drivers, and address a question that I regularly get from clients: Is there a quantifiable benefit to a multi-cloud deployment?
The end of the pandemic may well be in sight, but it’s highlighted the incredible power of data science to transform economies, industries & people’s lives for the better.
In the world of data engineering, data routing decisions are crucial to successful distributed system design. Some organizations choose to route data from within application code. Other teams hand off […].
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
Discover a powerful Scala feature that many developers overlook: a concise guide to functional collections that could revolutionize your Scala programming
This article is the first of a multipart series to showcase the power and expressiveness of FlinkSQL applied to market data. Code and data for this series are available on GitHub. It was co-authored by Krishnen Vytelingum, Head of Quantitative Modeling, Simudyne. Speed matters in financial markets. Whether the goal is to maximize alpha or minimize exposure, financial technologists invest heavily in having the most up-to-date insights on the state of the market and where it is going.
Containerized infrastructure has become an industry-wide trend as engineering teams lean on the likes of Docker or Kubernetes to manage, deploy, and scale their environments; here, Palantir is no exception. We built Rubix, Palantir’s Kubernetes infrastructure, with two primary goals in mind: streamlining and scaling the deployment of our software platforms and strengthening our security posture.
Today, Confluent is announcing the general availability (GA) of the fully managed MongoDB Atlas Source and MongoDB Atlas Sink Connectors within Confluent Cloud. Now, with just a few simple clicks, […].
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri
So, you think all the data in a particular field is of string type, but when you try to run your query, you get some errors. After more investigation, it looks like you have some int and undefined types as well. Bummer. Despair not! We can actually work around this (without data prep). To recap, in our first blog, we created an integration with MongoDB on Rockset, so Rockset can read and [update] the data coming in MongoDB.
We are excited to have Teal Larson come aboard Grouparoo as an engineer. Teal has already started working on our www site, building out pages that help communicate what we are building and for whom. We have doubled our Pacific Northwest cohort. I think that means that we will have to plan a trip up there for a hiking offsite. The first thing I noticed about Teal was her time outside of tech as a language arts teacher.
This is part 2 of a 5-part series on best practices for enterprise cloud migration. Released weekly from the end of April to the end of May 2021, each article will cover a new phase of a business’s transition to the cloud, what to be on the lookout for, and how to ensure the journey is a success. Be sure to subscribe to our blog to be notified when new content goes live!
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; Scale you
As data systems become increasingly distributed and companies ingest more and more data, the opportunity for error (and incidents) only increases. For decades, software engineering teams have relied on a multi-step process to identify, triage, resolve, and prevent issues from taking down their applications. As data operations mature, it’s time we treat data downtime, in other words, periods of time when data is missing, inaccurate, or otherwise erroneous, with the same diligence, particularly w
We've improved the Getting Started Experience! Check out our UI Configuration method. The steps utilizing grouparoo generate will not be replicable, as the command will be fully deprecated in v0.8.1. A few weeks ago we wrote about Sync Modes and why they may be useful when it comes to syncing data to a destination. In short, Sync Modes allow you to have more control over what operations are performed and how Grouparoo interacts with contacts that may already exist in the destination system.
For banks, the essential elements of survival include not only a comprehensive data strategy that drives real return, but also cultural and organizational changes.
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!