April, 2021

article thumbnail

Flipr: Making Changes Quickly and Safely at Scale

Uber Engineering

Introduction. Uber’s many software systems require a high volume of changes every day. Because of our systems’ size and complexity, it is a significant challenge to implement these changes without unintended consequences, ultimately slowing down developer productivity. Flipr is a … The post Flipr: Making Changes Quickly and Safely at Scale appeared first on Uber Engineering Blog.

article thumbnail

What’s New in Apache Kafka 2.8

Confluent

I’m proud to announce the release of Apache Kafka 2.8.0 on behalf of the Apache Kafka® community. The 2.8.0 release contains many new features and improvements. This blog post highlights […].

Kafka 138
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Writing memory efficient data pipelines in Python

Start Data Engineering

Introduction 1. Using generators Using generator expression Using generator yield Mini batching Reading in batches from a database Pros & Cons 2. Using distributed frameworks Pros & Cons Conclusion Further reading References Introduction If you are Wondering how to write memory efficient data pipelines in python Working with a dataset that is too large to fit into memory Then this post is for you.

article thumbnail

Drinking our own champagne – Cloudera upgrades to CDP Private Cloud

Cloudera

Like most of our customers, Cloudera’s internal operations rely heavily on data. For more than a decade, Cloudera has built internal tools and data analysis primarily on a single production CDH cluster. This cluster runs workloads for every department – from real-time user interfaces for Support to providing recommendations in the Cloudera Data Platform (CDP) Upgrade Advisor to analyzing our business and closing our books.

Cloud 121
article thumbnail

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

article thumbnail

The DataOps Vendor Landscape, 2021

DataKitchen

Download the 2021 DataOps Vendor Landscape here. Read the complete blog below for a more detailed description of the vendors and their capabilities. DataOps is a hot topic in 2021. This is not surprising given that DataOps enables enterprise data teams to generate significant business value from their data. Companies that implement DataOps find that they are able to reduce cycle times from weeks (or months) to days, virtually eliminate data errors, increase collaboration, and dramatically imp

article thumbnail

Self Service Data Exploration And Dashboarding With Superset

Data Engineering Podcast

Summary The reason for collecting, cleaning, and organizing data is to make it usable by the organization. One of the most common and widely used methods of access is through a business intelligence dashboard. Superset is an open source option that has been gaining popularity due to its flexibility and extensible feature set. In this episode Maxime Beauchemin discusses how data engineers can use Superset to provide self service access to data and deliver analytics.

More Trending

article thumbnail

How to Survive a Kafka Outage

Confluent

There is a class of applications that cannot afford to be unavailable—for example, external-facing entry points into your organization. Typically, anything your customers interact with directly cannot go down. As […].

Kafka 132
article thumbnail

How to gather requirements to re-engineer a legacy data pipeline

Start Data Engineering

Introduction Gathering requirements 0. Understand the current state of the data pipeline 1. Think like the end user 2. Know the why 3. End user interviews 4. Reduce the scope 5. End user walkthrough for proposed solution 6. Timelines & deliverables Deliver iteratively Conclusion Further reading References Introduction As data engineers, you will have to re-engineer legacy data pipelines.

article thumbnail

Relationship intelligence will shape the workplace of the future

Cloudera

Our latest Influential Women in Data session featured Brenda Le Sueur from Cambridge Assessments. Brenda has worked across many organisations and continents, but what has always been crucial to her is relationships – how we cultivate them, how we nurture them and how they, in turn, define us. I sat down with Brenda to ask her about her journey as a woman in tech and understand more about the impact of relationships on our career.

article thumbnail

DataOps Enables Your Data Fabric

DataKitchen

Industry analysts who follow the data and analytics industry tell DataKitchen that they are receiving inquiries about “data fabrics” from enterprise clients on a near-daily basis. Forrester relates that out of 25,000 reports published by the firm last year, the report on data fabrics and DataOps ranked in the top ten for downloads in 2020. Gartner included data fabrics in their top ten trends for data and analytics in 2019.

article thumbnail

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

article thumbnail

Moving Machine Learning Into The Data Pipeline at Cherre

Data Engineering Podcast

Summary Most of the time when you think about a data pipeline or ETL job what comes to mind is a purely mechanistic progression of functions that move data from point A to point B. Sometimes, however, one of those transformations is actually a full-fledged machine learning project in its own right. In this episode Tal Galfsky explains how he and the team at Cherre tackled the problem of messy data for Addresses by building a natural language processing and entity resolution system that is served

article thumbnail

Meet the New Analytics Superhero - The CFO

Teradata

The CFO’s broad remit & natural ownership of core financial data can provide the foundation for an enhanced role that leverages data analytics to enable new value opportunities.

article thumbnail

Debuting a Modern C++ API for Apache Kafka

Confluent

Morgan Stanley uses Apache Kafka® to publish market data to internal clients and to persist it for replay purposes. We started out using librdkafka’s C++ API, which maintains C++98 compatibility. […].

Kafka 124
article thumbnail

How Airbnb Achieved Metric Consistency at Scale

Airbnb Tech

Part-I: Introducing Minerva — Airbnb’s Metric Platform By : Amit Pahwa , Cristian Figueroa , Donghan Zhang , Haim Grosman , John Bodley , Jonathan Parks , Maggie Zhu , Philip Weiss , Robert Chang , Shao Xie , Sylvia Tomiyama , Xiaohui Sun Data is the voice of our users at scale. In the midst of the COVID-19 pandemic, we saw that travel with Airbnb has become hyper-local.

article thumbnail

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

article thumbnail

Apache Ozone and Dense Data Nodes

Cloudera

This post was co-authored by two Cisco Employees as well: Karthik Krishna, Silesh Bijjahalli. Today’s enterprise data analytics teams are constantly looking to get the best out of their platforms. Storage plays one of the most important roles in the data platforms strategy, it provides the basis for all compute engines and applications to be built on top of it.

article thumbnail

The Journey to DataOps Success: Key Takeaways from Transformation Trailblazers

DataKitchen

In early April 2021, DataKItchen sat down with Jonathan Hodges, VP Data Management & Analytics, at Workiva ; Chuck Smith, VP of R&D Data Strategy at GlaxoSmithKline (GSK) ; and Chris Bergh, CEO and Head Chef at DataKitchen, to find out about their enterprise DataOps transformation journey, including key successes and lessons learned. You can listen to the entire conversation here or read the summary below.

article thumbnail

Exploring The Expanding Landscape Of Data Professions with Josh Benamram of Databand

Data Engineering Podcast

Summary "Business as usual" is changing, with more companies investing in data as a first class concern. As a result, the data team is growing and introducing more specialized roles. In this episode Josh Benamram, CEO and co-founder of Databand, describes the motivations for these emerging roles, how these positions affect the team dynamics, and the types of visibility that they need into the data platform to do their jobs effectively.

article thumbnail

Data.What? Data Democratization and the Illusion of Self-Service

Teradata

The concepts and processes surrounding self-service analytics sound easy. So why does the illusion of self-service rarely translate to reality? Find out more.

Data 59
article thumbnail

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

article thumbnail

Building the Confluent UI with React Hooks – Benefits and Lessons Learned

Confluent

Updating a fundamental paradigm in your React app can be as easy as search and replace, or at other times, as difficult as convincing your entire frontend engineering to buy […].

Building 123
article thumbnail

7 Things that Make SQLite Unique and Awesome

Grouparoo

I became very close with SQLite in the few weeks it took me to build out Grouparoo's SQLite plugin. Through that process I came to find that SQLite is not like the others. It has a handful of quirks, caveats, and gotchas when compared to other databases like MySQL and PostgreSQL. Here are seven of those quirks that I find most interesting: 1. SQLite is serverless SQLite doesn't require a separate process to run, as other databases do.

article thumbnail

#ClouderaLife Spotlight: Suzy Tonini, Talent Researcher

Cloudera

As we continue to work toward diversity, equality, and inclusion in every aspect of our company culture and beyond, we’ve learned so much from our employees’ unique perspectives on allyship. One such employee is Suzy Tonini, a Talent Researcher with a globe-trotting childhood. Growing up with parents who worked for the U.S. State Department, Suzy had the opportunity to hop from country to country with her family, experiencing a variety of cultures. .

article thumbnail

10 Upcoming Data Science Platforms for Massive Disruption

DataKitchen

The post 10 Upcoming Data Science Platforms for Massive Disruption first appeared on DataKitchen.

article thumbnail

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

article thumbnail

Put Your Whole Data Team On The Same Page With Atlan

Data Engineering Podcast

Summary One of the biggest obstacles to success in delivering data products is cross-team collaboration. Part of the problem is the difference in the information that each role requires to do their job and where they expect to find it. This introduces a barrier to communication that is difficult to overcome, particularly in teams that have not reached a significant level of maturity in their data journey.

article thumbnail

Monte Carlo and Snowflake partner to help organizations achieve more trustworthy data

Monte Carlo

Monte Carlo, the data reliability company, today announced a partnership with Snowflake , the Data Cloud company, to help data teams trust their data and accelerate the adoption of analytics in the Data Cloud. This combination can provide Snowflake customers with end-to-end Data Observability across their entire Snowflake Data Cloud, from ingestion to analytics.

article thumbnail

Announcing ksqlDB 0.17.0

Confluent

We’re excited to announce ksqlDB 0.17, a big release for 2021. This version adds support for managing the lifecycle of your queries from CI servers, a first-class timestamp data type, […].

article thumbnail

Cats Effect 3: Racing IOs Explained

Rock the JVM

Following the introduction to concurrency in Cats Effect: explore advanced techniques for managing racing IOs and fibers

article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

Next Stop – Predicting on Data with Cloudera Machine Learning

Cloudera

This is part 4 in this blog series. You can read part 1 here and part 2 here , and watch part 3 here. This blog series follows the manufacturing and operations data lifecycle stages of an electric car manufacturer – typically experienced in large, data-driven manufacturing companies. The first blog introduced a mock vehicle manufacturing company, The Electric Car Company (ECC) and focused on Data Collection.

article thumbnail

DevOps and agile still hindered by enterprise silos, inertia

DataKitchen

The post DevOps and agile still hindered by enterprise silos, inertia first appeared on DataKitchen.

84
article thumbnail

Apple Migration Tips for M1 Macs

Grouparoo

Last week, I upgraded to a M1 Macbook Pro. I got it configured for development and 48 hours later, through a series of unfortunate events and hardware failure, I ended up with a second M1 Macbook Pro instead. The transition between computers wasn’t too bad thanks to Apple’s Migration Assistant. I ran into an interesting situation, though. About 90% of the migration worked as expected or better, but the other 10% presented some puzzling blockers.

article thumbnail

CFO Analytics – Machine Learning

Teradata

Once data issues are solved, we can focus on driving value leveraging analytics. Machine learning is a hot topic with CFOs today, but is it the right tool for CFO Analytics?

article thumbnail

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!