Top Data Engineering Digest PostgreSQL Coding Content for April, 2021

April, 2021

Flipr: Making Changes Quickly and Safely at Scale

Uber Engineering

APRIL 12, 2021

Uber’s many software systems require a high volume of changes every day. Because of our systems’ size and complexity, it is a significant challenge to implement these changes without unintended consequences, ultimately slowing down developer productivity. Flipr is a …

Engineering

Engineering Systems IT Architecture

What’s New in Apache Kafka 2.8

Confluent

APRIL 19, 2021

I’m proud to announce the release of Apache Kafka 2.8.0 on behalf of the Apache Kafka® community. The 2.8.0 release contains many new features and improvements. This blog post highlights […].

Kafka

Writing memory efficient data pipelines in Python

Start Data Engineering

APRIL 26, 2021

Contents: Introduction; 1. Using generators (using generator expression, using generator yield, mini batching, reading in batches from a database, pros & cons); 2. Using distributed frameworks (pros & cons); Conclusion; Further reading; References. Introduction: If you are wondering how to write memory-efficient data pipelines in Python, or working with a dataset that is too large to fit into memory, then this post is for you.
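As a rough illustration of the generator techniques listed in this teaser (generator expressions, `yield`, mini batching, and reading in batches from a database), here is a minimal sketch. It is not taken from the original post: the `events` table, the helper names `mini_batches` and `read_in_batches`, and the batch sizes are all hypothetical, and an in-memory SQLite database stands in for whatever database a real pipeline would read from.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def mini_batches(rows: Iterable, batch_size: int) -> Iterator[List]:
    """Yield lists of at most batch_size items without materializing the input."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def read_in_batches(
    conn: sqlite3.Connection, query: str, batch_size: int
) -> Iterator[List[Tuple]]:
    """Yield query results in chunks, fetching batch_size rows at a time."""
    cursor = conn.execute(query)
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            return
        yield rows


# Generator expression: each square is produced on demand, never stored in a list.
total = sum(x * x for x in range(1_000))

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO events (amount) VALUES (?)", ((i,) for i in range(10)))

# Only one batch of rows is held in memory at any time.
batch_sizes = [
    len(batch)
    for batch in read_in_batches(conn, "SELECT id, amount FROM events", 4)
]
chunked = list(mini_batches(range(7), 3))
```

The trade-off in this kind of design is that each batch must be processed independently; operations that need the whole dataset at once (a global sort, for example) do not fit the pattern directly.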

Data Pipeline

Data Pipeline Python Datasets Database

Drinking our own champagne – Cloudera upgrades to CDP Private Cloud

Cloudera

APRIL 21, 2021

Like most of our customers, Cloudera’s internal operations rely heavily on data. For more than a decade, Cloudera has built internal tools and data analysis primarily on a single production CDH cluster. This cluster runs workloads for every department – from real-time user interfaces for Support to providing recommendations in the Cloudera Data Platform (CDP) Upgrade Advisor to analyzing our business and closing our books.

Cloud

Cloud Professional Services Java Data Warehouse

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

The DataOps Vendor Landscape, 2021

DataKitchen

APRIL 13, 2021

Download the 2021 DataOps Vendor Landscape here. Read the complete blog below for a more detailed description of the vendors and their capabilities. DataOps is a hot topic in 2021. This is not surprising given that DataOps enables enterprise data teams to generate significant business value from their data. Companies that implement DataOps find that they are able to reduce cycle times from weeks (or months) to days, virtually eliminate data errors, increase collaboration, and dramatically imp

Consulting

Consulting Machine Learning Data Science Data Pipeline

Self Service Data Exploration And Dashboarding With Superset

Data Engineering Podcast

APRIL 26, 2021

Summary: The reason for collecting, cleaning, and organizing data is to make it usable by the organization. One of the most common and widely used methods of access is through a business intelligence dashboard. Superset is an open source option that has been gaining popularity due to its flexibility and extensible feature set. In this episode Maxime Beauchemin discusses how data engineers can use Superset to provide self service access to data and deliver analytics.

Business Intelligence

Business Intelligence Data Warehouse Hadoop Data Pipeline

Making Customer Experience Your Competitive Advantage

Teradata

APRIL 28, 2021

Customers expect organizations to know them, provide relevant & personalized experiences, and be good stewards of their data. Yet many businesses still struggle with this. Why?

Data

More Trending

How to Survive a Kafka Outage

Confluent

APRIL 27, 2021

There is a class of applications that cannot afford to be unavailable—for example, external-facing entry points into your organization. Typically, anything your customers interact with directly cannot go down. As […].

Kafka

Kafka Data Ingestion Data

How to gather requirements to re-engineer a legacy data pipeline

Start Data Engineering

APRIL 8, 2021

Contents: Introduction; Gathering requirements (0. Understand the current state of the data pipeline, 1. Think like the end user, 2. Know the why, 3. End user interviews, 4. Reduce the scope, 5. End user walkthrough for proposed solution, 6. Timelines & deliverables); Deliver iteratively; Conclusion; Further reading; References. Introduction: As data engineers, you will have to re-engineer legacy data pipelines.

Data Pipeline

Data Pipeline Engineering Data Engineering Data Engineer

Relationship intelligence will shape the workplace of the future

Cloudera

APRIL 23, 2021

Our latest Influential Women in Data session featured Brenda Le Sueur from Cambridge Assessments. Brenda has worked across many organisations and continents, but what has always been crucial to her is relationships – how we cultivate them, how we nurture them and how they, in turn, define us. I sat down with Brenda to ask her about her journey as a woman in tech and understand more about the impact of relationships on our career.

Technology

Technology Building Coding Project

DataOps Enables Your Data Fabric

DataKitchen

APRIL 28, 2021

Industry analysts who follow the data and analytics industry tell DataKitchen that they are receiving inquiries about “data fabrics” from enterprise clients on a near-daily basis. Forrester relates that out of 25,000 reports published by the firm last year, the report on data fabrics and DataOps ranked in the top ten for downloads in 2020. Gartner included data fabrics in their top ten trends for data and analytics in 2019.

Data Pipeline

Data Pipeline Data Data Analytics Architecture

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

Moving Machine Learning Into The Data Pipeline at Cherre

Data Engineering Podcast

APRIL 19, 2021

Summary: Most of the time when you think about a data pipeline or ETL job, what comes to mind is a purely mechanistic progression of functions that move data from point A to point B. Sometimes, however, one of those transformations is actually a full-fledged machine learning project in its own right. In this episode Tal Galfsky explains how he and the team at Cherre tackled the problem of messy address data by building a natural language processing and entity resolution system that is served

Data Pipeline

Data Pipeline Machine Learning Data Warehouse Datasets

Meet the New Analytics Superhero - The CFO

Teradata

APRIL 4, 2021

The CFO’s broad remit & natural ownership of core financial data can provide the foundation for an enhanced role that leverages data analytics to enable new value opportunities.

Data Analytics

Data Analytics Data

Debuting a Modern C++ API for Apache Kafka

Confluent

APRIL 13, 2021

Morgan Stanley uses Apache Kafka® to publish market data to internal clients and to persist it for replay purposes. We started out using librdkafka’s C++ API, which maintains C++98 compatibility. […].

Kafka

Kafka IT Data

How Airbnb Achieved Metric Consistency at Scale

Airbnb Tech

APRIL 30, 2021

Part-I: Introducing Minerva, Airbnb’s Metric Platform. By: Amit Pahwa, Cristian Figueroa, Donghan Zhang, Haim Grosman, John Bodley, Jonathan Parks, Maggie Zhu, Philip Weiss, Robert Chang, Shao Xie, Sylvia Tomiyama, Xiaohui Sun. Data is the voice of our users at scale. In the midst of the COVID-19 pandemic, we saw that travel with Airbnb has become hyper-local.

Data Warehouse

Data Warehouse Finance Metadata Aggregated Data

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

Apache Ozone and Dense Data Nodes

Cloudera

APRIL 22, 2021

This post was co-authored by two Cisco employees as well: Karthik Krishna, Silesh Bijjahalli. Today’s enterprise data analytics teams are constantly looking to get the best out of their platforms. Storage plays one of the most important roles in the data platform strategy; it provides the basis for all compute engines and applications built on top of it.

Pipeline-centric

Pipeline-centric Data Lake Hadoop Metadata

The Journey to DataOps Success: Key Takeaways from Transformation Trailblazers

DataKitchen

APRIL 26, 2021

In early April 2021, DataKitchen sat down with Jonathan Hodges, VP Data Management & Analytics at Workiva; Chuck Smith, VP of R&D Data Strategy at GlaxoSmithKline (GSK); and Chris Bergh, CEO and Head Chef at DataKitchen, to find out about their enterprise DataOps transformation journey, including key successes and lessons learned. You can listen to the entire conversation here or read the summary below.

Software Engineer

Software Engineer Software Engineering Machine Learning Education

Exploring The Expanding Landscape Of Data Professions with Josh Benamram of Databand

Data Engineering Podcast

APRIL 12, 2021

Summary: "Business as usual" is changing, with more companies investing in data as a first class concern. As a result, the data team is growing and introducing more specialized roles. In this episode Josh Benamram, CEO and co-founder of Databand, describes the motivations for these emerging roles, how these positions affect the team dynamics, and the types of visibility that they need into the data platform to do their jobs effectively.

Data Warehouse

Data Warehouse Data Pipeline BI Data Engineering

Data.What? Data Democratization and the Illusion of Self-Service

Teradata

APRIL 2, 2021

The concepts and processes surrounding self-service analytics sound easy. So why does the illusion of self-service rarely translate to reality? Find out more.

Data

Data Process

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; Scale you

Data Engineer

Building the Confluent UI with React Hooks – Benefits and Lessons Learned

Confluent

APRIL 14, 2021

Updating a fundamental paradigm in your React app can be as easy as search and replace, or at other times, as difficult as convincing your entire frontend engineering to buy […].

Building

Building Engineering

7 Things that Make SQLite Unique and Awesome

Grouparoo

APRIL 29, 2021

I became very close with SQLite in the few weeks it took me to build out Grouparoo's SQLite plugin. Through that process I came to find that SQLite is not like the others. It has a handful of quirks, caveats, and gotchas when compared to other databases like MySQL and PostgreSQL. Here are seven of those quirks that I find most interesting. 1. SQLite is serverless: SQLite doesn't require a separate process to run, as other databases do.
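To make the "serverless" point concrete, the following minimal sketch uses Python's standard-library `sqlite3` module, which embeds the SQLite engine in the calling process. It is not from the Grouparoo post; the `profiles` table and the sample email address are hypothetical, and `":memory:"` is used so that no database file is created.

```python
import sqlite3

# The database engine runs inside this Python process. ":memory:" keeps the
# whole database in RAM, so there is no server to start and no file to create.
conn = sqlite3.connect(":memory:")

# Hypothetical table and row, used only for illustration.
conn.execute("CREATE TABLE profiles (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO profiles (email) VALUES (?)", ("person@example.com",))

rows = conn.execute("SELECT id, email FROM profiles").fetchall()
conn.close()
```

By contrast, a client for MySQL or PostgreSQL would typically need a host, port, and credentials for a separately running server process before the first query could be sent.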

PostgreSQL

PostgreSQL MySQL Database SQL

#ClouderaLife Spotlight: Suzy Tonini, Talent Researcher

Cloudera

APRIL 28, 2021

As we continue to work toward diversity, equality, and inclusion in every aspect of our company culture and beyond, we’ve learned so much from our employees’ unique perspectives on allyship. One such employee is Suzy Tonini, a Talent Researcher with a globe-trotting childhood. Growing up with parents who worked for the U.S. State Department, Suzy had the opportunity to hop from country to country with her family, experiencing a variety of cultures.

Education

Education Building IT

10 Upcoming Data Science Platforms for Massive Disruption

DataKitchen

APRIL 13, 2021

Data Science

Data Science Data

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Cloud

Put Your Whole Data Team On The Same Page With Atlan

Data Engineering Podcast

APRIL 5, 2021

Summary: One of the biggest obstacles to success in delivering data products is cross-team collaboration. Part of the problem is the difference in the information that each role requires to do their job and where they expect to find it. This introduces a barrier to communication that is difficult to overcome, particularly in teams that have not reached a significant level of maturity in their data journey.

Data Warehouse

Data Warehouse Data Pipeline BI Metadata

Monte Carlo and Snowflake partner to help organizations achieve more trustworthy data

Monte Carlo

APRIL 29, 2021

Monte Carlo, the data reliability company, today announced a partnership with Snowflake, the Data Cloud company, to help data teams trust their data and accelerate the adoption of analytics in the Data Cloud. This combination can provide Snowflake customers with end-to-end Data Observability across their entire Snowflake Data Cloud, from ingestion to analytics.

Insurance

Insurance Business Intelligence Data Pipeline Cloud

Announcing ksqlDB 0.17.0

Confluent

APRIL 26, 2021

We’re excited to announce ksqlDB 0.17, a big release for 2021. This version adds support for managing the lifecycle of your queries from CI servers, a first-class timestamp data type, […].

Management

Management Data Process

Cats Effect 3: Racing IOs Explained

Rock the JVM

APRIL 28, 2021

Following the introduction to concurrency in Cats Effect: explore advanced techniques for managing racing IOs and fibers

Management

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Business Intelligence

Next Stop – Predicting on Data with Cloudera Machine Learning

Cloudera

APRIL 9, 2021

This is part 4 in this blog series. You can read part 1 here and part 2 here, and watch part 3 here. This blog series follows the manufacturing and operations data lifecycle stages of an electric car manufacturer – typically experienced in large, data-driven manufacturing companies. The first blog introduced a mock vehicle manufacturing company, The Electric Car Company (ECC) and focused on Data Collection.

Machine Learning

Machine Learning Manufacturing Data Collection Data Science

DevOps and agile still hindered by enterprise silos, inertia

DataKitchen

APRIL 6, 2021

Apple Migration Tips for M1 Macs

Grouparoo

APRIL 27, 2021

Last week, I upgraded to an M1 MacBook Pro. I got it configured for development and 48 hours later, through a series of unfortunate events and hardware failure, I ended up with a second M1 MacBook Pro instead. The transition between computers wasn’t too bad thanks to Apple’s Migration Assistant. I ran into an interesting situation, though. About 90% of the migration worked as expected or better, but the other 10% presented some puzzling blockers.

Programming

Programming Database Coding IT

CFO Analytics – Machine Learning

Teradata

APRIL 25, 2021

Once data issues are solved, we can focus on driving value leveraging analytics. Machine learning is a hot topic with CFOs today, but is it the right tool for CFO Analytics?

Machine Learning

Machine Learning IT Data

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Data
