article thumbnail

Building A Data Lake For The Database Administrator At Upsolver

Data Engineering Podcast

What used to be entirely managed by the database engine is now a composition of multiple systems that need to be properly configured to work in concert. What used to be entirely managed by the database engine is now a composition of multiple systems that need to be properly configured to work in concert.

Data Lake 100
article thumbnail

An Exploration Of The Expectations, Ecosystem, and Realities Of Real-Time Data Applications

Data Engineering Podcast

With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Just connect it to your database/data warehouse/data lakehouse/whatever you’re using and let them do the rest.

article thumbnail

Writing The Book That Offers A Single Reference For The Fundamentals Of Data Engineering

Data Engineering Podcast

With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs.

article thumbnail

StreamNative Brings Streaming Data To The Cloud Native Landscape With Pulsar

Data Engineering Podcast

With real time alerts for problems in your databases, ETL pipelines, or data warehouse, and integrations with Slack, Pagerduty, and custom webhooks you can fix the errors before they become a problem. You monitor your website to make sure that you’re the first to know when something goes wrong, but what about your data?

Cloud 100
article thumbnail

Data News — Week 23.12

Christophe Blefari

LinkedIn team decided to migrate to a lambda architecture and got 94% uplift in performance. I don't have a lot to say except the fact that we are going in a future with a lot of databases choices. How fast is DuckDB really? — Georges, Fivetran CEO, ran a performance test to have metrics on DuckDB performance.

article thumbnail

Unified Streaming And Batch Pipelines At LinkedIn: Reducing Processing time by 94% with Apache Beam

LinkedIn Engineering

In the past, we often used lambda architecture for processing jobs, meaning that our developers used two different systems for batch and stream processing. This pipeline reads ProfileData; joins the data with sideTable and then applies a user defined function called Standardizer(); finally, writes the standardized result to databases.

Process 97
article thumbnail

Maintaining Your Data Lake At Scale With Spark

Data Engineering Podcast

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. The Lambda architecture was popular in the early days of Hadoop but seems to have fallen out of favor.

Data Lake 100