
Maintaining Your Data Lake At Scale With Spark

Data Engineering Podcast

…what if you have other services interacting with the data store?) Are there limits in terms of the volume of data that can be managed within a single transaction? How does unifying the interface for Spark to interact with batch and streaming data sets simplify the workflow for an end user? When is Delta Lake the wrong choice? (e.g. …
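One of the questions above concerns Delta Lake's unified interface for batch and streaming. The sketch below (not taken from the episode) illustrates the general idea, assuming Spark and the delta-spark package are installed; the table path, checkpoint location, and column name are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

# Minimal sketch of Delta Lake's unified batch/streaming interface in Spark.
# Assumes the delta-spark package is installed and the table already exists.
spark = (
    SparkSession.builder
    .appName("delta-unified-io")
    # Standard configs that enable the Delta Lake extension and catalog.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

table_path = "/tmp/events_delta"  # hypothetical table location

# Batch: read the table at a consistent snapshot and aggregate it.
batch_df = spark.read.format("delta").load(table_path)
batch_df.groupBy("event_type").count().show()  # "event_type" is illustrative

# Streaming: the same table, the same format string, consumed incrementally
# as new commits land, instead of as a one-shot snapshot.
stream_df = spark.readStream.format("delta").load(table_path)
query = (
    stream_df.writeStream
    .format("console")
    .option("checkpointLocation", "/tmp/events_checkpoint")  # hypothetical
    .start()
)
query.awaitTermination()
```

The point of the example is that the end user switches between batch and streaming by changing `read` to `readStream`, while the underlying Delta table and its transaction log stay the same.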
