Hands-On Introduction to Delta Lake with (py)Spark
Towards Data Science
FEBRUARY 15, 2023
The main player among the first data lakes was Hadoop: a distributed file system (HDFS) paired with MapReduce, a processing paradigm built on the idea of minimal data movement and high parallelism. The proposal was simple: "Throw everything you have in here and worry about it later." The implementation…
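To make the MapReduce idea more concrete, here is a minimal single-process sketch of the classic word-count job in plain Python. It is only an illustration of the three phases (map, shuffle, reduce), not Hadoop's API. In a real cluster, the map tasks run in parallel on the nodes that already hold each data block, which is where the "minimal data movement" comes from. This sketch runs everything sequentially in memory, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one line of input.
    # In Hadoop, many map tasks would run in parallel, close to the data.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key. The framework performs
    # this step between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine every value for a single key into one result.
    return key, sum(values)


def word_count(lines):
    # Chain the three phases together (sequentially, for illustration only).
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Throw everything in here", "worry about everything later"]
    print(word_count(docs))
```

Because each map call only looks at its own line and each reduce call only looks at its own key, both phases can be split across machines without coordination, which is what makes the paradigm highly parallel.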