2009, Analytics Application and Hadoop - Data Engineering Digest

5 Apache Spark Best Practices

Data Science Blog: Data Engineering

JULY 4, 2022

Apache Spark began in 2009 as a research project at UC Berkeley’s AMPLab, a collaboration of students, researchers, and faculty focused on data-intensive application domains. Spark outperforms Hadoop in many ways, in some cases running nearly 100 times faster.

Hadoop

Hadoop Big Data Datasets Scala

The Evolution of Table Formats

Monte Carlo

MAY 14, 2024

Let’s revisit how several of those key table formats have emerged and developed over time: Apache Avro: Developed as part of the Hadoop project and released in 2009, Apache Avro provides efficient data serialization with a schema-based structure.

Data Lake

Data Lake Metadata Hadoop Data Governance
