5 Apache Spark Best Practices

Data Science Blog: Data Engineering

Apache Spark began in 2009 as a research project at UC Berkeley’s AMPLab, a collaboration among students, researchers, and faculty focused on data-intensive application domains. Spark outperforms Hadoop in many respects, in some cases running nearly 100 times faster.

The Evolution of Table Formats

Monte Carlo

Let’s revisit how several of those key table formats have emerged and developed over time: Apache Avro: Developed as part of the Hadoop project and released in 2009, Apache Avro provides efficient data serialization with a schema-based structure.