Sat. Jun 23, 2018 - Fri. Jun 29, 2018

Package Management And Distribution For Your Data Using Quilt with Kevin Moore - Episode 37

Data Engineering Podcast

Summary: Collaboration, distribution, and installation of software projects is largely a solved problem, but the same cannot be said of data. Every data team has a bespoke means of sharing data sets, versioning them, tracking related metadata and changes, and publishing them for use in the software systems that rely on them. The CEO and founder of Quilt Data, Kevin Moore, was sufficiently frustrated by this problem to create a platform that attempts to be the means by which data can be as collabo

JVM Profiler: An Open Source Tool for Tracing Distributed JVM Applications at Scale

Uber Engineering

Computing frameworks like Apache Spark have been widely adopted to build large-scale data applications. For Uber, data is at the heart of strategic decision-making and product development. To help us better leverage this data, we manage massive deployments of Spark …

Introducing Blended Learning From Cloudera University

Cloudera

Over the past decade, Cloudera University has taught more than 50,000 developers, administrators, analysts, and data scientists how to apply big data technologies. Developers are learning the APIs, so they can create new applications that were never before possible. Administrators learn to plan, install, monitor, and troubleshoot clusters. And analysts discover the power of SQL over large, diverse datasets.

The State of Open Source

Zalando Engineering

The evolution and future of open source at Zalando. Open source software has been the core of Zalando’s tech stack since the company’s humble beginnings, selling flip-flops from a basement 10 years ago; it’s part of our DNA as a tech company. For engineering teams at Zalando, open source is a natural part of how we solve problems: we consult and share the TechRadar for guidance on appropriate technologies to use, we contribute to projects such as Kubernetes, and work in the open on a very large

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.