Sat. May 19, 2018 - Fri. May 25, 2018

PrestoDB and Starburst Data with Kamil Bajda-Pawlikowski - Episode 32

Data Engineering Podcast

Summary: Most businesses end up with data in a myriad of places with varying levels of structure. This makes it difficult to gain insights from across departments, projects, or people. Presto is a distributed SQL engine that allows you to tie all of your information together without having to first aggregate it all into a data warehouse. Kamil Bajda-Pawlikowski co-founded Starburst Data to provide support and tooling for Presto and to contribute advanced features back to the project.

Now Available: Cloudera Data Science Workbench Release 1.4

Cloudera

Cloudera Data Science Workbench (CDSW) makes secure, collaborative data science at scale a reality for the enterprise and accelerates the delivery of new data products. With CDSW, organizations can research and experiment faster, deploy models easily and with confidence, as well as rely on the wider Cloudera platform to reduce the risks and costs of data science projects.

Building the Modern Platform with Cloudera Enterprise 6.x

Cloudera

Why Enterprises Need to Unify ML, Analytics, and Cloud. Times are changing, and the traditional models of analytics and data management don’t serve the needs of the modern enterprise, so the way to address these topics is changing too. While organizations are moving more workloads to the cloud, many mission-critical workloads remain on-prem. End users, data stewards, governance groups, and security groups alike can easily get overwhelmed with multiple access points, inconsistent user interfaces,

Cloudera Altus is Now Available on Azure

Cloudera

It was exactly one year ago at Strata London that we introduced the world to Cloudera Altus Data Engineering. The premise was simple: make it quicker and easier for customers to drive data to their machine learning and analytics services by leveraging cloud resources, while at the same time eliminating the pain associated with managing datacenter or cloud infrastructure.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.