How to get started with dbt

Christophe Blefari

dbt Core is an open-source framework that helps you organise data warehouse SQL transformations. dbt was born out of the observation that more and more companies were switching from on-premise Hadoop data infrastructure to cloud data warehouses. tests: a way to define SQL tests, either at column level or with a query.

The value of CDP Public Cloud over legacy Hadoop-on-IaaS implementations

Cloudera

Prior to the introduction of CDP Public Cloud, many organizations that wanted to leverage CDH, HDP or any other on-prem Hadoop runtime in the public cloud had to deploy the platform in a lift-and-shift fashion, commonly known as “Hadoop-on-IaaS” or simply the IaaS model. SQL-driven Streaming App Development. Introduction.


Adopting Spark Connect

Towards Data Science

Spark has long allowed running SQL queries on a remote Thrift JDBC server. The appropriate Spark dependencies (spark-core/spark-sql or spark-connect-client-jvm) will be provided later in the Java classpath, depending on the run mode. hadoop-aws since we almost always have interaction with S3 storage on the client side).


Simplify Your Data Architecture With The Presto Distributed SQL Engine

Data Engineering Podcast

Your host is Tobias Macey and today I’m interviewing Martin Traverso about PrestoSQL, a distributed SQL engine that queries data in place. Interview: Introduction. How did you get involved in the area of data management? Can you start by giving an overview of what Presto is and its origin story?

Databricks, Snowflake and the future

Christophe Blefari

In the data world, Snowflake and Databricks are our dedicated platforms. We consider them big, but measured against the whole tech ecosystem they are (so) small: AWS revenue is $80b, Azure is $62b and GCP is $37b. A UX where you buy a single tool combining engine and storage, where all you have to do is flow data in, write SQL, and it's done.


TimescaleDB: The Timeseries Database Built For SQL And Scale - Episode 65

Data Engineering Podcast

Links TimescaleDB Original Appearance on the Data Engineering Podcast 1.0


5 Advantages of Real-Time ETL for Snowflake

Striim

Striim offers an out-of-the-box adapter for Snowflake to stream real-time data from enterprise databases (using low-impact change data capture), log files from security devices and other systems, IoT sensors and devices, messaging systems, and Hadoop solutions, and provides in-flight transformation capabilities.