What was not clear, or easy, was trying to figure out how DuckDB would LIKE to read default AWS […] The post DuckDB … reading from s3 … with AWS Credentials and more. appeared first on Confessions of a Data Guy.
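The post's own snippets aren't reproduced above, but a minimal sketch of pointing DuckDB at the default AWS credential chain, assuming a recent DuckDB with the httpfs and aws extensions (bucket and path are placeholders):

    import duckdb

    con = duckdb.connect()
    con.sql("INSTALL httpfs;")
    con.sql("LOAD httpfs;")
    con.sql("INSTALL aws;")
    con.sql("LOAD aws;")
    # credential_chain resolves credentials the way the AWS SDK does:
    # env vars, ~/.aws/credentials, SSO, instance profiles, and so on.
    con.sql("CREATE SECRET (TYPE s3, PROVIDER credential_chain);")
    con.sql("SELECT count(*) FROM read_parquet('s3://my-bucket/data/*.parquet')").show()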
A growing number of cloud providers offer the ability to rent virtual machines, the largest being AWS, GCP, and Azure. How the product works: they currently monitor four cloud providers (AWS, GCP, Hetzner Cloud, and Azure). We envision building something comparable to AWS Fargate or Google Cloud Run.
Data is the lifeblood of modern businesses, but unlocking its true insights often requires complex SQL queries. We’re thrilled to announce the public preview of Snowflake Copilot, a new solution on the bleeding edge of text-to-SQL that simplifies data analysis while maintaining robust governance.
Introduction: Amazon Athena is an interactive query tool supplied by Amazon Web Services (AWS) that lets you use conventional SQL queries to evaluate data stored in Amazon S3. Athena is serverless, so there are no servers to operate, and you pay only for the queries you run.
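As a hedged illustration of that serverless, pay-per-query model, a minimal boto3 sketch that submits a conventional SQL query over S3-resident data (database, table, and result bucket are placeholders):

    import boto3

    athena = boto3.client("athena", region_name="us-east-1")
    resp = athena.start_query_execution(
        QueryString="SELECT status, COUNT(*) AS n FROM access_logs GROUP BY status",
        QueryExecutionContext={"Database": "web_logs"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )
    print(resp["QueryExecutionId"])  # poll get_query_execution, then fetch results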
Summary: Stream processing systems have long been built with a code-first design, adding SQL as a layer on top of the existing framework. There are numerous stream processing engines, near-real-time database engines, streaming SQL systems, etc. Can you describe what RisingWave is and the story behind it?
However, scaling LLM data processing to millions of records can pose data transfer and orchestration challenges, easily addressed by the user-friendly SQL functions in Snowflake Cortex. Traditionally, SQL has been limited to structured data neatly organized in tables.
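A minimal sketch of what LLM processing as a plain SQL function can look like with Cortex, assuming the SNOWFLAKE.CORTEX.COMPLETE function; connection details, table, and column names are placeholders:

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="me", password="...",  # placeholders
    )
    cur = conn.cursor()
    # One SQL statement fans the LLM call out over every row Snowflake scans,
    # so there is no client-side data transfer or orchestration to manage.
    cur.execute("""
        SELECT ticket_id,
               SNOWFLAKE.CORTEX.COMPLETE(
                   'mistral-large',
                   'Summarize this support ticket in one sentence: ' || ticket_text
               ) AS summary
        FROM support_tickets
    """)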
Databricks SQL Serverless is now Generally Available on Google Cloud Platform (GCP)! SQL Serverless is available in 7 GCP regions and 40+ regions across AWS, Azure and GCP.
As backend developers, we needed to stay unblocked while the infrastructure — in this case AWS resources — was being created. It was fair to assume that we would use other AWS services, particularly SQS and AWS Secrets Manager. Use LocalStack to enable locally running AWS resources.
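A minimal sketch of that pattern, assuming LocalStack on its default port 4566 (queue and secret names are placeholders):

    import boto3

    LOCALSTACK = "http://localhost:4566"
    # LocalStack accepts any credentials, so dummy values are fine for local dev.
    common = dict(endpoint_url=LOCALSTACK, region_name="us-east-1",
                  aws_access_key_id="test", aws_secret_access_key="test")

    sqs = boto3.client("sqs", **common)
    queue_url = sqs.create_queue(QueueName="jobs")["QueueUrl"]
    sqs.send_message(QueueUrl=queue_url, MessageBody="hello from local dev")

    secrets = boto3.client("secretsmanager", **common)
    secrets.create_secret(Name="db-password", SecretString="s3cr3t")

The same code runs against real AWS by dropping endpoint_url, which is what keeps local development and production behavior aligned.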
RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. Materialize is the only true SQL streaming database built from the ground up to meet the needs of modern data products. With Materialize, you can!
dbt Core is an open-source framework that helps you organise data warehouse SQL transformations; when I write dbt, I often mean dbt Core. AWS, GCP, Azure: storage prices dropped and we became data insatiable, needing all the company data in one place in order to join and compare everything. Enter the ELT.
Unistore is made possible by Hybrid Tables (now generally available in AWS commercial regions, with a few exceptions), which enable fast, single-row reads and writes in order to support transactional workloads. Sensitive data can have enormous value but is oftentimes locked down due to privacy requirements.
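A hedged sketch of the single-row access pattern Hybrid Tables target; names are illustrative, and a declared primary key is required:

    import snowflake.connector

    cur = snowflake.connector.connect(
        account="my_account", user="me", password="...",  # placeholders
    ).cursor()
    cur.execute("""
        CREATE HYBRID TABLE user_sessions (
            session_id STRING PRIMARY KEY,  -- hybrid tables require a primary key
            user_id    STRING,
            last_seen  TIMESTAMP_NTZ
        )
    """)
    # The fast, single-row point read that transactional workloads depend on.
    cur.execute("SELECT user_id FROM user_sessions WHERE session_id = %s", ("abc-123",))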
There is a clear shortage of professionals certified with Amazon Web Services (AWS). As far as AWS certifications are concerned, there is always a certain debate surrounding them. AWS certification helps you reach new heights in your career with improved pay and job opportunities. What is AWS?
Summary: A data lakehouse is intended to combine the benefits of data lakes (cost-effective, scalable storage and compute) and data warehouses (a user-friendly SQL interface). Multiple open source projects and vendors have been working together to make this vision a reality. Data lakes are notoriously complex.
AWS Glue is here to put an end to all your worries! In 2023, more than 5,140 businesses worldwide started using AWS Glue as a big data tool. Read this blog to understand everything about AWS Glue that makes it one of the most popular data integration solutions in the industry.
Inference: Model Serving in Snowpark Container Services, now generally available in both AWS and Azure, offers easy and performant distributed inference with CPUs or GPUs for any model, regardless of where it was trained. Snowflake ML now also supports the ability to generate and use synthetic data, now in public preview.
But, instead of GCP, we'll be using AWS. AWS is, by far, the most popular cloud computing platform; it has an absurd number of products to solve every specific problem you can imagine. So, join me in this post to develop a full data pipeline from scratch using some pieces from the AWS toolset. S3 is AWS' blob storage.
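Since S3 anchors the pipeline, a minimal boto3 sketch of writing to and listing a landing prefix (bucket and keys are placeholders):

    import boto3

    s3 = boto3.client("s3")
    s3.upload_file("events.json", "my-pipeline-bucket", "landing/2024-01-01/events.json")

    resp = s3.list_objects_v2(Bucket="my-pipeline-bucket", Prefix="landing/")
    for obj in resp.get("Contents", []):
        print(obj["Key"], obj["Size"])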
Agents use Cortex Analyst (structured SQL) and Cortex Search (unstructured data) as tools, along with LLMs, to analyze and generate answers. Route across tools: the agent selects a tool (Cortex Analyst, Cortex Search, or SQL generation from natural language) to facilitate governed access and enable compliance with enterprise policies.
Rudderstack: AWS revenue is $80b, Azure is $62b, and GCP is $37b. A UX where you buy a single tool combining engine and storage, where all you have to do is flow data in, write SQL, and it's done.
Your host is Tobias Macey and today I'm interviewing Martin Traverso about PrestoSQL, a distributed SQL engine that queries data in place. Interview: How did you get involved in the area of data management? Can you start by giving an overview of what Presto is and its origin story?
With AWS rapidly slicing the cost of S3 Express, the blog makes a solid argument that disk-based Kafka is 3.7X more expensive than diskless Kafka on S3 Express One Zone. Kafka's popularity also exposes its Achilles heel: the replication and network bottlenecks. Apache Hudi, for example, introduces an indexing technique to the Lakehouse.
[link] Wealthfront: Our Journey to Building a Scalable SQL Testing Library for Athena. Wealthfront introduces an in-house SQL testing library tailored for AWS Athena, emphasizing principles of zero-footprint testing via CTEs, usability through Python integration and existing Avro schemas, dynamic test execution, and clear test feedback.
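Wealthfront's library isn't shown in the excerpt, but a hypothetical sketch of the zero-footprint idea: render fixture rows as a VALUES-backed CTE that shadows the real table, so the query under test never touches production Athena data (the helper name is invented here):

    def with_fixture(table: str, rows: list[dict], query: str) -> str:
        # Build "WITH <table> (cols) AS (VALUES ...)" from literal fixture rows;
        # the CTE shadows the real table for the duration of the query.
        cols = list(rows[0])
        values = ", ".join(
            "(" + ", ".join(repr(r[c]) for c in cols) + ")" for r in rows
        )
        return f"WITH {table} ({', '.join(cols)}) AS (VALUES {values}) {query}"

    sql = with_fixture(
        "trades",
        [{"account_id": 1, "amount": 50.0}, {"account_id": 1, "amount": -20.0}],
        "SELECT account_id, SUM(amount) AS net FROM trades GROUP BY account_id",
    )
    # sql can now be submitted to Athena and asserted on without creating tables.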
You're trying to keep everything in sync, but manual updates and batch processing don't cut it anymore. You need a reliable way to keep your data up-to-date across all platforms. This is where AWS Database Migration Service (DMS) and […]
[link] JBarti: Write Manageable Queries With The BigQuery Pipe Syntax. Our quest to simplify SQL is always an adventure. The blog narrates a few examples of pipe syntax in comparison with the equivalent standard SQL queries. BigQuery's pipe syntax is exciting to watch, and it will be interesting to see how it gets adopted.
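For a flavor of the comparison, a hedged sketch of the same aggregation written both ways (table and column names are placeholders):

    classic_sql = """
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        WHERE order_date >= '2024-01-01'
        GROUP BY customer_id
    """

    # Pipe syntax: each |> stage consumes the previous stage's output in order.
    pipe_sql = """
        FROM orders
        |> WHERE order_date >= '2024-01-01'
        |> AGGREGATE SUM(amount) AS total GROUP BY customer_id
    """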
The Zalando TechRadar guides teams on database selection and deployment options, with AWS RDS for Postgres as one of the available options. Complex anomaly detection tasks, such as byzantine failures or issues with SQL statements, take a noticeable investment all over the place. Toil brings hidden costs.
Efficiently Intelligent: Arctic excels at enterprise tasks such as SQL generation, coding and instruction following benchmarks even when compared to open source models trained with significantly higher compute budgets. Enterprises want to use LLMs to build conversational SQL data copilots, code copilots and RAG chatbots.
For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.
Cloudera recently signed a strategic collaboration agreement with Amazon Web Services (AWS), reinforcing our relationship and commitment to accelerating and scaling cloud native data management and data analytics on AWS. Let us dive into what is happening in each of these pillars between AWS and Cloudera.
Introduction: In modern data pipelines, especially in cloud data platforms like Snowflake, data ingestion from external systems such as AWS S3 is common. This is a Snowpark and SQL-based solution. Why is this framework important? Manual validations across hundreds of files and tables? Error-prone!
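As one hedged example of a check such a framework can automate, COPY INTO in validation mode reports bad records without loading anything (stage, table, and file format are placeholders):

    import snowflake.connector

    cur = snowflake.connector.connect(
        account="my_account", user="me", password="...",  # placeholders
    ).cursor()
    cur.execute("""
        COPY INTO raw.orders
        FROM @s3_landing_stage/orders/
        FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
        VALIDATION_MODE = RETURN_ERRORS
    """)
    for row in cur.fetchall():
        print(row)  # each row describes a rejected record and the parse error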
Snowflake Notebooks aim to provide a convenient, easy-to-use interactive environment that seamlessly blends Python, SQL and Markdown, as well as integrations with key Snowflake offerings, like Snowpark ML, Streamlit, Cortex and Iceberg tables. Discover valuable business insights through exploratory data analysis.
Spark has long allowed running SQL queries on a remote Thrift JDBC server. The appropriate Spark dependencies (spark-core/spark-sql or spark-connect-client-jvm) will be provided later in the Java classpath, depending on the run mode (plus hadoop-aws, since we almost always interact with S3 storage on the client side).
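The article's setup is JVM-side, but as a hedged PySpark analogue (3.4+), the Spark Connect path that dependency choice refers to looks roughly like this (host and table are placeholders):

    from pyspark.sql import SparkSession

    # "sc://" selects the thin Spark Connect client instead of an in-process JVM;
    # the JVM equivalent is what spark-connect-client-jvm provides.
    spark = SparkSession.builder.remote("sc://spark-connect-host:15002").getOrCreate()
    spark.sql("SELECT * FROM events LIMIT 10").show()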
As described in our recent blog post , an SQL AI Assistant has been integrated into Hue with the capability to leverage the power of large language models (LLMs) for a number of SQL tasks. This is a real game-changer for data analysts on all levels and will make SQL development faster, easier, and less error-prone.
To add this metric to DJ, they need to provide two pieces of information: the fact table that the metric comes from (SELECT account_id, country_iso_code, streaming_hours FROM streaming_fact_table) and the metric expression (`SUM(streaming_hours)`). Then metric consumers throughout the organization can call DJ to request either the SQL or the resulting data.
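A hypothetical sketch of those two inputs as a definition payload; the field names are invented for illustration, not DJ's actual schema:

    # From these two pieces, DJ can render SQL for any requested dimension, e.g.
    #   SELECT country_iso_code, SUM(streaming_hours)
    #   FROM streaming_fact_table GROUP BY country_iso_code
    metric_definition = {
        "fact_query": (
            "SELECT account_id, country_iso_code, streaming_hours "
            "FROM streaming_fact_table"
        ),
        "metric_expression": "SUM(streaming_hours)",
    }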
Databricks clusters and AWS EC2: In today's landscape, big data, which is data too large to fit on a single-node machine, is transformed and managed by clusters. But what are clusters? Clusters in Databricks: Databricks offers Job clusters for data pipeline processing and warehouse clusters used for the SQL lakehouse.
The AWS training will prepare you to become a master of the cloud, storing and processing data and developing applications for the cloud. AWS Kinesis makes it possible to process and analyze data from multiple sources in real time. It shows how AWS Kinesis can be effectively used for processing streaming data.
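A minimal sketch of feeding a Kinesis stream for real-time processing (stream name and payload are placeholders):

    import json
    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")
    kinesis.put_record(
        StreamName="clickstream",
        Data=json.dumps({"user_id": "u42", "event": "page_view"}).encode(),
        PartitionKey="u42",  # records sharing a key land on the same shard, in order
    )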
Eventador simplifies the process by allowing users to use SQL to query streams of real-time data without implementing complex code. We recently delivered all three of these streaming capabilities as cloud services through Cloudera Data Platform (CDP) Data Hub on AWS and Azure.
Snowpark Container Services: This additional Snowpark runtime (available in public preview soon on select AWS regions) enables developers to effortlessly deploy, manage and scale custom containerized workloads and models for tasks such as fine-tuning open-source LLMs using secure Snowflake-managed infrastructure with GPU instances.