Streaming Big Data Files from Cloud Storage

Towards Data Science

This continues a series of posts on the topic of efficient ingestion of data from the cloud (e.g., here, here, and here). Before we get started, let’s be clear… when using cloud storage, it is usually not recommended to work with files that are particularly large. … CPU cores and TCP connections).

Enabling Security for Hadoop Data Lake on Google Cloud Storage

Uber Engineering

Our latest blog dives into enabling security for Uber’s modernized batch data lake on Google Cloud Storage! Ready to boost your Hadoop Data Lake security on GCP?

Enabling Multi-User Fine-Grained Access Control for Cloud Storage in CDP

Cloudera

Shared Data Experience (SDX) on Cloudera Data Platform (CDP) enables centralized data access control and audit for workloads in the Enterprise Data Cloud. The public cloud (CDP-PC) editions default to using cloud storage (S3 for AWS, ADLS-gen2 for Azure). For more details, see the following resources.

Creating ArcGIS Cloud Storage (ACS) connection files for STAC

ArcGIS

This blog walks users through workflows for creating cloud storage connection files that can be used in their STAC connections.

Cloudera Operational Database (COD) Performance Benchmarking: Comparing HDFS and Cloud Storage

Cloudera

Powered by Apache HBase and Apache Phoenix, COD ships out of the box with Cloudera Data Platform (CDP) in the public cloud. It’s also multi-cloud ready to meet your business where it is today, whether AWS, Microsoft Azure, or GCP. We tested two cloud storage services, AWS S3 and Azure ABFS. … runtime version.

The fancy data stack—batch version

Christophe Blefari

As a disclaimer, this may not quite make sense in a corporate context, but since this is my blog, I'll do what I want. However, over the years I've met people working at these companies, so I might have a few biases. Still, the idea of this post is to give you an overview of existing tools and how everything fits together.

Stream Rows and Kafka Topics Directly into Snowflake with Snowpipe Streaming

Snowflake

Rather than streaming data from the source into cloud object stores and then copying it to Snowflake, data is ingested directly into a Snowflake table to reduce architectural complexity and end-to-end latency. And if you are using Amazon Managed Streaming for Apache Kafka (MSK), you can get started using this guided demo.
