Accessibility, Blog and Unstructured Data

Accelerate AI Development with Snowflake

Snowflake

NOVEMBER 11, 2024

Scaling LLM data processing to millions of records can pose data transfer and orchestration challenges, which the user-friendly SQL functions in Snowflake Cortex are designed to address. With these functions, teams can run tasks such as semantic filters and joins across unstructured data sets using familiar SQL syntax.

Unstructured Data

Unstructured Data SQL AWS Healthcare

Data Engineering Weekly #195

Data Engineering Weekly

OCTOBER 27, 2024

Astasia Myers: The three components of the unstructured data stack. LLMs and vector databases have significantly improved the ability to process and understand unstructured data. The blog is an excellent summary of the existing unstructured data landscape. What are you waiting for?

Data Engineering

Data Engineering Data Engineer Engineering Unstructured Data

Mastering Multi-Cloud with Cloudera: Strategic Data & AI Deployments Across Clouds

Cloudera

JANUARY 7, 2025

This transition streamlined data analytics workflows to accommodate significant growth in data volumes. By leveraging the Open Data Lakehouse’s ability to unify structured and unstructured data with built-in governance and security, the organization tripled its analyzed data volume within a year, boosting operational efficiency.

Cloud

Cloud Government AWS Unstructured Data

Data Engineering Weekly #207

Data Engineering Weekly

FEBRUARY 9, 2025

QuantumBlack: Solving data quality for gen AI applications. Unstructured data processing is a top priority for enterprises that want to harness the power of GenAI. It brings challenges in data processing and quality, and what data quality even means for unstructured data is a key open question for every organization.

Data Engineering

Data Engineering Data Engineer Engineering Unstructured Data

Snowflake Cortex Search: State-of-the-Art Hybrid Search for RAG Applications

Snowflake

JULY 25, 2024

Snowflake Cortex Search, a fully managed search service for documents and other unstructured data, is now in public preview. Solving the challenges of building high-quality RAG applications: from the beginning, Snowflake’s mission has been to empower customers to extract more value from their data.

Unstructured Data

Unstructured Data Metadata Government SQL

Fueling Enterprise Generative AI with Data: The Cornerstone of Differentiation

Cloudera

JUNE 11, 2024

By leveraging an organization’s proprietary data, GenAI models can produce highly relevant and customized outputs that align with the business’s specific needs and objectives. Structured data is highly organized and formatted in a way that makes it easily searchable in databases and data warehouses.

Unstructured Data

Unstructured Data Pharmaceutical Banking Manufacturing

Apache Ozone – A Multi-Protocol Aware Storage System

Cloudera

NOVEMBER 7, 2023

Are you struggling to manage the ever-increasing volume and variety of data in today’s constantly evolving landscape of modern data architectures? This blog post is intended to provide guidance to Ozone administrators and application developers on the optimal usage of the bucket layouts for different applications.

Systems

Systems Hadoop Unstructured Data Media

Why Open Table Format Architecture is Essential for Modern Data Systems

phData: Data Engineering

NOVEMBER 8, 2024

Open Table Format (OTF) architecture now provides a solution for efficient data storage, management, and processing while ensuring compatibility across different platforms. In this blog, we will discuss: What is the Open Table Format (OTF)? Open table formats also support ACID transactions, ensuring data integrity and the reliability of stored data.

Architecture

Architecture Systems Data Lake Google Cloud

Data security vs usability: you can have it all

Cloudera

OCTOBER 26, 2020

The same holds for data access in business. Enabling data access for end users so they can drive insight and business value is a typical area of compromise between IT and users. Data access can be either very secure but restrictive, or very open yet risky. Quickly onboard data.

Data Security

Data Security IT Unstructured Data Government

The Challenge of Data Quality and Availability—And Why It’s Holding Back AI and Analytics

Striim

APRIL 18, 2025

Why AI and Analytics Require Real-Time, High-Quality Data. To extract meaningful value from AI and analytics, organizations need data that is continuously updated, accurate, and accessible. Here’s why: AI models require clean data. Machine learning models are only as good as their training data.

High Quality Data

High Quality Data Business Intelligence Unstructured Data Data Pipeline

Data Engineering Weekly #177

Data Engineering Weekly

JUNE 24, 2024

A few highlights from the report: unstructured data goes mainstream. Sponsored: 2024 State of Apache Airflow Report. Gain access to the latest trends and insights shaping the world of Apache Airflow, the go-to platform for data pipeline development and orchestration.

Data Engineering

Data Engineering Data Engineer Engineering Google Cloud

Data Engineering Weekly #181

Data Engineering Weekly

JULY 21, 2024

The blog is an excellent summary of what one needs to know about Gen-AI to start. Manuel Faysse: ColPali: Efficient Document Retrieval with Vision Language Models. 👀 80% of enterprise data exists in difficult-to-use formats like HTML, PDF, CSV, PNG, PPTX, and more.

Data Engineering

Data Engineering Data Engineer Engineering Generalist

2024 Governance Trends for Data Leaders

phData: Data Engineering

NOVEMBER 1, 2024

To better understand where data governance is heading, we spoke with top executives from IT, healthcare, and finance about the biggest trends, the key challenges, and what they would recommend. This blog is a collection of those insights; for the full trendbook, we recommend downloading the PDF.

Government

Government Data Governance Finance Metadata

Is Apache Iceberg the New Hadoop? Navigating the Complexities of Modern Data Lakehouses

Data Engineering Weekly

MARCH 5, 2025

This blog post expands on that insightful conversation, offering a critical look at Iceberg's potential and the hurdles organizations face when adopting it. Initially, catalogs focused on managing metadata for structured data in Iceberg tables.

Hadoop

Hadoop Metadata Data Ingestion Data Governance

CDP Data Visualization: Self-Service Data Visualization For The Full Data Lifecycle

Cloudera

OCTOBER 29, 2020

More importantly, from a security and governance perspective, native integration with CDP means SSO for authentication and seamless integration with Cloudera Shared Data Experience (SDX) to manage user access and governance. With DV, users log in with their CDP credentials and start analyzing the data they have access to.

Machine Learning

Machine Learning Data Warehouse Unstructured Data Government

Cloudera Named a Visionary in the Gartner MQ for Cloud DBMS

Cloudera

APRIL 1, 2024

We scored the highest in hybrid, intercloud, and multi-cloud capabilities because we are the only vendor in the market with a true hybrid data platform that can run on any cloud including private cloud to deliver a seamless, unified experience for all data, wherever it lies.

Cloud

Cloud Unstructured Data Metadata Government

Announcing New Innovations for Data Warehouse, Data Lake, and Data Lakehouse in the Data Cloud

Snowflake

NOVEMBER 2, 2023

Rather than defining a schema upfront, a user can decide which data and schema they need for their use case. Snowflake has long supported semi-structured data types and file formats like JSON, XML, and Parquet, and more recently the storage and processing of unstructured data such as PDF documents, images, videos, and audio files.

Data Lake

Data Lake Data Warehouse Cloud Unstructured Data

Streaming Edge Data Collection and Global Data Distribution

Cloudera

JUNE 9, 2022

In the first blog of the Universal Data Distribution blog series, we discussed the emerging need within enterprise organizations to take control of their data flows. Controlling distribution while also allowing the freedom and flexibility to deliver the data to different services is more critical than ever.

Data Collection

Data Collection Data Lake Unstructured Data Retail

Expediting SQL Workers means Expediting your Business

Cloudera

NOVEMBER 10, 2020

Two of the more painful things in your everyday life as an analyst or SQL worker are not getting easy access to data when you need it, and not having easy-to-use, useful tools that don’t get in your way! HUE’s table browser, with built-in data sampling. Efficient query design. Optimization as you go.

SQL

SQL Unstructured Data Hadoop Data Lake

Why Choose a Hybrid Data Cloud in Financial Services?

Cloudera

JANUARY 28, 2022

Then there are the more extensive discussions: scrutiny of the overarching data strategy questions related to privacy, security, data governance/access, and regulatory oversight. These are not straightforward decisions, especially when data breaches always hit the top of the news headlines.

Cloud

Cloud Banking Data Governance Government

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

In this blog, I will demonstrate the value of Cloudera DataFlow (CDF), the edge-to-cloud streaming data platform available on the Cloudera Data Platform (CDP), as a data integration and democratization fabric. … PII data) of each data product, and the access rights for each different group of data consumers.

Architecture

Architecture Metadata Kafka Government

The Future Is Hybrid Data, Embrace It

Cloudera

JUNE 7, 2022

In the past decade, the amount of structured data created, captured, copied, and consumed globally has grown from less than 1 ZB in 2011 to nearly 14 ZB in 2020. Impressive, but dwarfed by the amount of unstructured data, cloud data, and machine data – another 50 ZB.

IT

IT Unstructured Data Data Architecture Government

2020 Data Impact Award Winner Spotlight: Merck KGaA

Cloudera

DECEMBER 11, 2020

As mentioned in my previous blog on the topic, the recent shift to remote working has seen an increase in conversations around how data is managed. Toolsets and strategies have had to shift to ensure controlled access to data. This is what really stood out about the finalists of the Data Security and Governance category.

Data Lake

Data Lake Government Data Security Unstructured Data

Learn How Cloudera Drives Healthcare Data Insights at HIMSS 21

Cloudera

JULY 29, 2021

Protecting healthcare data is critical to your organization’s success, whether data is ingested, streamed, or stored in a data platform that runs in the public, private, or hybrid cloud. Public, private, hybrid, or on-premises data management platform. Be The Change. Security and governance in a hybrid environment.

Healthcare

Healthcare Unstructured Data Government Machine Learning

What Separates Hybrid Cloud and ‘True’ Hybrid Cloud?

Cloudera

MAY 14, 2024

This form of hybrid also goes a level deeper than one may find in a standard hybrid cloud, accounting for the entirety of the data lifecycle, whether that’s the point of ingestion, warehousing, or machine learning—even when that end-to-end data lifecycle is split between entirely different environments. Data comes in many forms.

Cloud

Cloud Data Governance Unstructured Data Data Architecture

Gen AI Perspectives from Industry Leaders Shaping the Future

Snowflake

MAY 9, 2024

It started when one capable model suited for text gained mainstream attention, and now, less than 18 months later, a long list of commercial and open-source gen AI models is available, alongside new multimodal models that also understand images and other unstructured data. Ready to dive deeper into gen AI?

Unstructured Data

Unstructured Data Manufacturing Retail Data Warehouse

Maximizing Supply Chain Agility through the “Last Mile” Commitment

Cloudera

JANUARY 5, 2021

The retailer used Cloudera to build an analytics solution for fulfillment delivery that enabled advanced analytic modeling, A/B testing, and optimization through improved access to omnichannel order, logistics, and delivery-capacity data. Additional retail content can be found in our retail resource kit.

Retail

Retail Unstructured Data Big Data Machine Learning

Why Modernizing the First Mile of the Data Pipeline Can Accelerate all Analytics

Cloudera

AUGUST 13, 2021

Every enterprise is trying to collect and analyze data to get better insights into their business. Whether consuming log files, sensor metrics, or other unstructured data, most enterprises manage and deliver data to a data lake and use various applications, such as ETL tools, search engines, and databases, for analysis.

Data Pipeline

Data Pipeline Data Lake ETL Tools Unstructured Data

The Modern Data Lakehouse: An Architectural Innovation

Cloudera

SEPTEMBER 9, 2022

Imagine having self-service access to all business data, anywhere it may be, and being able to explore it all at once. Imagine quickly answering burning business questions nearly instantly, without waiting for data to be found, shared, and ingested. An architectural innovation: Cloudera Data Platform (CDP) and Apache Iceberg.

Architecture

Architecture Metadata Machine Learning Unstructured Data

Listening to the Customer in the 21st Century: It’s All About Data

Cloudera

OCTOBER 28, 2020

To start, they look to traditional financial services data, combining and correlating account activity, borrowing history, core banking, investments, and call center data. While Rabobank has always had access to this data, drawing meaningful insight from it was a different matter.

Unstructured Data

Unstructured Data Banking Machine Learning Media

DELL/EMC taking the next step with PowerScale and ECS certification on CDP Private Cloud Base

Cloudera

OCTOBER 26, 2020

To learn more, check out the blog post. Attribute-based access control and SparkSQL fine-grained access control. Lineage and chain of custody, advanced data discovery, and business glossary. Store and access schemas across clusters and rebalance clusters with Cruise Control. Ranger 2.0.

Certification

Certification Cloud Kafka Unstructured Data

Bring Gen AI & LLMs to Your Data

Snowflake

JUNE 28, 2023

To make it even easier and more secure for customers to take advantage of leading LLMs, Snowpark Container Services can be used as part of a Snowflake Native App, so customers will be able to access leading LLMs directly via the Snowflake Marketplace and install them to run entirely in their Snowflake accounts. Read this blog.

Pipeline-centric

Pipeline-centric Unstructured Data Data Government

Educating ChatGPT on Data Lakehouse

Cloudera

MARCH 17, 2023

Hopefully this blog will give ChatGPT an opportunity to learn and correct itself while counting towards my 2023 contribution to social good. The one key component that is missing is a common, shared table format, that can be used by all analytic services accessing the lakehouse data.

Education

Education Unstructured Data Data Lake Data Warehouse

Optimizing EC2 costs on Databricks

Sync Computing

JANUARY 27, 2025

Data transfers between regions or zones incur additional costs that can outweigh the cost savings, not to mention the impact on performance. Provisioning EC2 instances in the same region as your data is not only important from a cost perspective, it also reduces access latency and increases transfer speed.

AWS

AWS Data Lake Machine Learning Big Data

A Flexible and Efficient Storage System for Diverse Workloads

Cloudera

SEPTEMBER 15, 2022

Structured data (such as name, date, ID, and so on) will be stored in regular SQL databases such as Hive or Impala. There are also newer AI/ML applications that need data storage optimized for unstructured data, using developer-friendly paradigms such as the Python Boto API. Ranger policies.

Systems

Systems Hadoop Metadata Telecommunication

How Banks are Using Technologies to Help Underserved Communities

Cloudera

FEBRUARY 7, 2023

Financial inclusion, defined as the availability and accessibility of financial services to underserved communities, is a critical issue facing the banking industry today. Access to financial services and credit can help lift individuals and entire underserved communities out of poverty. According to the World Bank, 1.7

Banking

Banking Technology Insurance Education

Snowpark Offers Expanded Capabilities Including Fully Managed Containers, Native ML APIs, New Python Versions, External Access, Enhanced DevOps and More

Snowflake

JUNE 28, 2023

By bringing compute closer to the data, businesses can eliminate data silos, address security and governance challenges, and optimize operations, leading to enhanced efficiency, all while avoiding the management overhead associated with additional systems and infrastructure.

Python

Python Accessible Accessibility Pipeline-centric

Choose Both: Data Fabric and Data Lakehouse

Cloudera

SEPTEMBER 12, 2022

Organizations don’t know what they have anymore and so can’t fully capitalize on it — the majority of data generated goes unused in decision making. And second, for the data that is used, 80% is semi- or unstructured. Both obstacles can be overcome using modern data architectures, specifically data fabric and data lakehouse.

Unstructured Data

Unstructured Data Data Lake Data Architecture Data

DoorDash identifies Five big areas for using Generative AI

DoorDash Engineering

APRIL 26, 2023

The company is exploring the use of Generative AI, a subset of Artificial Intelligence that generates novel content based on existing data, and how it can be implemented effectively with consideration for the privacy and security of personal information. In fact, we used generative AI to help edit this blog post!

Food

Food Unstructured Data Deep Learning SQL

5 Generative AI Use Cases Companies Can Implement Today

Towards Data Science

OCTOBER 7, 2023

Given LLMs’ capacity to understand and extract insights from unstructured data, businesses are finding value in summarizing, analyzing, searching, and surfacing insights from large amounts of internal information. Let’s explore how a few key sectors are putting gen AI to use.

Unstructured Data

Unstructured Data Finance SQL Database

Five Strategies to Accelerate Data Product Development

Cloudera

JULY 26, 2021

In fact, data product development introduces an additional requirement that is more relevant today than it was in the past: scalable permissioning and authorization, given the number and variety of roles among data constituents, both internal and external, accessing a data product.

Generalist

Generalist Telecommunication Healthcare Data Science

Ingest Data Faster, Easier and Cost-Effectively with New Connectors and Product Updates

Snowflake

JUNE 13, 2024

Like any first step, data ingestion is a critical foundational block. Given the many different ways to ingest data, in this blog we will walk through the various methods, calling out the latest announcements and improvements we’ve made. Ingestion with Snowflake should feel like a breeze.

Data Ingestion

Data Ingestion MySQL PostgreSQL Data Pipeline

Evaluating Data Observability Tools: A Comprehensive Guide

Data Engineering Weekly

SEPTEMBER 18, 2024

Decoupling of Storage and Compute: By separating storage from compute resources, data lakes allow observability tools to run alongside core data pipelines without competing for resources. This opens up new possibilities for monitoring and diagnosing data issues across various sources.

Data Lake

Data Lake Data Pipeline Unstructured Data Data

How-to: Index Data from S3 Using CDP Data Hub

Cloudera

SEPTEMBER 9, 2020

This blog post will present a simple “hello world” kind of example on how to get data that is stored in S3 indexed and served by an Apache Solr service hosted in a Data Discovery and Exploration cluster in CDP. We will only cover AWS and S3 environments in this blog. You have CLI access to that cluster.

AWS

AWS Data Unstructured Data Hadoop
