Data, Metadata and Unstructured Data - Data Engineering Digest

Agents of Change: Navigating 2025 with AI and Data Innovation

Data Engineering Weekly

DECEMBER 28, 2024

In this post, we delve into predictions for 2025, focusing on the transformative role of AI agents, workforce dynamics, and data platforms. For professionals across domains—data engineers, AI engineers, and data scientists—the message is clear: adapt or become obsolete.

Unstructured Data

Unstructured Data Metadata Government Data

Unapologetically Technical Episode 20 – Shane Murray

Jesse Anderson

MAY 5, 2025

I n this episode of Unapologetically Technical, I interview Shane Murray, Field CTO at Monte Carlo Data. Shane shares his compelling journey from studying math and finance in Sydney, Australia, to leading AI strategy at a major data observability company in New York.

Unstructured Data

Unstructured Data Finance Metadata Architecture

Your Enterprise Data Needs an Agent

Snowflake

FEBRUARY 12, 2025

Agents need to access an organization's ever-growing structured and unstructured data to be effective and reliable. As data connections expand, managing access controls and efficiently retrieving accurate informationwhile maintaining strict privacy protocolsbecomes increasingly complex. text, audio) and structured (e.g.,

Unstructured Data

Unstructured Data Government SQL Structured Data

Webinars

What’s New in Apache Airflow® 3.0—And How Will It Reshape Your Data Workflows?

MORE WEBINARS

Bring Order To The Chaos Of Your Unstructured Data Assets With Unstruk

Data Engineering Podcast

JUNE 17, 2021

Summary Working with unstructured data has typically been a motivation for a data lake. Kirk Marple has spent years working with data systems and the media industry, which inspired him to build a platform for automatically organizing your unstructured assets to make them more valuable. No more scripts, just SQL.

Unstructured Data

Unstructured Data Data Warehouse Metadata Media

Build Better Data Pipelines with SQL and Python in Snowflake

Snowflake

JUNE 10, 2025

Data transformations are the engine room of modern data operations — powering innovations in AI, analytics and applications. As the core building blocks of any effective data strategy, these transformations are crucial for constructing robust and scalable data pipelines. This puts data engineers in a critical position.

Data Pipeline

Data Pipeline SQL Python Building

AI and Data Predictions 2025: Strategies to Realize the Promise of AI

Snowflake

DECEMBER 4, 2024

Together with a dozen experts and leaders at Snowflake, I have done exactly that, and today we debut the result: the “ Snowflake Data + AI Predictions 2024 ” report. When you’re running a large language model, you need observability into how the model may change as it ingests new data. The next evolution in data is making it AI ready.

Unstructured Data

Unstructured Data Data Lake Deep Learning Metadata

Scale Unstructured Text Analytics with Batch LLM Inference

Snowflake

MARCH 6, 2025

Large language models (LLMs) are transforming how we extract value from this data by running tasks from categorization to summarization and more. While AI has proved that real-time conversations in natural language are possible with LLMs, extracting insights from millions of unstructured data records using these LLMs can be a game changer.

Unstructured Data

Unstructured Data Media Medical Data Workflow

Databricks Delta Lake: A Scalable Data Lake Solution

ProjectPro

JUNE 6, 2025

Want to process peta-byte scale data with real-time streaming ingestions rates, build 10 times faster data pipelines with 99.999% reliability, witness 20 x improvement in query performance compared to traditional data lakes, enter the world of Databricks Delta Lake now. It's a sobering thought - all that data, driving no value.

Data Lake

Data Lake Data Warehouse Metadata Unstructured Data

Simplifying Multimodal Data Analysis with Snowflake Cortex AI

Snowflake

APRIL 16, 2025

Snowflake Cortex AI now features native multimodal AI capabilities, eliminating data silos and the need for separate, expensive tools. This major enhancement brings the power to analyze images and other unstructured data directly into Snowflakes query engine, using familiar SQL at scale.

Data Analysis

Data Analysis Unstructured Data Manufacturing Retail

Empower Your Cyber Defenders with Real-Time Analytics Author: Carolyn Duby, Field CTO

Cloudera

NOVEMBER 15, 2024

In fact, according to the Identity Theft Resource Center (ITRC) Annual Data Breach Report , there were 2,365 cyber attacks in 2023 with more than 300 million victims, and a 72% increase in data breaches since 2021. However, there is a fundamental challenge standing in the way of being successful: data.

Metadata

Metadata Unstructured Data Data Lake Government

Is Apache Iceberg the New Hadoop? Navigating the Complexities of Modern Data Lakehouses

Data Engineering Weekly

MARCH 5, 2025

The modern data stack constantly evolves, with new technologies promising to solve age-old problems like scalability, cost, and data silos. It promised to address key pain points: Scaling: Handling ever-increasing data volumes. Speed: Accelerating data insights. Data Silos: Breaking down barriers between data sources.

Hadoop

Hadoop Metadata Data Ingestion Data Governance

AWS Glue-Unleashing the Power of Serverless ETL Effortlessly

ProjectPro

JUNE 6, 2025

Do ETL and data integration activities seem complex to you? Read this blog to understand everything about AWS Glue that makes it one of the most popular data integration solutions in the industry. Did you know the global big data market will likely reach $268.4 Businesses are leveraging big data now more than ever.

AWS

AWS Scala Metadata Data Lake

7 Best Data Warehousing Tools for Efficient Data Storage Needs

ProjectPro

JUNE 6, 2025

Data is often referred to as the new oil, and just like oil requires refining to become useful fuel, data also needs a similar transformation to unlock its true value. This transformation is where data warehousing tools come into play, acting as the refining process for your data. Why Choose a Data Warehousing Tool?

Data Storage

Data Storage PostgreSQL Data Warehouse AWS

Why Open Table Format Architecture is Essential for Modern Data Systems

phData: Data Engineering

NOVEMBER 8, 2024

The world we live in today presents larger datasets, more complex data, and diverse needs, all of which call for efficient, scalable data systems. Open Table Format (OTF) architecture now provides a solution for efficient data storage, management, and processing while ensuring compatibility across different platforms.

Architecture

Architecture Systems Data Lake Google Cloud

Manage Your Unstructured Data Assets Across Cloud And Hybrid Environments With Komprise

Data Engineering Podcast

FEBRUARY 27, 2022

Summary There are a wealth of options for managing structured and textual data, but unstructured binary data assets are not as well supported across the ecosystem. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform.

Unstructured Data

Unstructured Data Cloud Management Metadata

Simplifying Data Architecture and Security to Accelerate Value

Snowflake

NOVEMBER 11, 2024

It’s easy these days for an organization’s data infrastructure to begin looking like a maze, with an accumulation of point solutions here and there. Snowflake is committed to doing just that by continually adding features to help our customers simplify how they architect their data infrastructure. Here’s a closer look.

Data Architecture

Data Architecture Architecture Data Lake Kafka

Combining The Simplicity Of Spreadsheets With The Power Of Modern Data Infrastructure At Canvas

Data Engineering Podcast

JUNE 19, 2022

Summary Data analysis is a valuable exercise that is often out of reach of non-technical users as a result of the complexity of data systems. Atlan is the metadata hub for your data ecosystem. Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code.

Metadata

Metadata Unstructured Data MongoDB MySQL

2024 Governance Trends for Data Leaders

phData: Data Engineering

NOVEMBER 1, 2024

In an effort to better understand where data governance is heading, we spoke with top executives from IT, healthcare, and finance to hear their thoughts on the biggest trends, key challenges, and what insights they would recommend. With that, let’s get into the governance trends for data leaders! Want to Save This Guide for Later?

Government

Government Data Governance Finance Metadata

Bring Geospatial Analytics Across Disparate Datasets Into Your Toolkit With The Unfolded Platform

Data Engineering Podcast

JUNE 26, 2022

Summary The proliferation of sensors and GPS devices has dramatically increased the number of applications for spatial data, and the need for scalable geospatial analytics. Atlan is the metadata hub for your data ecosystem. Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code.

Datasets

Datasets Unstructured Data Metadata MongoDB

Directory Tables : Access Unstructured Data

Cloudyard

MARCH 30, 2023

Read Time: 2 Minute, 30 Second For instance, Consider a scenario where we have unstructured data in our cloud storage. However, Unstructured I assume : PDF,JPEG,JPG,Images or PNG files. Directory tables metadata should be refreshed automatically when underlying stage gets updated.

Unstructured Data

Unstructured Data Accessibility Accessible Cloud Storage

A 2025 Guide to Ace the Netflix Data Engineer Interview

ProjectPro

JUNE 6, 2025

Get ready for your Netflix Data Engineer interview in 2024 with this comprehensive guide. It's your go-to resource for practical tips and a curated list of frequently asked Netflix Data Engineer Interview Questions and Answers. That's where the role of Netflix Data Engineers comes in. petabytes of data. Interested?

Data Engineering

Data Engineering Data Engineer Engineering NoSQL

Unstructured Data: Examples, Tools, Techniques, and Best Practices

AltexSoft

MAY 12, 2023

In today’s data-driven world, organizations amass vast amounts of information that can unlock significant insights and inform decision-making. A staggering 80 percent of this digital treasure trove is unstructured data, which lacks a pre-defined format or organization. What is unstructured data?

Unstructured Data

Unstructured Data NoSQL Hadoop Data Lake

Democratizing Enterprise AI: Snowflake’s New AI Capabilities Accelerate Data-Driven Innovation

Snowflake

JUNE 1, 2025

Fully managed within Snowflakes secure perimeter, these capabilities enable business users and data scientists to turn structured and unstructured data into actionable insights, without complex tooling or infrastructure. Model Context Protocol (MCP) provides an open standard for connecting AI systems with data sources.

Unstructured Data

Unstructured Data Google Cloud Government AWS

Snowflake Cortex Search: State-of-the-Art Hybrid Search for RAG Applications

Snowflake

JULY 25, 2024

Snowflake Cortex Search, a fully managed search service for documents and other unstructured data, is now in public preview. Solving the challenges of building high-quality RAG applications From the beginning, Snowflake’s mission has been to empower customers to extract more value from their data.

Unstructured Data

Unstructured Data Metadata Government SQL

How to Build a Knowledge Graph for RAG Applications?

ProjectPro

JUNE 6, 2025

This approach helps bridge the gap between unstructured text generation and structured, factual data. RAG has changed how Large Language Models (LLMs) and natural language processing systems handle large-scale data to retrieve relevant information for question-answering systems.

Building

Building Unstructured Data Database Datasets

How to Build a Data Lake?

ProjectPro

JUNE 6, 2025

This guide is your roadmap to building a data lake from scratch. We'll break down the fundamentals, walk you through the architecture, and share actionable steps to set up a robust and scalable data lake. Traditional data storage systems like data warehouses were designed to handle structured and preprocessed data.

Data Lake

Data Lake Building Hadoop Raw Data

100 Data Modelling Interview Questions To Prepare For In 2025

ProjectPro

JUNE 6, 2025

Data modeling is a crucial skill for every big data professional, but it can be challenging to master. So, if you are preparing for a data modelling interview, you have landed on the right page. We have compiled the top 50 data modelling interview questions and answers from beginner to advanced levels. billion by 2028.

Data Warehouse

Data Warehouse NoSQL PostgreSQL Relational Database

Announcing New Innovations for Data Warehouse, Data Lake, and Data Lakehouse in the Data Cloud

Snowflake

NOVEMBER 2, 2023

Over the years, the technology landscape for data management has given rise to various architecture patterns, each thoughtfully designed to cater to specific use cases and requirements. These patterns include both centralized storage patterns like data warehouse , data lake and data lakehouse , and distributed patterns such as data mesh.

Data Lake

Data Lake Data Warehouse Cloud Unstructured Data

Hire And Scale Your Data Team With Intention

Data Engineering Podcast

JUNE 12, 2022

Summary Building a well rounded and effective data team is an iterative process, and the first hire can set the stage for future success or failure. Trupti Natu has been the first data hire multiple times and gone through the process of building teams across the different stages of growth.

Metadata

Metadata Unstructured Data MongoDB Business Intelligence

10 AWS Redshift Project Ideas to Build Data Pipelines

ProjectPro

JUNE 6, 2025

Today, businesses use traditional data warehouses to centralize massive amounts of raw data from business operations. Since data needs to be accessible easily, organizations use Amazon Redshift as it offers seamless integration with business intelligence tools and helps you train and deploy machine learning models using SQL commands.

Data Pipeline

Data Pipeline AWS Project Building

How to Transition from ETL Developer to Data Engineer?

ProjectPro

JUNE 6, 2025

In the thought process of making a career transition from ETL developer to data engineer job roles? Read this blog to know how various data-specific roles, such as data engineer, data scientist, etc., differ from ETL developer and the additional skills you need to transition from ETL developer to data engineer job roles.

Data Engineering

Data Engineering Data Engineer Engineering ETL Tools

Data Engineering Weekly #203

Data Engineering Weekly

JANUARY 12, 2025

With Astro, you can build, run, and observe your data pipelines in one place, ensuring your mission critical data is delivered on time. link] Sponsored: Apache Airflow® Best Practices: Running Airflow at Scale The scalability of Airflow is why data teams at companies like Uber, Ford, and LinkedIn choose it to power their data ops.

Pipeline-centric

Pipeline-centric Data Engineering Data Engineer Engineering

Data Engineering Weekly #177

Data Engineering Weekly

JUNE 24, 2024

Experience Enterprise-Grade Apache Airflow Astro augments Airflow with enterprise-grade features to enhance productivity, meet scalability and availability demands across your data pipelines, and more. A few highlights from the report Unstructured data goes mainstream. AI-driven code development is going mainstream now.

Data Engineering

Data Engineering Data Engineer Engineering Google Cloud

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

In this blog, I will demonstrate the value of Cloudera DataFlow (CDF) , the edge-to-cloud streaming data platform available on the Cloudera Data Platform (CDP) , as a Data integration and Democratization fabric. Introduction to the Data Mesh Architecture and its Required Capabilities. Components of a Data Mesh.

Architecture

Architecture Metadata Kafka Government

How to Use Pinecone Vector Database in your AI Projects?

ProjectPro

JUNE 6, 2025

Traditional databases are great at handling structured data, like text or numerical values, but they struggle with high-dimensional vector data. It simplifies the process of managing vector data, removing one of the key barriers for AI-powered systems: the need for quick, scalable, and accurate search capabilities.

Database

Database Project Metadata Unstructured Data

Azure Data Factory vs AWS Glue-The Cloud ETL Battle

ProjectPro

JUNE 6, 2025

A survey by Data Warehousing Institute TDWI found that AWS Glue and Azure Data Factory are the most popular cloud ETL tools with 69% and 67% of the survey respondents mentioning that they have been using them. Azure Data Factory and AWS Glue are powerful tools for data engineers who want to perform ETL on Big Data in the Cloud.

AWS

AWS Cloud Amazon Web Services ETL Tools

100+ Big Data Interview Questions and Answers 2025

ProjectPro

JUNE 6, 2025

If you're looking to break into the exciting field of big data or advance your big data career, being well-prepared for big data interview questions is essential. Get ready to expand your knowledge and take your big data career to the next level! “Data analytics is the future, and the future is NOW!

Big Data

Big Data Hadoop Relational Database AWS

What "Data Lineage Done Right" Looks Like And How They're Doing It At Manta

Data Engineering Podcast

JULY 31, 2022

Summary Data lineage is the roadmap for your data platform, providing visibility into all of the dependencies for any report, machine learning model, or data warehouse table that you are working with. Atlan is the metadata hub for your data ecosystem.

IT

IT Metadata MongoDB MySQL

Introducing Vector Search on Rockset: How to run semantic search with OpenAI and Rockset

Rockset

APRIL 18, 2023

Organizations have continued to accumulate large quantities of unstructured data, ranging from text documents to multimedia content to machine and sensor data. Comprehending and understanding how to leverage unstructured data has remained challenging and costly, requiring technical depth and domain expertise.

Unstructured Data

Unstructured Data Metadata Machine Learning SQL

The Future Is Hybrid Data, Embrace It

Cloudera

JUNE 7, 2022

We live in a hybrid data world. In the past decade, the amount of structured data created, captured, copied, and consumed globally has grown from less than 1 ZB in 2011 to nearly 14 ZB in 2020. Impressive, but dwarfed by the amount of unstructured data, cloud data, and machine data – another 50 ZB.

IT

IT Unstructured Data Data Architecture Government

Data Observability for Analytics and ML teams

Towards Data Science

APRIL 6, 2023

Principles, practices, and examples for ensuring high quality data flows Source: DreamStudio (generated by author) Nearly 100% of companies today rely on data to power business opportunities and 76% use data as an integral part of forming a business strategy. Data quality is critical to delivering good customer experiences.

Unstructured Data

Unstructured Data Metadata Data Coding

How DataOS Nails Gartner’s Magic Quadrant for Data Integration

The Modern Data Company

JANUARY 22, 2024

The Modern Data Company has been given an honorable mention in Gartner’s 2023 Magic Quadrant for Data Integration. In response, The Modern Data Company emerged, driven by a clear mission: to revolutionize data management and address challenges posed by a diverse and rapidly evolving data environment.

Data Integration

Data Integration Metadata Government Unstructured Data

What is Apache Iceberg: Features, Architecture & Use Cases

ProjectPro

JUNE 6, 2025

Explore what is Apache Iceberg, what makes it different, and why it’s quickly becoming the new standard for data lake analytics. Data lakes were born from a vision to democratize data, enabling more people, tools, and applications to access a wider range of data. Metadata Layer 3. Workarounds became the norm.

Architecture

Architecture Data Lake Metadata Cloud Storage

Data Preparation for Machine Learning Projects: Know It All Here

ProjectPro

JUNE 6, 2025

Data preparation for machine learning algorithms is usually the first step in any data science project. It involves various steps like data collection, data quality check, data exploration, data merging, etc. This blog covers all the steps to master data preparation with machine learning datasets.

Data Preparation

Data Preparation Machine Learning Project IT

Agents of Change: Navigating 2025 with AI and Data Innovation

Unapologetically Technical Episode 20 – Shane Murray

Webinars

Trending Sources

Your Enterprise Data Needs an Agent

Webinars

Bring Order To The Chaos Of Your Unstructured Data Assets With Unstruk

Build Better Data Pipelines with SQL and Python in Snowflake

AI and Data Predictions 2025: Strategies to Realize the Promise of AI

Scale Unstructured Text Analytics with Batch LLM Inference

Databricks Delta Lake: A Scalable Data Lake Solution

Simplifying Multimodal Data Analysis with Snowflake Cortex AI

Empower Your Cyber Defenders with Real-Time Analytics Author: Carolyn Duby, Field CTO

Is Apache Iceberg the New Hadoop? Navigating the Complexities of Modern Data Lakehouses

AWS Glue-Unleashing the Power of Serverless ETL Effortlessly

7 Best Data Warehousing Tools for Efficient Data Storage Needs

Why Open Table Format Architecture is Essential for Modern Data Systems

Manage Your Unstructured Data Assets Across Cloud And Hybrid Environments With Komprise

Simplifying Data Architecture and Security to Accelerate Value

Combining The Simplicity Of Spreadsheets With The Power Of Modern Data Infrastructure At Canvas

2024 Governance Trends for Data Leaders

Bring Geospatial Analytics Across Disparate Datasets Into Your Toolkit With The Unfolded Platform

Directory Tables : Access Unstructured Data

A 2025 Guide to Ace the Netflix Data Engineer Interview

Unstructured Data: Examples, Tools, Techniques, and Best Practices

Democratizing Enterprise AI: Snowflake’s New AI Capabilities Accelerate Data-Driven Innovation

Snowflake Cortex Search: State-of-the-Art Hybrid Search for RAG Applications

How to Build a Knowledge Graph for RAG Applications?

How to Build a Data Lake?

100 Data Modelling Interview Questions To Prepare For In 2025

Announcing New Innovations for Data Warehouse, Data Lake, and Data Lakehouse in the Data Cloud

Hire And Scale Your Data Team With Intention

10 AWS Redshift Project Ideas to Build Data Pipelines

How to Transition from ETL Developer to Data Engineer?

Data Engineering Weekly #203

Data Engineering Weekly #177

How Cloudera Data Flow Enables Successful Data Mesh Architectures

How to Use Pinecone Vector Database in your AI Projects?

Azure Data Factory vs AWS Glue-The Cloud ETL Battle

100+ Big Data Interview Questions and Answers 2025

What "Data Lineage Done Right" Looks Like And How They're Doing It At Manta

Introducing Vector Search on Rockset: How to run semantic search with OpenAI and Rockset

The Future Is Hybrid Data, Embrace It

Data Observability for Analytics and ML teams

How DataOS Nails Gartner’s Magic Quadrant for Data Integration

What is Apache Iceberg: Features, Architecture & Use Cases

Data Preparation for Machine Learning Projects: Know It All Here

Stay Connected