In this episode Crux CTO Mark Etherington discusses the different costs involved in managing external data, how to think about the total return on investment for your data, and how the Crux platform is architected to reduce the toil involved in managing third-party data.
The binding element of all data work is the metadata graph generated by the workflows that produce the assets used by teams across the organization. The DataHub project was created to bring order to the scale of LinkedIn's data needs. How is the governance of DataHub being managed?
In this episode Kevin Liu shares some of the interesting features that Stripe has built by combining Trino and Iceberg, as well as the challenges they face in supporting the myriad workloads thrown at this layer of their data platform. Can you describe what role Trino and Iceberg play in Stripe's data architecture?
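As a rough illustration of how the two pieces fit together, here is a minimal sketch of querying an Iceberg table through Trino with the official trino Python client; the host, catalog, schema, and table names are placeholders, not Stripe's actual setup.

```python
import trino

# Connect to a (hypothetical) Trino coordinator whose "iceberg" catalog is
# configured with the Iceberg connector.
conn = trino.dbapi.connect(
    host="trino.example.com",  # placeholder coordinator address
    port=8080,
    user="analyst",
    catalog="iceberg",
    schema="analytics",
)
cur = conn.cursor()

# Trino's Iceberg connector exposes table history through "$snapshots"
# metadata tables alongside the data itself.
cur.execute('SELECT snapshot_id, committed_at FROM "events$snapshots" LIMIT 5')
for snapshot_id, committed_at in cur.fetchall():
    print(snapshot_id, committed_at)
```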
DataOps, or Data Operations, is an approach that applies the principles of DevOps to data management. It aims to streamline and automate data workflows, enhance collaboration, and improve the agility of data teams. How effective are your current data workflows?
Linked data technologies provide a means of tightly coupling metadata with raw information. In this episode Brian Platz explains how JSON-LD can be used as a shared representation of linked data for building semantic data products. What is the overlap between knowledge graphs and "linked data products"?
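To make the idea concrete, here is a minimal sketch of a JSON-LD document describing a data product, built as a plain Python dict; the vocabulary is schema.org, and the dataset name and identifier are invented for illustration.

```python
import json

data_product = {
    "@context": "https://schema.org",       # maps plain keys onto global IRIs
    "@type": "Dataset",
    "@id": "https://example.com/datasets/orders",  # hypothetical identifier
    "name": "Orders",
    "description": "Daily order events, partitioned by date.",
    "creator": {"@type": "Organization", "name": "Example Corp"},
}

# Because @context resolves keys like "name" to globally unique IRIs,
# independently produced documents can link into one shared graph.
print(json.dumps(data_product, indent=2))
```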
Metadata is the information that provides context and meaning to data, ensuring it's easily discoverable, organized, and actionable. It enhances data quality, governance, and automation, transforming raw data into valuable insights. This is what managing data without metadata feels like.
In this episode Abe Gong brings his experiences with the Great Expectations project and community to discuss the technical and organizational considerations involved in applying these constraints to your data workflows.
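This is not Great Expectations' actual API, just a minimal plain-Python sketch of the underlying idea the episode covers: declaring constraints on your data and validating each batch against them.

```python
import pandas as pd

def expect_not_null(df: pd.DataFrame, column: str) -> bool:
    # Constraint: no missing values in the column.
    return bool(df[column].notna().all())

def expect_between(df: pd.DataFrame, column: str, low: float, high: float) -> bool:
    # Constraint: every value falls within [low, high].
    return bool(df[column].between(low, high).all())

# A made-up batch of records to validate.
batch = pd.DataFrame({"order_id": [1, 2, 3], "amount": [9.99, 120.0, 42.5]})

results = {
    "order_id not null": expect_not_null(batch, "order_id"),
    "amount in range": expect_between(batch, "amount", 0, 10_000),
}
failed = [name for name, ok in results.items() if not ok]
assert not failed, f"data constraints violated: {failed}"
```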
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode.
In this episode Tobias Macey shares his thoughts on the challenges that he is facing as he prepares to build the next set of architectural layers for his data platform to enable a larger audience to start accessing the data being managed by his team. Data lakes are notoriously complex.
In this episode Ernie Ostic shares the approach that he and his team at Manta are taking to build a complete view of data lineage across the various data systems in your organization and the useful applications of that information in the work of every data stakeholder.
In this episode he discusses the challenge of maintaining shared visibility and understanding of data work across the various stakeholders and his efforts to make it a seamless experience.
Announcements: Welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode.
The life sciences industry has seen incredible growth in scale and sophistication, along with advances in data technology that make it possible to analyze massive amounts of genomic information.
He recently wrote a book on effective patterns for Pandas code, and in this episode he shares advice on how to write efficient data processing routines that will scale with your data volumes while remaining understandable and maintainable.
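As a flavor of the style under discussion, here is a small sketch of method-chained Pandas code on a made-up DataFrame; each step returns a new frame, which keeps the pipeline readable and avoids mutating intermediates.

```python
import pandas as pd

raw = pd.DataFrame(
    {"ts": ["2024-01-01", "2024-01-02"], "amount": ["10.5", "3.2"], "user": ["a", "b"]}
)

clean = (
    raw
    .assign(
        ts=lambda d: pd.to_datetime(d["ts"]),          # parse timestamps
        amount=lambda d: d["amount"].astype("float64"),  # fix string-typed numbers
    )
    .query("amount > 0")   # drop invalid rows declaratively
    .sort_values("ts")
)
print(clean.dtypes)
```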
She explains how the design of the platform is informed by the needs of managing data projects for large and small teams across her previous roles, how it integrates with your existing systems, and how it can work to bring everyone onto the same page. Which portions of the data workflow is Atlan responsible for?
The data industry is changing rapidly, and one of the most active areas of growth is automation of data workflows. Taking cues from the DevOps movement of the past decade, data professionals are orienting around the concept of DataOps. How does this differ from "business as usual" in the data industry?
The January 2019 "Magic Quadrant for Data Management Solutions for Analytics" provides valuable insights into the status, direction, and players in the DMSA market. All this while the platform serves as the core foundation providing metadata and governance capabilities across these workloads.
A data catalog as a passive web portal for displaying metadata requires significant rethinking to fit modern data workflows, not just adding "modern" as a prefix. I know that is an expensive statement to make 😊 To be fair, I'm a big fan of data catalogs, or metadata management, to be precise.
An HDFS master node, called a NameNode, keeps metadata with critical information about system files (such as their names, locations, and the number of data blocks in each file) and tracks storage capacity, the volume of data being transferred, and so on.
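As one way to see that metadata from client code, here is a minimal sketch using the third-party hdfs (HdfsCLI) Python package against a NameNode's WebHDFS endpoint; the host and path are placeholders.

```python
from hdfs import InsecureClient

# Point at the (hypothetical) NameNode's WebHDFS port.
client = InsecureClient("http://namenode.example.com:9870")

# status() returns the NameNode's file metadata: size, block size,
# replication factor, modification time, and so on.
info = client.status("/data/events/2024-01-01.parquet", strict=False)
if info:
    print(info["length"], info["blockSize"], info["replication"])
```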
In the realm of big data and AI, managing and securing data assets efficiently is crucial. Databricks addresses this challenge with Unity Catalog, a comprehensive governance solution designed to streamline and secure data management across Databricks workspaces.
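A minimal sketch of what that looks like in practice, assuming a Databricks notebook where `spark` is predefined; the catalog, schema, table, and group names are placeholders.

```python
# Unity Catalog organizes data in a three-level namespace:
# catalog.schema.table.
spark.sql("CREATE CATALOG IF NOT EXISTS analytics")
spark.sql("CREATE SCHEMA IF NOT EXISTS analytics.sales")
spark.sql(
    "CREATE TABLE IF NOT EXISTS analytics.sales.orders (id BIGINT, amount DOUBLE)"
)

# Governance is expressed as SQL grants on securable objects, so access
# policy travels with the catalog rather than living in per-workspace ACLs.
spark.sql("GRANT SELECT ON TABLE analytics.sales.orders TO `data-analysts`")
```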
Data Engineering Weekly is brought to you by RudderStack. RudderStack provides data pipelines that make it easy to collect data from every application, website, and SaaS platform, then activate it in your warehouse and business tools. Among the panel's 2023 predictions: unified metadata becomes the kingmaker.
At its core, a table format is a sophisticated metadata layer that defines, organizes, and interprets multiple underlying data files. Table formats incorporate aspects like columns, rows, data types, and relationships, but can also include information about the structure of the data itself.
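To illustrate the idea (this is a conceptual sketch, not any particular format's actual specification), the metadata layer can be modeled as a schema plus snapshots that each point at a set of underlying data files.

```python
from dataclasses import dataclass, field

@dataclass
class DataFile:
    path: str        # location of a physical file, e.g. a Parquet object
    row_count: int

@dataclass
class Snapshot:
    snapshot_id: int
    files: list[DataFile] = field(default_factory=list)

@dataclass
class Table:
    schema: dict[str, str]  # column name -> type
    snapshots: list[Snapshot] = field(default_factory=list)

    def current_files(self) -> list[str]:
        # Readers resolve the table to concrete files through metadata alone,
        # which is what enables atomic commits and time travel.
        return [f.path for f in self.snapshots[-1].files] if self.snapshots else []

t = Table(schema={"id": "bigint", "amount": "double"})
t.snapshots.append(Snapshot(1, [DataFile("s3://bucket/part-0.parquet", 1000)]))
print(t.current_files())
```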
Editor's Note: The results are out for our poll on the current state of data catalogs. The highlights are that 59% of folks think data catalogs are sometimes helpful. We saw in the poll how far the data catalog has to go to be helpful and active within a data workflow.
DataOps, short for data operations, is an emerging discipline that focuses on improving the collaboration, integration, and automation of data processes across an organization. These tools help organizations implement DataOps practices by providing a unified platform for data teams to collaborate, share, and manage their data assets.
DataOps is a collaborative approach to data management that combines the agility of DevOps with the power of data analytics. It aims to streamline data ingestion, processing, and analytics by automating and integrating various data workflows.
Data orchestration is the process of efficiently coordinating the movement and processing of data across multiple, disparate systems and services within a company. So, why is data orchestration a big deal? It automates and optimizes data processes, reducing manual effort and the likelihood of errors.
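As one concrete example (Apache Airflow is a common choice, though the excerpt doesn't name a tool), here is a minimal sketch of a daily two-step pipeline where the orchestrator enforces ordering; the task bodies are placeholders.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull data from the source system")

def load():
    print("write data to the warehouse")

# Airflow 2.4+ syntax: the DAG declares the schedule, and the orchestrator
# handles retries, ordering, and backfills instead of hand-run scripts.
with DAG(
    dag_id="orders_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2  # extract must succeed before load runs
```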
Integrating these principles with data operation-specific requirements creates a more agile atmosphere that supports faster development cycles while maintaining high quality standards. Technical challenges: choosing appropriate tools and technologies is critical for streamlining data workflows across the organization.
This development has paved the way for a suite of cloud-native data tools that are user-friendly, scalable, and affordable. Known as the Modern Data Stack (MDS), this suite of tools and technologies has transformed how businesses approach data management and analysis.
He believes in making data do the work through proper data management based on strategic rationale and business alignment. On LinkedIn, he posts frequently about big data, master data, data science, data management, and data storytelling.
Why Should You Get an Azure Data Engineer Certification? Becoming an Azure data engineer allows you to seamlessly blend the roles of a data analyst and a data scientist. One of the pivotal responsibilities is managing data workflows and pipelines, a core aspect of a data engineer's role.
The Elastic Stack: Elasticsearch is integral within analytics stacks, collaborating seamlessly with other tools developed by Elastic to manage the entire data workflow, from ingestion to visualization. Each document has unique metadata fields like index, type, and id that help identify its storage location and nature.
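For instance, here is a minimal sketch using the official Python client against a local node; the index name and documents are illustrative.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Indexing a document; Elasticsearch stores it under the given index and id.
es.index(index="logs", id="1", document={"level": "error", "msg": "disk full"})

res = es.search(index="logs", query={"match": {"level": "error"}})
for hit in res["hits"]["hits"]:
    # _index and _id are the metadata fields that locate and identify
    # each document; _source holds the document body itself.
    print(hit["_index"], hit["_id"], hit["_source"])
```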
This democratization, facilitated by powerful and intuitive IDEs, will empower "Citizen Data Engineers": individuals with domain expertise who may not be traditional programmers but can now build and manage data workflows. In 2025, prompt wrangling will become the most important skill for data engineers.