For full-stack data science mastery, you must understand data management along with all the bells and whistles of machine learning. This high-level overview is a road map for the history and current state of the expansive options for data storage and infrastructure solutions.
Preamble Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. Interview Introduction How did you get involved in the area of data management?
In this article we discuss HDF5, one of the most popular and reliable formats for non-tabular, numerical data, and note that it is not optimized for deep learning work. The article suggests what an ML-native data format should look like to truly serve the needs of modern data scientists.
It lets you describe data in richer ways and make predictions. AI-powered data engineering solutions make it easier to streamline the data management process, which helps businesses find useful insights with little to no manual work. This leads to better analytics predictions and improved data management.
In our previous post, The Pros and Cons of Leading Data Management and Storage Solutions, we untangled the differences among data lakes, data warehouses, data lakehouses, data hubs, and data operating systems. What factors are most important when building a data management ecosystem?
We’ll also introduce OpenHouse’s control plane, specifics of the deployed system at LinkedIn, including our managed Iceberg lakehouse, and the impact and roadmap for future development of OpenHouse, including a path to open source. Data services are a set of table maintenance jobs that keep the underlying storage in a healthy state.
The world we live in today presents larger datasets, more complex data, and diverse needs, all of which call for efficient, scalable data systems. Though basic and easy to use, traditional table storage formats struggle to keep up. Modern table formats track the data files within a table along with their column statistics. Why should we use one?
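The teaser above is light on detail, so here is a minimal, purely illustrative Python sketch of the core idea behind modern table formats such as Apache Iceberg: a table-level manifest records each data file together with per-column min/max statistics, which lets a query planner skip files that cannot contain matching rows. The manifest structure and names here are hypothetical simplifications, not any real format's metadata layout.

```python
from dataclasses import dataclass

@dataclass
class DataFile:
    """One data file tracked by the table, with per-column min/max stats."""
    path: str
    row_count: int
    column_stats: dict  # column name -> (min_value, max_value)

# A toy "manifest": the table format's record of which files make up the table.
manifest = [
    DataFile("s3://bucket/events/part-000.parquet", 1_000_000,
             {"event_date": ("2024-01-01", "2024-03-31")}),
    DataFile("s3://bucket/events/part-001.parquet", 1_000_000,
             {"event_date": ("2024-04-01", "2024-06-30")}),
]

def files_to_scan(manifest, column, lower, upper):
    """Prune files whose column range cannot overlap the query predicate."""
    selected = []
    for f in manifest:
        lo, hi = f.column_stats[column]
        if hi >= lower and lo <= upper:  # value ranges overlap the predicate
            selected.append(f.path)
    return selected

# A query for May 2024 only needs to read the second file.
print(files_to_scan(manifest, "event_date", "2024-05-01", "2024-05-31"))
```

This file-level pruning (often called data skipping) is one of the main reasons table formats outperform plain directories of files.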
Both traditional and AI data engineers should be fluent in SQL for managing structured data, but AI data engineers should also be proficient in NoSQL databases for unstructured data management. Data Storage Solutions As we all know, data can be stored in a variety of ways.
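As a concrete, if simplified, contrast, the sketch below uses Python's built-in sqlite3 for structured, relational data and pymongo for document-style unstructured data. The pymongo portion assumes a MongoDB server is reachable at the default localhost:27017; the database, collection, and field names are made up for illustration.

```python
import sqlite3
from pymongo import MongoClient  # assumes `pip install pymongo` and a local MongoDB

# --- SQL: structured data with a fixed schema ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, plan TEXT)")
conn.execute("INSERT INTO users (name, plan) VALUES (?, ?)", ("Ada", "pro"))
row = conn.execute("SELECT name, plan FROM users WHERE plan = 'pro'").fetchone()
print(row)  # ('Ada', 'pro')

# --- NoSQL: schemaless documents, each record can carry different fields ---
client = MongoClient("mongodb://localhost:27017")
events = client["demo_db"]["raw_events"]
events.insert_one({"user": "Ada", "action": "login", "device": {"os": "linux"}})
events.insert_one({"user": "Grace", "action": "upload", "tags": ["ml", "vision"]})
print(events.find_one({"action": "upload"}))
```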
The focus has also been hugely centred on compute rather than data storage and analysis. In reality, enterprises need their data and compute to occur in multiple locations, and to be used across multiple time frames, from real-time closed-loop actions to analysis of long-term archived data. Location-specific data.
Data lakes, data warehouses, data hubs, data lakehouses, and data operating systems are data management and storage solutions designed to meet different needs in data analytics, integration, and processing. See it in action and schedule a demo with one of our data experts today.
A data warehouse acts as a single source of truth for an organization’s data, providing a unified view of its operations and enabling data-driven decision-making. A data warehouse enables advanced analytics, reporting, and business intelligence. Unlike traditional on-premises warehouses, cloud data warehouses can scale seamlessly.
For example, the data storage systems and processing pipelines that capture information from genomic sequencing instruments are very different from those that capture the clinical characteristics of a patient from a site. The principles emphasize machine-actionability.
Summary With the increased ease of gaining access to servers in data centers across the world has come the need for supporting globally distributed data storage. With the first wave of cloud-era databases, the ability to replicate information geographically came at the expense of transactions and familiar query languages.
In this episode he explains how the strongDM proxy works to grant and audit access to storage systems and the benefits that it provides to engineers and team leads. What are some of the most common challenges around managing access and authentication for data storage systems?
Preamble Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out Linode. What is in store for the future of Pravega?
In this episode Tobias Macey, the host of the show, reflects on his plans for building a data platform and what he has learned from running the podcast that is influencing his choices. Data integration (extract and load) What are your data sources?
To help other people find the show you can leave a review on iTunes, or Google Play Music, and tell your friends and co-workers. This is your host Tobias Macey and today I’m interviewing Julien Le Dem and Doug Cutting about data serialization formats and how to pick the right one for your systems.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode.
The use of Pinecone’s technology with Cloudera creates an ecosystem that facilitates the creation and deployment of robust, scalable, real-time AI applications fueled by an organization’s unique high-value data.
Adding more wires and throwing more compute hardware at the problem is simply not viable considering the cost and complexities of today’s connected cars or the additional demands designed into electric cars (like battery management systems and eco-trip planning).
Data Warehousing Professionals Within the framework of a project, data warehousing specialists are responsible for developing data management processes across a company. Furthermore, they construct software applications and computer programs for accomplishing data storage and management.
This was a great conversation about a different approach to database architecture and how that enables a more flexible way to store and interact with data to power better data sharing and new opportunities for blending specialized domains. If you hand a book to a new data engineer, what wisdom would you add to it?
He discusses the inefficiencies that teams run into from having to reprocess data multiple times, his work on the open source Hub library to solve this problem for everyone, and his thoughts on the vast potential that exists for using computer vision to solve hard and meaningful problems. What do you have planned for the future of Activeloop?
Summary One of the biggest challenges for any business trying to grow and reach customers globally is how to scale their data storage. It is definitely worth a good look for anyone building a platform that needs a simple-to-manage data layer that will scale with your business.
This requires a new class of data storage which can accommodate that demand without having to rearchitect your system at each level of growth. YugabyteDB is an open source database designed to support planet-scale workloads with high data density and full ACID compliance. A growing trend in database engines (e.g.
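Because YugabyteDB's YSQL API is PostgreSQL-compatible, a standard Postgres driver should work against it. The sketch below uses psycopg2 and assumes a local cluster listening on YSQL's default port 5433 with the default yugabyte database and user; the table and column names are illustrative only.

```python
import psycopg2  # assumes `pip install psycopg2-binary`

# Connection details assume a local YugabyteDB cluster with default YSQL settings.
conn = psycopg2.connect(
    host="127.0.0.1", port=5433, dbname="yugabyte", user="yugabyte", password="yugabyte"
)

with conn:  # commits on success, rolls back on exception (one ACID transaction)
    with conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS accounts (
                id INT PRIMARY KEY,
                balance NUMERIC NOT NULL
            )
        """)
        cur.execute("INSERT INTO accounts VALUES (1, 100), (2, 50) ON CONFLICT DO NOTHING")
        # Transfer funds atomically: both updates commit together or not at all.
        cur.execute("UPDATE accounts SET balance = balance - 25 WHERE id = 1")
        cur.execute("UPDATE accounts SET balance = balance + 25 WHERE id = 2")

conn.close()
```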
The data world is abuzz with speculation about the future of data engineering and the successor to the celebrated modern data stack. While the modern data stack has undeniably revolutionized data management with its cloud-native approach, its complexities and limitations are becoming increasingly apparent.
Cloud providers can offer you access to infrastructure such as database services, servers, networks, data management, and data storage. It includes resources such as software, servers, databases, data storage, and networking.
Preamble Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. Interview Introduction How did you get involved in the area of data management?
Each of these technologies has its own strengths and weaknesses, but all of them can be used to gain insights from large data sets. As organizations continue to generate more and more data, big data technologies will become increasingly essential. Let's explore the technologies available for big data.
Even if you already have a metadata repository this is worth a listen to learn more about the value that visibility of your data can bring to your organization. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Can you start by describing what Marquez is?
The power of pre-commit and SQLFluff: SQL is a query language used to retrieve information from data stores, and like any other programming language, you need to enforce checks at all times. This is where you should use pre-commit and SQLFluff.
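As a starting point, a .pre-commit-config.yaml along these lines wires SQLFluff into pre-commit so staged SQL files are linted before every commit. The rev tag and the dialect are placeholders to adjust for your project; check the SQLFluff documentation for the current release and hook details.

```yaml
# .pre-commit-config.yaml -- run SQLFluff on staged .sql files before each commit
repos:
  - repo: https://github.com/sqlfluff/sqlfluff
    rev: 3.0.0                       # placeholder: pin to a released SQLFluff tag
    hooks:
      - id: sqlfluff-lint            # fail the commit on lint violations
        args: ["--dialect", "ansi"]  # adjust the dialect to match your warehouse
      - id: sqlfluff-fix             # optionally auto-fix what it can
        args: ["--dialect", "ansi"]
```

After adding the file, run pre-commit install once so the hooks fire on every git commit.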
Kovid wrote an article that explains the ingredients of a data warehouse, and he does it well. A data warehouse is a piece of technology that rests on three ideas: the data modeling, the data storage, and the processing engine. In the post Kovid details each idea.
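To make the data modeling idea a bit more concrete, here is a small sketch of a toy star schema (one fact table, one dimension) queried with an aggregate, using SQLite purely as a stand-in for the storage and processing-engine ideas. The table and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for storage + engine
conn.executescript("""
    -- Dimension table: descriptive attributes
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, region TEXT);
    -- Fact table: measurable events, keyed to the dimension
    CREATE TABLE fact_orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        amount REAL
    );
    INSERT INTO dim_customer VALUES (1, 'EMEA'), (2, 'AMER');
    INSERT INTO fact_orders VALUES (10, 1, 120.0), (11, 1, 80.0), (12, 2, 200.0);
""")

# The modeling pays off at query time: facts roll up cleanly by dimension.
for region, revenue in conn.execute("""
    SELECT c.region, SUM(o.amount) AS revenue
    FROM fact_orders o
    JOIN dim_customer c USING (customer_id)
    GROUP BY c.region
    ORDER BY revenue DESC
"""):
    print(region, revenue)
```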
Legacy SIEM cost factors to keep in mind Data ingestion: Traditional SIEMs often impose limits on data ingestion and data retention. Snowflake allows security teams to store all their data in a single platform and maintain it all in a readily accessible state, with virtually unlimited cloud data storage capacity.
Institutional Considerations While I am on this topic of data management, I should mention: I recently started a new role! I am the first senior machine learning engineer at DataGrail, a company that provides a suite of B2B services helping companies secure and manage their customer data. How Much Data Do We Need?
Striim, for instance, facilitates the seamless integration of real-time streaming data from various sources, ensuring that it is continuously captured and delivered to big data storage targets. Data storage follows.
This is especially crucial to state and local government IT teams, who must balance their vital missions against resource constraints, compliance requirements, cybersecurity risks, and ever-increasing volumes of data. How does hybrid cloud help turn data into a strategic asset?
Given the increase in financial fraud this year and the upcoming holiday shopping season, which historically also leads to an increase, I am taking this opportunity to highlight 3 specific data and analytics strategies that can help in the fight against fraud across the Financial Services industry. 1. Break down the silos.
This scalability ensures the data lakehouse remains responsive and performant, even as data complexity and usage patterns change over time. By supporting diverse processing engines (e.g., machine learning, graph processing), an open data lakehouse caters to a wide range of analytics workloads, from ad-hoc querying to complex data processing and predictive modeling.
Both companies have added Data and AI to their slogans: Snowflake used to be The Data Cloud and is now The AI Data Cloud. A table format creates an abstraction layer between you and the storage format, allowing you to interact with files in storage as if they were tables.
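For a sense of what that abstraction looks like in practice, the sketch below uses the pyiceberg library to read an Iceberg table without touching the underlying data files directly. The catalog URI, namespace, table name, and filter column are assumptions for illustration; the general pattern (load a catalog, load a table, scan with a filter) is how table-format clients typically expose storage as tables.

```python
# Assumes `pip install pyiceberg` and a REST catalog at the (hypothetical) URI below.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default", type="rest", uri="http://localhost:8181")

# The table format resolves metadata, snapshots, and data files for us;
# we just ask for the table by name (hypothetical namespace.table).
table = catalog.load_table("analytics.events")

# Scan it as if it were a table: column projection and predicate pushdown,
# with no direct handling of the files underneath.
arrow_table = (
    table.scan(
        row_filter="event_date >= '2024-05-01'",
        selected_fields=("user_id", "event_date"),
    )
    .to_arrow()
)
print(arrow_table.num_rows)
```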