Data and NoSQL - Data Engineering Digest

Designing A Non-Relational Database Engine

Data Engineering Podcast

APRIL 14, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is brought to you by Datafold, a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment.

Non-relational Database

Non-relational Database Relational Database Database Designing


Introduction to Databases in Data Science

KDnuggets

SEPTEMBER 8, 2023

Understand the relevance of databases in data science. Also learn the fundamentals of relational databases, NoSQL database categories, and more.

Database

Database Data Science NoSQL Relational Database

Understanding NoSQL Data Replication: A Comprehensive Guide

Hevo

MAY 24, 2023

Data drives the business world, and a significant amount of that data is unstructured. This implies that traditional relational databases cannot cater to the needs of organizations seeking to store and manipulate this unstructured data. NoSQL Databases […]

NoSQL

NoSQL Unstructured Data Relational Database Database

A Deep Dive into Data Replication: Most Effective Way to Protect Your Data

Analytics Vidhya

FEBRUARY 22, 2023

Introduction: Data replication, also known as database replication, is the copying of data to ensure that all information remains consistent across all data resources in real time. Data replication is like a safety net that keeps your information safe from disappearing or falling through the cracks.

Database

Database Data NoSQL Datasets

RDBMS vs NoSQL: Key Differences and Similarities

Knowledge Hut

MARCH 15, 2024

Making decisions in the database space requires deciding between RDBMS (Relational Database Management System) and NoSQL, each of which has unique features. RDBMS uses SQL to organize data into structured tables, whereas NoSQL is more flexible and can handle a wider range of data types because of its dynamic schemas.

NoSQL

NoSQL Database-centric Relational Database MongoDB

Modern Customer Data Platform Principles

Data Engineering Podcast

JANUARY 21, 2024

A substantial amount of the data that is being managed in these systems is related to customers and their interactions with an organization. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Data projects are notoriously complex.

Data Lake

Data Lake High Quality Data NoSQL Data Warehouse

Expert Talk TLDR: SQL vs NoSQL Databases in the Modern Data Stack

Rockset

JULY 22, 2022

Last week, Rockset hosted a conversation with a few seasoned data architects and data practitioners steeped in NoSQL databases to talk about the current state of NoSQL in 2022 and how data teams should think about it. NoSQL is great for well understood access patterns. Much was discussed.

NoSQL

NoSQL SQL Database AWS

From Oracle to Databases for AI: The Evolution of Data Storage

KDnuggets

FEBRUARY 15, 2022

From Oracle to NoSQL databases and beyond, read about data management solutions from the early days of the RDBMS to those supporting AI applications.

Data Storage

Data Storage Database NoSQL Data

What is an AI Data Engineer? 4 Important Skills, Responsibilities, & Tools

Monte Carlo

OCTOBER 31, 2024

The rise of AI and GenAI has brought about the rise of new questions in the data ecosystem – and new roles. One job that has become increasingly popular across enterprise data teams is the role of the AI data engineer. Demand for AI data engineers has grown rapidly in data-driven organizations.

Data Engineering

Data Engineering Data Engineer Engineering Unstructured Data

Data News — Week 23.42

Christophe Blefari

OCTOBER 20, 2023

Read my dbt multi-project guide 📺 On the content side I'll also present next week the Fancy Data Stack project at the Data Engineering And Machine Learning Summit 2023 organised by Seattle Data Guy. This post gives great insights about the impact on the data platform team. What are the main differences?

Generalist

Generalist Entertainment NoSQL Datasets

Big Data Technologies that Everyone Should Know in 2024

Knowledge Hut

APRIL 25, 2024

Big data in information technology is used to improve operations, provide better customer service, develop customized marketing campaigns, and take other actions to increase revenue and profits. It is especially true in the world of big data. What Are Big Data Technologies?

Big Data

Big Data Technology Hadoop NoSQL

Why Real-Time Analytics Requires Both the Flexibility of NoSQL and Strict Schemas of SQL Systems

Rockset

JULY 6, 2022

This is the fifth post in a series by Rockset's CTO and Co-founder Dhruba Borthakur on Designing the Next Generation of Data Systems for Real-Time Analytics. So are schemaless NoSQL databases, which capably ingest firehoses of data but are poor at extracting complex insights from that data.

NoSQL

NoSQL SQL Systems PostgreSQL

Citus Data: Distributed PostGreSQL for Big Data with Ozgun Erdogan and Craig Kerstiens - Episode 13

Data Engineering Podcast

JANUARY 7, 2018

At Citus Data they have built an extension to support running it in a distributed fashion across large volumes of data with parallelized queries for improved performance. For someone who is interested in migrating to Citus, what is involved in getting it deployed and moving the data out of an existing system?

PostgreSQL

PostgreSQL Big Data NoSQL Data

Case Study: Is Your NoSQL Data Hindering Real-Time Analytics? Savvy Solved It with Rockset.

Rockset

JULY 21, 2022

Jeremy Evans, Co-founder and CTO, Savvy. At Savvy, we have a lot of responsibility when it comes to data. However, delivering rich and timely insights was a challenge for us from the start, as our original platform was great at ingesting data, but not so great at analyzing and reporting. Rockset was incredibly easy to get started.

NoSQL

NoSQL IT MongoDB SQL

A Prequel to Data Mesh

Towards Data Science

JANUARY 16, 2024

My personal take on justifying the existence of Data Mesh. A senior stakeholder at one of my projects mentioned that they wanted to decentralise their data platform architecture and democratise data across the organisation. When I heard the words ‘decentralised data architecture’, I was left utterly confused at first!

Data Warehouse

Data Warehouse Data Architecture Relational Database NoSQL

Data Modeling That Evolves With Your Business Using Data Vault

Data Engineering Podcast

FEBRUARY 9, 2020

Summary: Designing the structure for your data warehouse is a complex and challenging process. As businesses deal with a growing number of sources and types of information that they need to integrate, they need a data modeling strategy that provides them with flexibility and speed.

Data Lake

Data Lake Data Warehouse Hadoop NoSQL

HBase Deprecation at Pinterest

Pinterest Engineering

MAY 13, 2024

Overview of HBase at Pinterest: Introduced in 2013, HBase was Pinterest’s first NoSQL datastore. Along with the rising popularity of NoSQL, HBase quickly became one of the most widely used storage backends at Pinterest. At its peak usage, we had around 50 clusters, 9000 AWS EC2 instances, and over 6 PB of data.

NoSQL

NoSQL MySQL Database Systems

TimescaleDB: Fast And Scalable Timeseries with Ajay Kulkarni and Mike Freedman - Episode 18

Data Engineering Podcast

FEBRUARY 11, 2018

Summary: As communications between machines become more commonplace, the need to store the generated data in a time-oriented manner increases. The market for timeseries data stores has many contenders, but they are not all built to solve the same problems or to scale in the same manner.

PostgreSQL

PostgreSQL NoSQL Google Cloud MongoDB

A Practical Introduction To Graph Data Applications

Data Engineering Podcast

AUGUST 3, 2020

Summary: Finding connections between data and the entities that they represent is a complex problem. Graph data models and the applications built on top of them are perfect for representing relationships and finding emergent structures in your information. If you hand a book to a new data engineer, what wisdom would you add to it?

NoSQL

NoSQL Database Algorithm Relational Database

Rebuilding a Cassandra cluster using Yelp’s Data Pipeline

Yelp Engineering

JANUARY 29, 2023

The same principles of these systems can be adopted to filter out malformed data from datastores. This blog post deep dives into how we rebuilt one of our Cassandra(C*) clusters by removing malformed data using Yelp’s Data Pipeline. Many different features on Yelp are powered by Cassandra.

Data Pipeline

Data Pipeline NoSQL Manufacturing Data

Data ingestion pipeline with Operation Management

Netflix Tech

MARCH 7, 2023

These media-focused machine learning algorithms as well as other teams generate a lot of data from the media files, which we described in our previous blog, are stored as annotations in Marken. Similarly, client teams don’t have to worry about when or how the data is written. in a video file.

Data Ingestion

Data Ingestion Management Algorithm Media

Big Data Analytics: How It Works, Tools, and Real-Life Applications

AltexSoft

MAY 14, 2021

Big Data enjoys the hype around it and for a reason. But the understanding of the essence of Big Data and ways to analyze it is still blurred. And that’s the most important thing: Big Data analytics helps companies deal with business problems that couldn’t be solved with the help of traditional approaches and tools.

Big Data

Big Data Data Analytics IT NoSQL

Low Code And High Quality Data Engineering For The Whole Organization With Prophecy

Data Engineering Podcast

JULY 16, 2021

Summary: There is a wealth of tools and systems available for processing data, but the user experience of integrating them and building workflows is still lacking. Raj Bains founded Prophecy to address this need by creating a UI-first platform for building and executing data engineering workflows that orchestrates Airflow and Spark.

High Quality Data

High Quality Data Data Engineering Data Engineer Coding

Stretching The Elastic Stack with Philipp Krenn - Episode 23

Data Engineering Podcast

MARCH 18, 2018

In this episode Philipp Krenn describes the various pieces of the stack, how they fit together, and how you can use them in your infrastructure to store, search, and analyze your data. What are the common scaling bottlenecks that users should be aware of when they are dealing with large volumes of data?

MongoDB

MongoDB NoSQL Machine Learning Data Engineering

Streaming Data Pipelines: What Are They and How to Build One

Precisely

DECEMBER 28, 2023

The concept of streaming data was born of necessity. But insights derived from day-old data don’t cut it. Business success is based on how we use continuously changing data. That’s where streaming data pipelines come into play. What is a streaming data pipeline? How do streaming data pipelines work?

Data Pipeline

Data Pipeline Building Kafka Big Data

Database Refactoring Patterns with Pramod Sadalage - Episode 22

Data Engineering Podcast

MARCH 11, 2018

Practices such as version controlled migration scripts and iterative schema evolution provide the necessary mechanisms to ensure that your data layer is as agile as your application. How has the prevalence of data abstractions such as ORMs or ODMs impacted the practice of schema design and evolution?

Database

Database MongoDB NoSQL Database Design

CockroachDB In Depth with Peter Mattis - Episode 35

Data Engineering Podcast

JUNE 10, 2018

Summary: With the increased ease of gaining access to servers in data centers across the world has come the need for supporting globally distributed data storage. What are some of the tradeoffs that are necessary to allow for georeplicated data with distributed transactions?

PostgreSQL

PostgreSQL NoSQL Relational Database SQL

Breaking State and Local Data Silos with Modern Data Architectures

Cloudera

AUGUST 30, 2022

Data is the fuel that drives government, enables transparency, and powers citizen services. That should be easy, but when agencies don’t share data or applications, they don’t have a unified view of people. Legacy data sharing involves proliferating copies of data, creating data management and security challenges.

Data Architecture

Data Architecture Architecture Data Lake NoSQL

Getting Started with Cloudera Data Platform Operational Database (COD)

Cloudera

NOVEMBER 23, 2021

Operational Database is a relational and non-relational database built on Apache HBase and is designed to support OLTP applications, which use big data. The operational database in Cloudera Data Platform has the following components. Shared Data Experience (SDX) is used for security and governance capabilities.

Database

Database Non-relational Database NoSQL Government

Hadoop vs Spark: Main Big Data Tools Explained

AltexSoft

JUNE 7, 2021

Hadoop and Spark are the two most popular platforms for Big Data processing. They both enable you to deal with huge collections of data no matter its format — from Excel tables to user feedback on websites to images and video files. Which Big Data tasks does Spark solve most effectively? How does it work? Cost-effectiveness.

Big Data Tools

Big Data Tools Hadoop Big Data Database-centric

Beyond Legacy Detection: How AI-Driven Data Governance Surpasses Traditional Methods

Striim

MARCH 4, 2025

Have you ever wondered how the biggest brands in the world falter when it comes to data security? Their breach transformed personal customer data into a commodity traded on dark web forums. They react too slowly, too rigidly, and can't keep pace with the dynamic, sophisticated attacks occurring today, leaving hackable data exposed.

Data Governance

Data Governance Government Healthcare NoSQL

Top 16 Data Science Job Roles To Pursue in 2024

Knowledge Hut

DECEMBER 26, 2023

According to the World Economic Forum, the amount of data generated per day will reach 463 exabytes (1 exabyte = 10^9 gigabytes) globally by the year 2025. Thus, almost every organization has access to large volumes of rich data and needs “experts” who can generate insights from this rich data.

Data Science

Data Science BI Machine Learning Business Intelligence

Use SurrealDB to Persist Data with Rocket REST API

Workfall

MARCH 21, 2023

Databases are essential in web development for organizing data in various forms and shapes (both structured and unstructured). With these GUIs, we can get a bird’s-eye view of all the data in our database for easy analysis of the schema or data types, as well as general ease of administration.

PostgreSQL

PostgreSQL NoSQL Database Unstructured Data

Data Scientist vs Data Engineer: Differences and Why You Need Both

AltexSoft

OCTOBER 30, 2021

Explaining the difference, especially when they both work with something intangible such as data, is difficult. If you’re an executive who has a hard time understanding the underlying processes of data science and gets confused by terminology, keep reading. Data science vs data engineering.

Data Engineering

Data Engineering Data Engineer Engineering Machine Learning

Introducing Netflix’s Key-Value Data Abstraction Layer

Netflix Tech

SEPTEMBER 18, 2024

Central to this infrastructure is our use of multiple online distributed databases such as Apache Cassandra, a NoSQL database known for its high availability and scalability. Second, developers had to constantly re-learn new data modeling practices and common yet critical data access patterns.

Bytes

Bytes Metadata Database Data

How to Become a Data Engineer in 2024?

Knowledge Hut

DECEMBER 26, 2023

Data Engineering is typically a software engineering role that focuses deeply on data – namely, data workflows, data pipelines, and the ETL (Extract, Transform, Load) process. What is Data Science? What are the roles and responsibilities of a Data Engineer? What is the need for Data Science?

Data Engineering

Data Engineering Data Engineer Engineering Hadoop

Real-Time Data Streaming: MongoDB Change Stream Kafka

Hevo

AUGUST 27, 2024

With the rise of modern data tools, real-time data processing is no longer a dream. The ability to react and process data has become critical for many systems. Over the past few years, MongoDB has become a popular choice for NoSQL Databases.

MongoDB

MongoDB NoSQL Kafka Data

Most important Data Engineering Concepts and Tools for Data Scientists

DareData

JANUARY 30, 2023

Learn the most important data engineering concepts that data scientists should be aware of. As the field of data science and machine learning continues to evolve, it is increasingly evident that data engineering cannot be separated from it.

Data Engineering

Data Engineering Data Engineer NoSQL Engineering

Building A New Foundation For CouchDB

Data Engineering Podcast

MARCH 16, 2020

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode.

Building

Building Data Warehouse NoSQL Data Lake

Graph Databases In Production At Scale Using DGraph with Manish Jain - Episode 44

Data Engineering Podcast

AUGUST 19, 2018

Summary: The way that you store your data can have a huge impact on the ways that it can be practically used. Preamble: Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode.

Database

Database PostgreSQL NoSQL Transportation

Best Morgan Stanley Data Engineer Interview Questions

U-Next

MARCH 1, 2023

Introduction: A Data Engineer is responsible for managing the flow of data used to make better business decisions. A solid understanding of relational databases and the SQL language is a must-have skill, as is an ability to manipulate large amounts of data effectively. In 2022, data engineering will hold a share of 29.8%

Data Engineering

Data Engineering Data Engineer Non-relational Database Engineering

From Big Data to Better Data: Ensuring Data Quality with Verity

Lyft Engineering

OCTOBER 3, 2023

High-quality data is necessary for the success of every data-driven company. It is now the norm for tech companies to have a well-developed data platform. This makes it easy for engineers to generate, transform, store, and analyze data at the petabyte scale. What and Where is Data Quality?

Big Data

Big Data Metadata Data Warehouse Data

Top 6 Cassandra Interview Questions

Top 5 Interview Questions on Cassandra
