Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment.
A substantial amount of the data that is being managed in these systems is related to customers and their interactions with an organization. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Data projects are notoriously complex.
Last week, Rockset hosted a conversation with a few seasoned data architects and data practitioners steeped in NoSQL databases to talk about the current state of NoSQL in 2022 and how data teams should think about it. NoSQL is great for well-understood access patterns. Much was discussed.
Introduction Data replication, also known as database replication, is the process of copying data so that all information remains consistent across every data resource in real time. Data replication is like a safety net that keeps your information from disappearing or falling through the cracks.
Making decisions in the database space requires deciding between RDBMS (Relational Database Management System) and NoSQL, each of which has unique features. RDBMS uses SQL to organize data into structured tables, whereas NoSQL is more flexible and can handle a wider range of data types because of its dynamic schemas.
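To make the contrast concrete, here is a minimal Python sketch: the relational side declares a fixed schema up front, while the document side lets each record carry its own shape. The table and field names are made up for illustration.

```python
import sqlite3
import json

# Relational side: a fixed schema declared up front; every row must fit it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", ("Ada", "ada@example.com"))

# Document side: each record carries its own shape, so fields can vary per item.
documents = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Grace", "phones": ["+1-555-0100"], "preferences": {"newsletter": True}},
]
# A document store would persist these as-is; here we just serialize them.
for doc in documents:
    print(json.dumps(doc))
```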
This is the fifth post in a series by Rockset's CTO and Co-founder Dhruba Borthakur on Designing the Next Generation of Data Systems for Real-Time Analytics. So are schemaless NoSQL databases, which capably ingest firehoses of data but are poor at extracting complex insights from that data. They also ran a lot faster.
The rise of AI and GenAI has brought about the rise of new questions in the data ecosystem – and new roles. One job that has become increasingly popular across enterprise data teams is the role of the AI data engineer. Demand for AI data engineers has grown rapidly in data-driven organizations.
Big data in information technology is used to improve operations, provide better customer service, develop customized marketing campaigns, and take other actions to increase revenue and profits. This is especially true in the world of big data. What Are Big Data Technologies?
Read my dbt multi-project guide. 📺 On the content side, I'll also present the Fancy Data Stack project next week at the Data Engineering And Machine Learning Summit 2023 organised by Seattle Data Guy. Tests are added directly in the code, at the column they target. What are the main differences?
Summary As communications between machines become more commonplace, the need to store the generated data in a time-oriented manner increases. The market for timeseries data stores has many contenders, but they are not all built to solve the same problems or to scale in the same manner.
NoSQL databases are the new-age solutions to distributed unstructured data storage and processing. The speed, scalability, and fail-over safety offered by NoSQL databases are needed in the current times in the wake of Big Data Analytics and Data Science technologies. It was modeled after Amazon's DynamoDB.
At Citus Data they have built an extension to PostgreSQL that supports running it in a distributed fashion across large volumes of data, with parallelized queries for improved performance. For someone who is interested in migrating to Citus, what is involved in getting it deployed and moving the data out of an existing system?
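A minimal sketch of what "turning on" Citus for an existing table can look like, assuming a Postgres cluster with the Citus extension available and a plain `events` table already loaded; the table name, shard key, and connection string are placeholders, not a recommended migration plan.

```python
import psycopg2

# Placeholder DSN for a coordinator node with the Citus extension installed.
conn = psycopg2.connect("dbname=app user=postgres host=localhost")
conn.autocommit = True
with conn.cursor() as cur:
    # Enable the extension, then tell Citus to shard the table by a key column;
    # queries against it are then parallelized across worker nodes.
    cur.execute("CREATE EXTENSION IF NOT EXISTS citus;")
    cur.execute("SELECT create_distributed_table('events', 'customer_id');")
conn.close()
```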
Summary With the increased ease of gaining access to servers in data centers across the world has come the need for supporting globally distributed data storage. To address these shortcomings the engineers at Cockroach Labs have built a globally distributed SQL database with full ACID semantics in CockroachDB.
Summary There is a wealth of tools and systems available for processing data, but the user experience of integrating them and building workflows is still lacking. Raj Bains founded Prophecy to address this need by creating a UI-first platform for building and executing data engineering workflows that orchestrates Airflow and Spark.
This is the fourth post in a series by Rockset's CTO Dhruba Borthakur on Designing the Next Generation of Data Systems for Real-Time Analytics. For instance, customer personalization systems need to combine historic data sets with real-time data streams to instantly provide the most relevant product recommendations to customers.
Jeremy Evans, Co-founder and CTO, Savvy At Savvy, we have a lot of responsibility when it comes to data. However, delivering rich and timely insights was a challenge for us from the start, as our original platform was great at ingesting data, but not so great at analyzing and reporting. Rockset was incredibly easy to get started with.
Data science and artificial intelligence might be the buzzwords of recent times, but they are of no value without the right data backing them. The process of data collection has increased exponentially over the last few years. Table of Contents Why SQL for Data Science? What is SQL?
Hadoop and Spark are the two most popular platforms for Big Data processing. They both enable you to deal with huge collections of data no matter its format — from Excel tables to user feedback on websites to images and video files. Which Big Data tasks does Spark solve most effectively? How does it work?
According to the World Economic Forum, the amount of data generated per day will reach 463 exabytes (1 exabyte = 10⁹ gigabytes) globally by the year 2025. Thus, almost every organization has access to large volumes of rich data and needs “experts” who can generate insights from this rich data.
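As a minimal sketch of the kind of batch aggregation Spark handles well, here is a small PySpark job; the input path and column names (`rating`, `product_id`) are placeholders, not from any particular dataset.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("feedback-summary").getOrCreate()

# Spark reads the files as a distributed DataFrame, partitioned across the cluster.
feedback = spark.read.json("s3a://example-bucket/user-feedback/*.json")

# Transformations are lazy; Spark builds a plan and only executes on an action.
summary = (
    feedback
    .filter(F.col("rating").isNotNull())
    .groupBy("product_id")
    .agg(F.avg("rating").alias("avg_rating"), F.count("*").alias("reviews"))
)
summary.show()  # action: triggers the distributed computation

spark.stop()
```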
Rockset is the real-time analytics database in the cloud for modern data teams. Get faster analytics on fresher data, at lower costs, by exploiting indexing over brute-force scanning. In many tech circles, SQL databases remain synonymous with old-school on-premises databases like Oracle or DB2. This may come as a surprise.
SQLAlchemy is a powerful and popular Python library that provides an Object-Relational Mapping (ORM) tool for working with relational databases. To build comparable queries in SQLAlchemy, you can chain Python objects or write your query as a string. Collections are cached inside a session.
Operational Database is a relational and non-relational database built on Apache HBase and is designed to support OLTP applications, which use big data. The operational database in Cloudera Data Platform has the following components. Shared Data Experience (SDX) is used for security and governance capabilities.
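A short sketch of both query styles side by side, using an in-memory SQLite database and a made-up `User` model purely for illustration:

```python
from sqlalchemy import create_engine, text, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    city = Column(String)

engine = create_engine("sqlite:///:memory:")  # throwaway in-memory DB
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([User(name="Ada", city="London"), User(name="Grace", city="NYC")])
    session.commit()

    # ORM style: compose the query by chaining Python objects.
    londoners = session.query(User).filter(User.city == "London").order_by(User.name).all()

    # Textual style: write the query as a plain SQL string with bound parameters.
    rows = session.execute(
        text("SELECT name FROM users WHERE city = :city"), {"city": "London"}
    ).all()
```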
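For a feel of the OLTP-style access pattern such an HBase-backed database serves, here is a minimal sketch using the generic happybase Thrift client; the host, table name, and column family are placeholders, a Thrift server is assumed to be running, and this is not Cloudera-specific.

```python
import happybase

connection = happybase.Connection(host="hbase-thrift.example.com", port=9090)
table = connection.table("user_profiles")

# OLTP-style access: single-row put and get by row key.
table.put(b"user:1001", {b"profile:name": b"Ada", b"profile:city": b"London"})
row = table.row(b"user:1001")
print(row[b"profile:name"])

connection.close()
```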
Big Data enjoys the hype around it and for a reason. But the understanding of the essence of Big Data and ways to analyze it is still blurred. And that’s the most important thing: Big Data analytics helps companies deal with business problems that couldn’t be solved with the help of traditional approaches and tools.
Data Engineering is typically a software engineering role that focuses deeply on data – namely, data workflows, data pipelines, and the ETL (Extract, Transform, Load) process. What is Data Science? What are the roles and responsibilities of a Data Engineer? What is the need for Data Science?
If you were one of the 15,000 people who attended Coalesce 2021 , you will likely remember SQL Draw, the Slack-based game combining SQL with Cartesian geometry, art, creativity and teamwork. If you missed it, you can read more about SQL Draw on the Omnata website. Query Lambdas make it easy to create data APIs.
When we started Rockset, we envisioned building a powerful cloud data management system that was really easy to use. Making the data stack simpler is fundamental to making data usable by developers and data scientists. No scaling limits – Users shouldn't have to worry about hitting a wall with their data footprint growth.
Data science is a multidisciplinary field that requires a broad set of skills from mathematics and statistics to programming, machine learning, and data visualization. The world has been swept by the rise of data science and machine learning. Data scientists are in high demand, and the demand will only continue to rise.
The future of SQL (Structured Query Language) is a hot topic among professionals in the data-driven world. As data generation continues to skyrocket, the demand for real-time decision-making, data processing, and analysis increases. How is SQL Being Utilized?
The demand for skilled data engineers who can build, maintain, and optimize large data infrastructures shows no signs of slowing down anytime soon. At the heart of these data engineering skills lies SQL, which helps data engineers manage and manipulate large amounts of data.
In this blog, we examine DynamoDB reporting and analytics, which can be challenging given the lack of SQL and the difficulty running analytical queries in DynamoDB. We will demonstrate how you can build an interactive dashboard with Tableau, using SQL on data from DynamoDB, in a series of easy steps, with no ETL involved.
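To illustrate why analytics is awkward in DynamoDB on its own, here is a small sketch that answers an analytical question client-side with boto3; the table name (`orders`) and attributes (`region`, `amount`) are hypothetical, and AWS credentials are assumed to be configured in the environment.

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("orders")

# Analytical questions typically force a full scan, paginated manually,
# because DynamoDB has no native aggregations or joins.
items, start_key = [], None
while True:
    kwargs = {"ExclusiveStartKey": start_key} if start_key else {}
    page = table.scan(**kwargs)
    items.extend(page["Items"])
    start_key = page.get("LastEvaluatedKey")
    if not start_key:
        break

# Even a simple group-by has to happen client-side.
revenue_by_region = {}
for item in items:
    region = item.get("region", "unknown")
    revenue_by_region[region] = revenue_by_region.get(region, 0) + item.get("amount", 0)
print(revenue_by_region)
```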
Explaining the difference, especially when they both work with something intangible such as data, is difficult. If you’re an executive who has a hard time understanding the underlying processes of data science and get confused with terminology, keep reading. Data science vs data engineering.
High-quality data is necessary for the success of every data-driven company. It is now the norm for tech companies to have a well-developed data platform. This makes it easy for engineers to generate, transform, store, and analyze data at the petabyte scale. What and Where is Data Quality?
Over a decade after the inception of the Hadoop project, the amount of unstructured data available to modern applications continues to increase. Moreover, despite forecasts to the contrary, SQL remains the lingua franca of data processing; today's NoSQL and Big Data infrastructure platform usage often involves some form of SQL-based querying.
Reading Time: 8 minutes Databases are essential in web development for organizing data in various forms and shapes (both structured and unstructured). With these GUIs, we can get a bird’s-eye view of all the data in our database for easy analysis of the schema or data types, as well as general ease of administration.
In this blog post, we show how Rockset’s Smart Schema feature lets developers use real-time SQL queries to extract meaningful insights from raw semi-structured data ingested without a predefined schema. This is particularly true given the nature of real-world data. In SQL-based systems, the data is strongly and statically typed.
At a high level, CockroachDB is a Postgres -compatible SQL layer that is capable of operating across multiple availability zones. Underneath the SQL layer is a strongly-consistent distributed key-value store. Like Cassandra, data is stored using an LSM tree.
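Because CockroachDB speaks the Postgres wire protocol, a standard Postgres driver such as psycopg2 can talk to it. A minimal sketch follows; the host, database, user, and SSL settings are placeholders for a local insecure dev cluster, not a production setup.

```python
import psycopg2

conn = psycopg2.connect(
    host="localhost",
    port=26257,          # CockroachDB's default SQL port
    dbname="defaultdb",
    user="root",
    sslmode="disable",   # acceptable only for a local insecure dev cluster
)
with conn, conn.cursor() as cur:
    cur.execute("CREATE TABLE IF NOT EXISTS accounts (id INT PRIMARY KEY, balance DECIMAL)")
    cur.execute("UPSERT INTO accounts (id, balance) VALUES (1, 100.50)")
    cur.execute("SELECT id, balance FROM accounts")
    print(cur.fetchall())
conn.close()
```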
Have you ever wondered how the biggest brands in the world falter when it comes to data security? Their breach transformed personal customer data into a commodity traded on dark web forums. They react too slowly, too rigidly, and can't keep pace with the dynamic, sophisticated attacks occurring today, leaving hackable data exposed.
Summary The way that you store your data can have a huge impact on the ways that it can be practically used. Preamble Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode.
You have complex, semi-structured data—nested JSON or XML, for instance, containing mixed types, sparse fields, and null values. The application you're implementing needs to analyze this data, combining it with other datasets, to return live metrics and recommended actions. Where do you begin?
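Here is a minimal sketch of the kind of friction such data creates before analysis, flattening nested, sparsely populated JSON with pandas; the event shape and field names are invented for illustration.

```python
import json
import pandas as pd

raw = """
[
  {"user": {"id": 1, "name": "Ada"}, "amount": 42.5, "tags": ["new"]},
  {"user": {"id": 2}, "amount": "17", "referrer": null},
  {"user": {"id": 3, "name": "Grace"}, "items": [{"sku": "A1", "qty": 2}]}
]
"""
events = json.loads(raw)

# json_normalize spreads nested objects into columns; missing fields become NaN.
df = pd.json_normalize(events)

# Mixed types (42.5 vs "17") have to be coerced explicitly before aggregating.
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
print(df[["user.id", "user.name", "amount"]])
print("total:", df["amount"].sum())
```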
On September 24, 2019, Cloudera launched CDP Public Cloud (CDP-PC) as the first step in delivering the industry’s first Enterprise Data Cloud. CDP Machine Learning: a Kubernetes-based service that allows data scientists to deploy collaborative workspaces with secure, self-service access to enterprise data. That Was Then.
Learn the most important data engineering concepts that data scientists should be aware of. As the field of data science and machine learning continues to evolve, it is increasingly evident that data engineering cannot be separated from it.
Data pipelines are the backbone of your business’s data architecture. Implementing a robust and scalable pipeline ensures you can effectively manage, analyze, and organize your growing data. Most importantly, these pipelines enable your team to transform data into actionable insights, demonstrating tangible business value.
Think petabytes of data spread across trillions of rows, ready for consumption in real-time. In this blog, we’ll talk about Cloudera Operational Database (COD), a DBPaaS offering available on Cloudera Data Platform (CDP) that brings all the benefits of HBase without any of the overheads. COD in the Cloudera Data Platform (CDP).