Introduction: Data is the new oil of this century, and the database is a core element of any data science project. To generate actionable insights, the database must be centralized and organized efficiently. So, we are […] (from the post "How to Normalize Relational Databases With SQL Code?")
Introduction: SQL injection is an attack in which a malicious user inserts arbitrary SQL code into a web application's query, gaining unauthorized access to the database. Attackers can use this to steal sensitive information or make unauthorized changes to the data stored in the database.
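To make the attack concrete, here is a minimal sketch (the table, columns, and inputs are hypothetical): a login query built by naive string concatenation, and the statement an attacker produces by supplying a crafted username.

```sql
-- The application builds the query by string concatenation:
--   "SELECT * FROM users WHERE username = '" + user + "' AND password = '" + pw + "'"

-- With benign input (user = alice, pw = s3cret):
SELECT * FROM users WHERE username = 'alice' AND password = 's3cret';

-- With a crafted username of: ' OR '1'='1' --
-- the OR makes the predicate always true and the -- comments out the password check:
SELECT * FROM users WHERE username = '' OR '1'='1' --' AND password = 's3cret';
```

The standard defense is parameterized queries (prepared statements), which bind user input as data so it is never interpreted as SQL text.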
Introduction: In the bustling arena of database management systems, two heavyweight contenders emerge, each carrying its arsenal of features and capabilities. In one corner, we have the suave and sophisticated Microsoft SQL Server (MSSQL), donned in the elegance of enterprise-level prowess.
Introduction: Structured Query Language (SQL) is a powerful language for managing and manipulating data stored in databases. SQL is widely used in data science and is considered an essential skill for anyone who works with data.
Introduction: In today's world, technology has advanced tremendously and many people use the internet, which results in the generation of enormous amounts of data daily. This data is stored in and maintained by databases, and SQL (Structured Query Language) is the language used to read and write those databases.
SQL2Fabric Mirroring is a new fully managed service offered by Striim to mirror on-premises SQL databases. It's a collaborative service between Striim and Microsoft, based on Fabric Open Mirroring, that enables real-time data replication from on-premises SQL Server databases to Azure Fabric OneLake.
Introduction: Data normalization is the process of building a database according to what is known as a canonical form, where the final product is a relational database with no data redundancy. More specifically, normalization involves organizing data according to attributes assigned as part of a larger data model.
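As a minimal sketch of what this looks like in SQL (the table and column names are hypothetical): a flat table that repeats customer details on every order row is split into two tables, with the redundancy replaced by a foreign key.

```sql
-- Denormalized: customer attributes are repeated on every order row.
CREATE TABLE orders_flat (
    order_id       INT PRIMARY KEY,
    customer_name  VARCHAR(100),
    customer_email VARCHAR(100),
    order_total    DECIMAL(10, 2)
);

-- Normalized: each customer is stored once and referenced by key.
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(100),
    email       VARCHAR(100)
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT REFERENCES customers(customer_id),
    order_total DECIMAL(10, 2)
);
```

Updating a customer's email now touches exactly one row instead of every order that customer ever placed.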
Looking to learn SQL and databases to level up your data science skills? Learn SQL, database internals, and much more with these free university courses.
Summary: Databases are the core of most applications, whether transactional or analytical. In recent years the selection of database products has exploded, making the critical decision of which engine(s) to use even more difficult.
Introduction: SQL (Structured Query Language) is a database programming language created for managing and retrieving data from relational databases like MySQL, Oracle, and SQL Server. SQL is the common language for all databases; in other words, SQL is the language used to communicate with databases.
This week, we delve into the vital world of Databases, SQL, Data Management, and Statistical Concepts in Data Science. Welcome back to Week 2 of KDnuggets’ "Back to Basics" series.
The Data News is here to stay; the format might vary during the year, but here we are for another year. Happy new year ✨ I wish you the best for 2025. We published videos from the Forward Data Conference; you can watch the keynote by Hannes, DuckDB co-creator, about Changing Large Tables.
Summary: Stream processing systems have long been built with a code-first design, adding SQL as a layer on top of the existing framework. RisingWave is a database engine that was created specifically for stream processing, with S3 as the storage layer.
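To make "SQL-first stream processing" concrete, here is a minimal sketch (the source, columns, and exact DDL are hypothetical and vary by engine): a materialized view that the engine keeps continuously up to date as events arrive, rather than recomputing on each query.

```sql
-- A continuously maintained aggregate over a stream of click events.
-- The engine incrementally updates the view as new events arrive.
CREATE MATERIALIZED VIEW clicks_per_page AS
SELECT page_id, COUNT(*) AS click_count
FROM click_events
GROUP BY page_id;

-- Reads then become cheap lookups against the maintained result:
SELECT click_count FROM clicks_per_page WHERE page_id = 42;
```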
The main thing I knew going in was "SDF understands SQL". For the next era of Analytics Engineering to be as transformative as the last, dbt needs to move beyond being a string preprocessor and into fully comprehending SQL. Today we're going to dig into what SQL comprehension actually means, since it's so critical to what comes next.
Summary: A significant portion of data workflows involves storing and processing information in database engines. Validating that the information is stored and processed correctly can be complex and time-consuming, especially when the source and destination speak different dialects of SQL.
Data lineage is an instrumental part of Meta's Privacy Aware Infrastructure (PAI) initiative, a suite of technologies that efficiently protect user privacy. It is a critical and powerful tool for scalable discovery of relevant data and data flows, which supports privacy controls across Meta's systems.
SQL is the essential data science language due to its universal database accessibility, efficient data cleaning capabilities, seamless integration with other languages, and the fact that most data science jobs require it.
Summary: Databases come in a variety of formats for different use cases. The default association with the term "database" is relational engines, but non-relational engines are also used quite widely.
With yato, you give it a folder of SQL queries and it infers the DAG and runs the queries in the right order. Elsewhere, the messaging is mainly that "Sora is a tool to extend creativity." Last point: Mira has been mocked and criticised online because, as CTO, she wasn't able to say which public or licensed data Sora was trained on.
Introduction: Data replication, also known as database replication, is the process of copying data to ensure that all information remains consistent across all data resources in real time. Data replication is like a safety net that keeps your information from disappearing or falling through the cracks.
Summary: Data systems are inherently complex and often require integration of multiple technologies. This offers a single location for managing visibility and error handling so that data platform engineers can manage complexity.
Summary: Building a database engine requires a substantial amount of engineering effort and time investment. In this episode he explains how he used the combination of Apache Arrow, Flight, DataFusion, and Parquet to lay the foundation of the newest version of his time-series database.
Summary: Data persistence is one of the most challenging aspects of computer systems. In the era of the cloud, most developers rely on hosted services to manage their databases, but what if you are a cloud service?
Three Zero-Cost Solutions That Take Hours, Not Months. In my career, data quality initiatives have usually meant big changes. What's more, fixing data quality issues this way often leads to new problems. Create a custom dashboard for your specific data quality problem.
The current database includes 2,000 server types across 130 regions and 340 zones. Storing data: collected data is stored to allow for historical comparisons; results are stored in git and their database, together with benchmarking metadata. Visualizing the data: the frontend allows querying of live and historical data.
Summary: Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems.
Many of our customers — from Marriott to AT&T — start their journey with the Snowflake AI Data Cloud by migrating their data warehousing workloads to the platform. Today we're focusing on customers who migrated from a cloud data warehouse to Snowflake and some of the benefits they saw.
Graph databases are quickly becoming a core part of the analytics toolset for enterprise IT organizations. If you know SQL, you can easily learn Cypher and open up a huge opportunity for data analysis.
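As a minimal sketch of how directly the knowledge transfers (the schema, labels, and property names are hypothetical), here is the same question, "who are Alice's friends?", as a SQL join and as the equivalent Cypher pattern.

```sql
-- SQL: friendships modeled as a join table.
SELECT p2.name
FROM person p1
JOIN friendship f ON f.person_a = p1.id
JOIN person p2 ON p2.id = f.person_b
WHERE p1.name = 'Alice';

-- The equivalent Cypher replaces the joins with a graph pattern:
--   MATCH (:Person {name: 'Alice'})-[:FRIEND]->(friend:Person)
--   RETURN friend.name;
```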
Summary: A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. Linked data technologies provide a means of tightly coupling metadata with raw information.
It's easy these days for an organization's data infrastructure to begin looking like a maze, with an accumulation of point solutions here and there. Snowflake is committed to helping customers simplify how they architect their data infrastructure by continually adding features. Here's a closer look.
Introduction: In this constantly growing technical era, big data is at its peak, and there is a need for a tool to import and export data between RDBMS and Hadoop. Apache Sqoop, which stands for "SQL to Hadoop," is one such tool: it transfers data between Hadoop (Hive, HBase, HDFS, etc.) and relational databases.
Over the last few years, dbt has become a de facto standard enabling companies to collaborate easily on data transformations. With dbt, you can apply software engineering practices to SQL development. Managing your SQL codebase has never been easier. See the doc. You can try it yourself.
dbt Core is an open-source framework that helps you organise data warehouse SQL transformations. dbt was born out of the observation that more and more companies were switching from on-premises Hadoop data infrastructure to cloud data warehouses. This switch was led by the modern data stack vision. Enter ELT.
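As a minimal sketch of what this organisation looks like in practice (the model and column names are hypothetical): each dbt model is a SELECT statement in its own .sql file, and the ref() call both resolves the table name and declares the dependency dbt uses to build its DAG.

```sql
-- models/marts/customer_revenue.sql
-- This model depends on another model, stg_orders; dbt infers that
-- from ref() and runs the models in dependency order.
SELECT
    customer_id,
    SUM(order_total) AS lifetime_revenue
FROM {{ ref('stg_orders') }}
GROUP BY customer_id
```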
At Snowflake BUILD, we are introducing powerful new features designed to accelerate building and deploying generative AI applications on enterprise data, while helping you ensure trust and safety. These scalable models can handle millions of records, enabling you to efficiently build high-performing NLP data pipelines.
Summary: The primary application of data has moved beyond analytics. With the broader audience comes the need to present data in a more approachable format. This has led to the broad adoption of data products as the delivery mechanism for information.
In this episode he shares his journey of data collection and analysis and the challenges of automating an intentionally manual industry.
Adding databases like MongoDB and CassandraDB only makes matters worse, since they're not SQL-friendly, and SQL is the language most analysts and data practitioners are used to. […] Read the full post, "OLTP Vs OLAP – What Is The Difference," on Seattle Data Guy.
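As a minimal sketch of the workload difference (tables and columns are hypothetical): OLTP queries read or write a handful of rows by key with low latency, while OLAP queries scan and aggregate large portions of the data.

```sql
-- OLTP: a point write touching a single row, latency-critical.
UPDATE accounts
SET balance = balance - 50.00
WHERE account_id = 12345;

-- OLAP: an analytical aggregation scanning many rows.
SELECT region,
       DATE_TRUNC('month', order_date) AS order_month,
       SUM(order_total) AS revenue
FROM orders
GROUP BY region, DATE_TRUNC('month', order_date)
ORDER BY order_month, region;
```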
Summary: Cloud data warehouses and the introduction of the ELT paradigm have led to the creation of multiple options for flexible data integration, with a roughly equal distribution of commercial and open-source options.
Before Hoptimator, Pinot ingestion often required data producers to create and manage separate, Pinot-specific preprocessing jobs to optimize data, such as re-keying, filtering, and pre-aggregating. Hoptimator removes this burden, reducing user friction, operator toil, and resource consumption on Pinot servers, while automating pipeline management.