Coding, Database and Database-centric - Data Engineering Digest

Data Pruning MNIST: How I Hit 99% Accuracy Using Half the Data

Towards Data Science

JANUARY 30, 2025

Building more efficient AI. TL;DR: Data-centric AI can create more efficient and accurate models. Full code and results are available on GitHub. Moving experiment configs to a YAML file, automatically saving results to a file, and having o1 write my visualization code made life much easier.

Database-centric

Database-centric Datasets Data Architecture

Unlocking Operational Efficiency: A Major Home Improvement Retailer’s Path to Data Modernization with Striim

Striim

NOVEMBER 11, 2024

Known for its customer-centric approach and expansive product offerings, the company has maintained its leadership position in the industry for decades. After evaluating options, the retailer partnered with Striim to leverage its real-time data streaming and low-code/no-code integration capabilities.

Database-centric

Database-centric Retail Google Cloud PostgreSQL

Data Engineering Weekly #182

Data Engineering Weekly

JULY 28, 2024

I like testing people on their practical knowledge rather than on artificial coding challenges. Adopting LLMs in SQL-centric workflows is particularly interesting, since companies increasingly try text-to-SQL to boost data usage. Log-as-the-Database (P2): sending only write-ahead logs to the storage side upon transaction commit.

Data Engineering

Data Engineering Data Engineer Engineering Database-centric
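The Data Engineering Weekly #182 excerpt mentions Log-as-the-Database. In that design, the compute side ships only write-ahead log records on commit, and the storage side rebuilds pages by replaying them. The toy Python sketch below illustrates that split under heavy simplifying assumptions:

- there is a single writer;
- a key/value map stands in for pages;
- there is no durability or failure handling.

`LogRecord`, `ComputeNode`, and `StorageNode` are hypothetical names, not any particular system's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    lsn: int  # log sequence number, strictly increasing per writer
    key: str
    value: str


class StorageNode:
    """Hypothetical storage side: materializes 'pages' by replaying shipped log records."""

    def __init__(self) -> None:
        self.pages: dict[str, str] = {}
        self.applied_lsn = 0

    def append(self, records: list[LogRecord]) -> None:
        for rec in sorted(records, key=lambda r: r.lsn):
            # Skip records already applied, so a resent log is harmless (idempotent replay).
            if rec.lsn > self.applied_lsn:
                self.pages[rec.key] = rec.value
                self.applied_lsn = rec.lsn


class ComputeNode:
    """Hypothetical compute side: buffers log records and ships only the log on commit."""

    def __init__(self, storage: StorageNode) -> None:
        self.storage = storage
        self.next_lsn = 1
        self.pending: list[LogRecord] = []

    def write(self, key: str, value: str) -> None:
        self.pending.append(LogRecord(self.next_lsn, key, value))
        self.next_lsn += 1

    def commit(self) -> int:
        shipped = len(self.pending)
        self.storage.append(self.pending)  # only log records cross the wire, no page images
        self.pending = []
        return shipped
```

Nothing reaches storage until `commit()` runs. Afterwards, storage holds exactly the state implied by the replayed log.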

Data Engineering Weekly #196

Data Engineering Weekly

NOVEMBER 3, 2024

The blog emphasizes the importance of starting with a clear client focus to avoid over-engineering and ensure user-centric development. Thumbtack: What we learned building an ML infrastructure team at Thumbtack. Thumbtack shares valuable insights from building its ML infrastructure team.

Data Engineering

Data Engineering Data Engineer Engineering Pipeline-centric

Every Company is Becoming a Software Company

Confluent

SEPTEMBER 25, 2019

Of course, this is not to imply that companies will become only software (there are still plenty of people in even the most software-centric companies), just that the full scope of the business is captured in an integrated software defined process. Apache Kafka ® and its uses.

Database-centric

Database-centric Kafka Pipeline-centric Retail

The Race For Data Quality in a Medallion Architecture

DataKitchen

NOVEMBER 5, 2024

Bronze layers can also be the raw database tables. If you can modify or control the ingestion code, data quality tests and validation checks should ideally be integrated directly into the process. Alternatively, suppose you do not control the ingestion code. Bronze layers should be immutable.

Architecture

Architecture Raw Data Pipeline-centric Data Ingestion
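The DataKitchen excerpt argues for two properties:

- checks run inside ingestion when you control that code;
- the bronze layer stays immutable.

A minimal Python sketch of both follows. It assumes that every raw row still lands in bronze and that failed checks are reported rather than used to drop data. `CHECKS`, `BronzeTable`, and `ingest` are hypothetical names, and the two example checks are placeholders.

```python
from types import MappingProxyType
from typing import Any, Callable, Iterable, Mapping

# Hypothetical row-level checks; a real pipeline would likely load these from config.
CHECKS: dict[str, Callable[[Mapping[str, Any]], bool]] = {
    "order_id_present": lambda r: r.get("order_id") is not None,
    "amount_non_negative": lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
}


class BronzeTable:
    """Append-only store: rows can be added but never updated or deleted."""

    def __init__(self) -> None:
        self._rows: list[Mapping[str, Any]] = []

    def append(self, row: Mapping[str, Any]) -> None:
        # Store a read-only (shallow) copy so later code cannot edit raw data in place.
        self._rows.append(MappingProxyType(dict(row)))

    def rows(self) -> tuple[Mapping[str, Any], ...]:
        return tuple(self._rows)


def ingest(
    batch: Iterable[Mapping[str, Any]],
    bronze: BronzeTable,
    checks: Mapping[str, Callable[[Mapping[str, Any]], bool]] = CHECKS,
) -> dict[int, list[str]]:
    """Land every raw row in bronze and return {row index: names of failed checks}."""
    failures: dict[int, list[str]] = {}
    for i, row in enumerate(batch):
        bronze.append(row)  # raw data lands as-is, even when a check fails
        failed = [name for name, check in checks.items() if not check(row)]
        if failed:
            failures[i] = failed
    return failures
```

Keeping failing rows in bronze, while surfacing the failures, preserves the raw record for reprocessing. Filtering or quarantining is left to the downstream (silver) step.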

The Rise of the Data Engineer

Maxime Beauchemin

JANUARY 20, 2017

Like data scientists, data engineers write code. There's a multitude of reasons why complex pieces of software are not developed using drag-and-drop tools: ultimately, code is the best abstraction there is for software. Blobs: modern databases have growing support for blobs through native types and functions.

Data Engineering

Data Engineering Data Engineer Engineering ETL Tools

10 Lessons from 10 Years of Innovation and Engineering at Picnic

Picnic Engineering

FEBRUARY 13, 2025

A decade ago, Picnic set out to reinvent grocery shopping with a tech-first, customer-centric approach. For instance, we built self-service tools for all our engineers that allow them to handle tasks like environment setup, database management, or feature deployment effectively.

Engineering

Engineering Database-centric Generalist Java

The Future of Business Intelligence is Open Source

Maxime Beauchemin

MARCH 8, 2021

For those reasons, it is not surprising that it has taken over most of the modern data stack: infrastructure, databases, orchestration, data processing, AI/ML and beyond. That’s without mentioning the fact that for a cloud-native company, Tableau’s Windows-centric approach at the time didn’t work well for the team.

Business Intelligence

Business Intelligence BI Database-centric Google Cloud

Building a maintainable and modular LLM application stack with Hamilton

Towards Data Science

JULY 13, 2023

In this post, we’re going to share how Hamilton, an open source framework, can help you write modular and maintainable code for your large language model (LLM) application stack. The example we’ll walk you through will mirror a typical LLM application workflow you’d run to populate a vector database with some text knowledge.

Building

Building Database-centric Database Coding
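The Hamilton excerpt describes writing LLM-stack steps as modular functions. The sketch below is not Hamilton's API. It is a hypothetical, dependency-free Python resolver that illustrates the general idea of function-as-node dataflows:

- each function's name is the output it produces;
- its parameter names are the inputs it needs.

The pipeline functions (`chunks`, `embeddings`, `vector_records`) are made up and only loosely mirror the "populate a vector database" workflow. The embedding step is a placeholder.

```python
import inspect
from typing import Any, Callable


def run_dataflow(funcs: list[Callable[..., Any]], inputs: dict[str, Any], target: str) -> Any:
    """Hypothetical toy resolver: a function's name is the node it produces and
    its parameter names are the nodes it depends on."""
    registry = {f.__name__: f for f in funcs}
    cache: dict[str, Any] = dict(inputs)  # user-supplied inputs are leaf nodes

    def resolve(name: str) -> Any:
        if name not in cache:
            if name not in registry:
                raise KeyError(f"no input or function provides {name!r}")
            fn = registry[name]
            # Compute each dependency named in the signature first (memoized in cache).
            # No cycle detection: a cyclic graph would recurse until RecursionError.
            kwargs = {p: resolve(p) for p in inspect.signature(fn).parameters}
            cache[name] = fn(**kwargs)
        return cache[name]

    return resolve(target)


def chunks(raw_text: str, chunk_size: int) -> list[str]:
    return [raw_text[i:i + chunk_size] for i in range(0, len(raw_text), chunk_size)]


def embeddings(chunks: list[str]) -> list[list[float]]:
    # Placeholder for an embedding-model call: one toy feature (length) per chunk.
    return [[float(len(c))] for c in chunks]


def vector_records(chunks: list[str], embeddings: list[list[float]]) -> list[tuple[str, list[float]]]:
    # Rows that would be written to a vector database.
    return list(zip(chunks, embeddings))
```

For example, `run_dataflow([chunks, embeddings, vector_records], {"raw_text": "abcdefgh", "chunk_size": 3}, "vector_records")` wires the three steps together purely from their parameter names. Each step can be tested or swapped independently.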

CircleCI’s unnoticed holiday security breach

The Pragmatic Engineer

JANUARY 5, 2023

Our customers are some of the most innovative, engineering-centric businesses on the planet, and helping them do great work will continue to be our focus.” On that same day, the threat actor downloaded data from another database that stores pipeline-level config vars for Review Apps and Heroku CI.

Pipeline-centric

Pipeline-centric Database-centric Coding Accessibility

Why are database columns 191 characters?

Grouparoo

MAY 13, 2021

In this post, we’ll look at the historical reasons for the 191-character limit as a default in most relational databases. The first question you might ask is: why limit the length of the strings you can store in a database at all? Why varchar and not text? […] 255 makes a lot more sense than 191. How did we get to 191?

Database

Database Bytes MySQL Database-centric
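The question in this excerpt has a short arithmetic answer under the commonly cited explanation, which we assume is the one the post develops:

- Older InnoDB row formats (COMPACT and REDUNDANT) cap an index key prefix at 767 bytes.
- MySQL's utf8mb4 reserves up to 4 bytes per character.
- VARCHAR(255) fits under the 3-byte utf8 (utf8mb3) character set, since 255 × 3 = 765 bytes.
- Under utf8mb4, only ⌊767 / 4⌋ = 191 characters fit.

With DYNAMIC or COMPRESSED row formats and large index prefixes (as in later MySQL versions), the limit rises to 3072 bytes. The constants below are those documented limits. The function name is hypothetical.

```python
# Index key prefix limit (bytes) for older InnoDB COMPACT/REDUNDANT row formats.
LEGACY_INNODB_PREFIX_BYTES = 767
# Limit with DYNAMIC/COMPRESSED row formats and large index prefixes enabled.
LARGE_PREFIX_BYTES = 3072

# Worst-case bytes per character for common MySQL character sets.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8mb3": 3, "utf8mb4": 4}


def max_indexable_varchar(charset: str, prefix_limit: int = LEGACY_INNODB_PREFIX_BYTES) -> int:
    """Largest n such that a VARCHAR(n) column in `charset` fits fully in the index prefix."""
    return prefix_limit // MAX_BYTES_PER_CHAR[charset]
```

Here `max_indexable_varchar("utf8mb3")` gives 255 and `max_indexable_varchar("utf8mb4")` gives 191. That is why moving from utf8 to utf8mb4 pushed many frameworks' default string length down to 191.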

Data News — Week 23.14

Christophe Blefari

APRIL 8, 2023

At the same time, Maxime Beauchemin wrote a post about entity-centric data modeling. Today, Microsoft announces new low-code capabilities for Power Query in order to do "data preparation" from multiple sources. I hope he will fill the gaps. In the first part, he covers the history of modeling and the main concepts.

Pipeline-centric

Pipeline-centric Database-centric Algorithm Data

How to manage and schedule dbt

Christophe Blefari

DECEMBER 19, 2022

But this article is not about the pricing, which can be very subjective depending on the context: what is $1,200 for dev tooling when you pay them more than $150k per year? Yes, it's US-centric, but relevant. But before sending your code to production, you still want to validate some things, static or not, in the CI/CD pipelines.

Management

Management Pipeline-centric Database-centric SQL

Toward a Data Mesh (part 2) : Architecture & Technologies

François Nguyen

MARCH 22, 2021

To illustrate that, let’s take Cloud SQL from the Google Cloud Platform, a “Fully managed relational database service for MySQL, PostgreSQL, and SQL Server”. It looks like this when you want to create an instance. You are starting to be an operations- or technology-centric data team.

Technology

Technology Architecture Google Cloud Metadata

How to Become a Data Engineer in 2024?

Knowledge Hut

DECEMBER 26, 2023

Data Engineers are skilled professionals who lay the foundation of databases and architecture. Using database tools, they create a robust architecture and later implement the process to develop the database from zero. Data engineers who focus on databases work with data warehouses and develop different table schemas.

Data Engineering

Data Engineering Data Engineer Engineering Hadoop

Hadoop vs Spark: Main Big Data Tools Explained

AltexSoft

JUNE 7, 2021

Data engineers who previously worked only with relational database management systems and SQL queries need training to take advantage of Hadoop. They have to know Java to go deep in Hadoop coding and effectively use features available via Java APIs. Spark SQL creates a communication layer between RDDs and relational databases.

Big Data Tools

Big Data Tools Hadoop Big Data Database-centric

The Rise of Unstructured Data

Cloudera

NOVEMBER 15, 2021

Structured data can be defined as data that can be stored in relational databases, and unstructured data as everything else. Examples of unstructured data, on the other hand, include media (video, images, audio), text files (email, tweets), and business productivity files (Microsoft Office documents, GitHub code repositories, etc.).

Unstructured Data

Unstructured Data Pipeline-centric Database-centric Entertainment

A Guide to the Confluent Verified Integrations Program

Confluent

AUGUST 19, 2019

When it comes to writing a connector, there are two things you need to know how to do: how to write the code itself, and how to help the world know about your new connector. This documentation is brand new and represents some of the most informative, developer-centric documentation on writing a connector to date.

Programming

Programming Kafka Database-centric MongoDB

Best Career Objective for Resume for Freshers with Sample

Knowledge Hut

NOVEMBER 15, 2023

Looking for a position to test my skills in implementing data-centric solutions for complicated business challenges. Sound knowledge of developing web portals, e-commerce applications, and code authoring. Seeking to provide coding and scripting competencies to the company's IT dept. An entry-level graduate with B.S.

Finance

Finance Certification Database-centric Business Intelligence

Introducing CDP Data Engineering: Purpose Built Tooling For Accelerating Data Pipelines

Cloudera

SEPTEMBER 17, 2020

For a data engineer who has already built their Spark code on their laptop, we have made deployment of jobs one click away. Airflow allows defining pipelines using Python code that are represented as entities called DAGs. Each DAG is defined using Python code. Job Deployment Made Simple. Automation APIs.

Data Pipeline

Data Pipeline Data Engineering Data Engineer Engineering

Top 10 Automation Testing Tools used in Software Industry

Knowledge Hut

SEPTEMBER 24, 2024

Ranorex Webtestit: A lightweight IDE optimized for building UI web tests with Selenium or Protractor. It generates native Selenium and Protractor code in Java and TypeScript, respectively. Despite the technical coding knowledge and relevant experience, around 20% of professionals use this automation testing tool.

Java

Java Programming Language Pipeline-centric Database-centric

How RPR Provides Top-Notch Geocoding Data with Precisely

Precisely

APRIL 20, 2023

The National Association of REALTORS ® clearly understands this challenge, which is why it built RPR (Realtors Property Resource), the nation’s largest parcel-centric database, exclusively for REALTORS ®. Plus, things change – ZIP Codes are added, neighborhoods are constructed – so RPR is constantly looking to improve its match rates.

Database-centric

Database-centric Database Data Datasets

Data Engineer Roles And Responsibilities 2022

U-Next

AUGUST 17, 2022

SQL – A database may be used to build data warehousing, combine it with other technologies, and analyze the data for commercial reasons with the help of strong SQL abilities. Pipeline-centric: Pipeline-centric Data Engineers collaborate with data researchers to maximize the use of the info they gather.

Data Engineering

Data Engineering Data Engineer Database-centric Pipeline-centric

3 Use Cases for Generative AI Agents

DareData

MARCH 5, 2024

At DareData Engineering, we believe in a human-centric approach, where AI agents work together with humans to achieve faster and more efficient results. At its core, RAG harnesses the power of large language models and vector databases to augment pre-trained models (such as GPT-3.5).

Database-centric

Database-centric Telecommunication SQL Unstructured Data
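The DareData excerpt summarizes retrieval-augmented generation (RAG): retrieve relevant text from a vector store and add it to the model's prompt. Below is a toy Python sketch of the retrieval half. It makes these assumptions:

- a bag-of-words `Counter` stands in for a real embedding model;
- a Python list stands in for a vector database;
- all names are hypothetical;
- the resulting prompt would then be sent to an LLM, which is not shown.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase bag-of-words counts (a real system calls an embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: stores (document, vector) pairs."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, doc: str) -> None:
        self._items.append((doc, embed(doc)))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


def build_prompt(question: str, store: InMemoryVectorStore, k: int = 2) -> str:
    """Augment the question with the top-k retrieved passages."""
    context = "\n".join(f"- {d}" for d in store.search(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Swapping `embed` for a real embedding model and the list for a vector database keeps the same shape: embed, rank by similarity, then inject the top matches into the prompt.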

Hexagonal Architecture: A Practical Guide

Booking.com Engineering

NOVEMBER 27, 2024

All you need to know for a quick start with Domain-Driven Design. In today's fast-paced development environment, organising code effectively is critical for building scalable, maintainable, and testable applications. At its core, Hexagonal Architecture is a domain-centric approach.

Architecture

Architecture Database-centric Pipeline-centric Java
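The Booking.com excerpt calls Hexagonal Architecture domain-centric. The core logic talks to the outside world only through ports, and adapters plug into those ports. Below is a minimal Python sketch of that shape, using `typing.Protocol` for the port. The booking domain, the class names, and the in-memory adapter are hypothetical illustrations, not code from the post.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Booking:
    booking_id: str
    nights: int
    nightly_rate: float


class BookingRepository(Protocol):
    """Outbound port: the domain states what it needs, not how storage works."""

    def save(self, booking: Booking) -> None: ...

    def get(self, booking_id: str) -> Optional[Booking]: ...


class BookingService:
    """Domain core: depends only on the port, never on a concrete database."""

    def __init__(self, repo: BookingRepository) -> None:
        self.repo = repo

    def create(self, booking_id: str, nights: int, nightly_rate: float) -> Booking:
        if nights <= 0:
            raise ValueError("nights must be positive")
        booking = Booking(booking_id, nights, nightly_rate)
        self.repo.save(booking)
        return booking

    def total_price(self, booking_id: str) -> float:
        booking = self.repo.get(booking_id)
        if booking is None:
            raise KeyError(booking_id)
        return booking.nights * booking.nightly_rate


class InMemoryBookingRepository:
    """Adapter: one interchangeable implementation of the port (handy for tests)."""

    def __init__(self) -> None:
        self._data: dict[str, Booking] = {}

    def save(self, booking: Booking) -> None:
        self._data[booking.booking_id] = booking

    def get(self, booking_id: str) -> Optional[Booking]:
        return self._data.get(booking_id)
```

A database-backed adapter would implement the same two methods. `BookingService` would not change, which is the testability benefit the excerpt alludes to.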

A Comprehensive Overview of Microsoft Fabric & Its Use Cases

RandomTrees

SEPTEMBER 27, 2024

With OneLake serving as a primary multi-cloud repository, Fabric is designed with an open, lake-centric architecture. Mirroring (a data replication capability): access and manage any database or warehouse from Fabric without switching database clients; Mirroring will be available for Azure Cosmos DB, Azure SQL DB, Snowflake, and MongoDB.

Database-centric

Database-centric Pipeline-centric IT BI

Finding digital transformation in high places – how a ski resort improved operational agility and customer experiences

Cloudera

JANUARY 17, 2021

New revenue stream through a persona-based database that can be monetized through co-marketing efforts. Season Pass Holder Database. Demographic-centric marketing. QR code app on smartphone. Rationalization of marketing and advertising spend producing the highest ROI. New Profit Streams. Pricing Optimization.

Database-centric

Database-centric Manufacturing Retail Food

Rebuilding Netflix Video Processing Pipeline with Microservices

Netflix Tech

JANUARY 10, 2024

Monolithic structure : Since Reloaded modules were often co-located in the same repository, it was easy to overlook code-isolation rules and there was quite a bit of unintended reuse of code across what should have been strong boundaries. The results are saved to a database so they can be reused. 264, AV1, etc.).

Process

Process Pipeline-centric Media Metadata

Revolutionizing Build Analytics: How to enhance build processes with ThoughtSpot

ThoughtSpot

OCTOBER 18, 2024

In the fast-paced world of software development, the efficiency of build processes plays a crucial role in maintaining productivity and code quality. The parser used advanced regular expressions and parsing techniques to extract critical data, such as build duration, failure points, and related code changes.

Building

Building Process Pipeline-centric Database-centric

RDBMS vs NoSQL: Key Differences and Similarities

Knowledge Hut

MARCH 15, 2024

Making decisions in the database space requires deciding between RDBMS (Relational Database Management System) and NoSQL, each of which has unique features. Come with me on this adventure to learn the main differences and parallels between two well-known database solutions, i.e., RDBMS vs NoSQL. What is RDBMS? What is NoSQL?

NoSQL

NoSQL Database-centric Relational Database MongoDB

Top-Paying Data Engineer Jobs in Singapore [2023 Updated]

Knowledge Hut

FEBRUARY 27, 2023

In large organizations, data engineers concentrate on analytical databases, operate data warehouses that span multiple databases, and are responsible for developing table schemas. Data engineering builds data pipelines for core professionals like data scientists, consumers, and data-centric applications.

Data Engineering

Data Engineering Data Engineer Database-centric Pipeline-centric

Python for Data Engineering

Ascend.io

SEPTEMBER 14, 2023

Immediate Execution: Python code runs directly through the interpreter, eliminating the need for a separate compilation step. Platform Independence: With an interpreter for a specific platform, Python code can typically run without changes. It's specialized for database querying. Compiled, targeting the JVM.

Data Engineering

Data Engineering Data Engineer Python Engineering

20 Best Backend Development Tools In 2023

Knowledge Hut

JULY 26, 2023

These backend tools cover a wide range of features, such as deployment utilities, frameworks, libraries, and databases. Better Data Management: Database management solutions offered by backend tools enable developers to quickly store, retrieve, and alter data.

Database-centric

Database-centric Programming Language Pipeline-centric Utilities

Kickstart Your 2023 with these 6 Articles – The Meltano Team's Favorite Data Articles of 2022

Meltano

JANUARY 25, 2023

He compared the SQL + Jinja approach to the early PHP era… […] “If you take the dataframe-centric approach, you have much more “proper” objects, and programmatic abstractions and semantics around datasets, columns, and transformations.

Pipeline-centric

Pipeline-centric Database-centric SQL Data Warehouse

NoSQL vs SQL- 4 Reasons Why NoSQL is better for Big Data applications

ProjectPro

MARCH 19, 2015

Big Data NoSQL databases were pioneered by top internet companies like Amazon, Google, LinkedIn and Facebook to overcome the drawbacks of RDBMS. There is a need for a database technology that can provide 24/7 support to store, process and analyze this data. Can conventional SQL scale up to these requirements?

NoSQL

NoSQL Big Data SQL Database-centric

Unlocking the Power of Geospatial Data for Insights

Snowflake

JANUARY 15, 2025

Over the last three geospatial-centric blog posts, we've covered the basics of what geospatial data is, how it works in the broader world of data, and how it specifically works in Snowflake based on our native support for GEOGRAPHY, GEOMETRY and H3. Let's dig into one way that you can use that geocoded data.

Transportation

Transportation BI Database-centric Metadata

What is Azure Data Factory – Here’s Everything You Need to Know

Edureka

JULY 3, 2024

Code-free Data Flow: Mapping Data Flows in Azure Data Factory allow non-developers to build complex data transformations, and to clean, filter, and manipulate the data on the fly without writing a single line of code. For online sources, ADF offers numerous built-in connectors for APIs, cloud services, and databases.

Pipeline-centric

Pipeline-centric Data Lake Database-centric Data Pipeline

Industry Interview Series-Big Data in Healthcare

ProjectPro

JUNE 1, 2015

We have come a long way, but have we been able to harness the full power of Big Data analytics in healthcare? Big Trends in Healthcare Industry: 50 years back, healthcare services were mostly physician-centric.

Healthcare

Healthcare Big Data Database-centric Hospitality

Node js vs JavaScript: Node Js Pros and Cons

Knowledge Hut

APRIL 24, 2024

typically represents several objects and functions accessible to JavaScript code. JavaScript code can now execute outside of the browser, thanks to Node.js. The API frequently changes, which causes challenges for developers because they'll have to make adjustments to their existing code base to stay compatible. What is Node.js?

Programming Language

Programming Language Database-centric Programming Python

97 things every data engineer should know

Grouparoo

OCTOBER 6, 2021

42. Learn to Use a NoSQL Database, but Not like an RDBMS: write answers to questions in NoSQL databases for fast access. 43. Let the Robots Enforce the Rules: work with people to standardize, and use code to enforce rules. 44. Listen to Your Users, but Not Too Much: create a data team vision and strategy. What does that do?

Data Engineering

Data Engineering Data Engineer Engineering Pipeline-centric

Periodic Table of DevOps Tools: Complete Table

Knowledge Hut

FEBRUARY 6, 2024

Around 2007, the software development and IT operations groups expressed concerns about the conventional software development approach, in which developers wrote code separately from operations, who deployed and supported the code. Database Management Most enterprise apps still rely heavily on databases to function.

Pipeline-centric

Pipeline-centric Database-centric AWS Manufacturing

Azure Data Engineer vs Azure DevOps: Top 8 Differences

Knowledge Hut

NOVEMBER 2, 2023

Tools and Technologies Azure Data Factory, Azure Databricks, Azure SQL Database, Azure Cosmos DB, Power BI. Their responsibilities involve configuring and managing Continuous Integration/ Continuous Deployment (CI/CD) pipelines, implementing Infrastructure as Code (IaC), source control management.

Data Engineering

Data Engineering Data Engineer Engineering Pipeline-centric
