Building more efficient AI. TL;DR: Data-centric AI can create more efficient and accurate models. Full code and results available here on GitHub. Moving experiment configs to a YAML file, automatically saving results to a file, and having o1 write my visualization code made life much easier. MNIST handwritten digit database.
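As a rough illustration of that workflow (file names and config keys are invented for this sketch), a YAML-driven experiment loop can look like this:

```python
# Minimal sketch of a YAML-driven experiment loop (hypothetical config keys).
# Assumes PyYAML is installed: pip install pyyaml
import json
import yaml

with open("config.yaml") as f:          # e.g. {"lr": 0.01, "epochs": 5, "seed": 42}
    config = yaml.safe_load(f)

results = {"config": config, "accuracy": None}

# ... train a model using config["lr"], config["epochs"], etc. ...

# Automatically persist results alongside the config that produced them.
with open("results.json", "w") as f:
    json.dump(results, f, indent=2)
```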
The blog emphasizes the importance of starting with a clear client focus to avoid over-engineering and ensure user-centric development. (impactdatasummit.com) Thumbtack: What we learned building an ML infrastructure team at Thumbtack. Thumbtack shares valuable insights from building its ML infrastructure team.
Known for its customer-centric approach and expansive product offerings, the company has maintained its leadership position in the industry for decades. After evaluating options, the retailer partnered with Striim to leverage its real-time data streaming and low-code/no-code integration capabilities.
I like testing people on their practical knowledge rather than artificial coding challenges. Adopting LLMs in SQL-centric workflows is particularly interesting, since companies increasingly try text-to-SQL to boost data usage. Log-as-the-Database (P2): sending only write-ahead logs to the storage side upon transaction commit.
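To make the log-as-the-database idea concrete, here is a toy Python sketch (not any specific system's protocol): the compute side ships only WAL records on commit, and the storage side replays them to materialize state.

```python
# Toy illustration of log-as-the-database (invented record format):
# on commit, only write-ahead log records travel to storage, and the
# storage side replays them to materialize the current state.
from dataclasses import dataclass

@dataclass
class WalRecord:
    txn_id: int
    key: str
    value: str

class StorageNode:
    def __init__(self):
        self.state = {}

    def apply(self, record: WalRecord):
        # Replaying the log *is* the write path; no full pages are shipped.
        self.state[record.key] = record.value

storage = StorageNode()
for rec in [WalRecord(1, "a", "1"), WalRecord(1, "b", "2"), WalRecord(2, "a", "3")]:
    storage.apply(rec)
print(storage.state)  # {'a': '3', 'b': '2'}
```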
A decade ago, Picnic set out to reinvent grocery shopping with a tech-first, customer-centric approach. For instance, we built self-service tools for all our engineers that allow them to handle tasks like environment setup, database management, or feature deployment effectively.
Bronze layers can also be the raw database tables. If you can modify or control the ingestion code, data quality tests and validation checks should ideally be integrated directly into the process. If you do not control the ingestion code, validate the data immediately after it lands instead. Bronze layers should be immutable.
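As a sketch of what integrated validation can look like (hypothetical columns and paths, using pandas):

```python
# A minimal sketch of validation at ingestion time (hypothetical checks),
# run before data is written to the immutable bronze layer.
import pandas as pd

def validate(df: pd.DataFrame) -> pd.DataFrame:
    assert not df.empty, "ingested batch is empty"
    assert df["order_id"].notna().all(), "null order_id in batch"  # hypothetical column
    assert df["order_id"].is_unique, "duplicate order_id in batch"
    return df

raw = pd.DataFrame({"order_id": [1, 2, 3], "amount": [9.5, 12.0, 3.25]})
bronze = validate(raw)                     # fail fast: bad batches never land in bronze
bronze.to_csv("bronze_orders.csv", index=False)  # hypothetical destination
```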
Of course, this is not to imply that companies will become only software (there are still plenty of people in even the most software-centric companies), just that the full scope of the business is captured in an integrated, software-defined process. Apache Kafka® and its uses.
Like data scientists, data engineers write code. There's a multitude of reasons why complex pieces of software are not developed using drag-and-drop tools: ultimately, code is the best abstraction there is for software. Blobs: modern databases have growing support for blobs through native types and functions.
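As a concrete, runnable illustration of native blob support, Python's built-in sqlite3 (standing in for "modern databases" generically) can round-trip raw bytes through a BLOB column:

```python
# Store and retrieve raw bytes using a native BLOB column type.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT, data BLOB)")
payload = b"\x89PNG\r\n\x1a\n..."  # any raw bytes, e.g. an image header
conn.execute("INSERT INTO files VALUES (?, ?)", ("logo.png", payload))

(blob,) = conn.execute("SELECT data FROM files WHERE name = 'logo.png'").fetchone()
print(len(blob), "bytes round-tripped through a BLOB column")
```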
For those reasons, it is not surprising that it has taken over most of the modern data stack: infrastructure, databases, orchestration, data processing, AI/ML and beyond. That’s without mentioning the fact that for a cloud-native company, Tableau’s Windows-centric approach at the time didn’t work well for the team.
Our customers are some of the most innovative, engineering-centric businesses on the planet, and helping them do great work will continue to be our focus.” On that same day, the threat actor downloaded data from another database that stores pipeline-level config vars for Review Apps and Heroku CI.
In this post, we’re going to share how Hamilton, an open source framework, can help you write modular and maintainable code for your large language model (LLM) application stack. The example we’ll walk you through will mirror a typical LLM application workflow you’d run to populate a vector database with some text knowledge.
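A minimal sketch of that style, assuming Hamilton's Builder API (pip install sf-hamilton; exact result shapes may vary by version). The embedding and vector-store steps are stubbed placeholders, not a real integration:

```python
# Hamilton-style dataflow sketch: function names define nodes, and
# parameter names wire them together into a DAG.
from hamilton import ad_hoc_utils, driver

def chunks(raw_text: str) -> list[str]:
    return [raw_text[i:i + 100] for i in range(0, len(raw_text), 100)]

def embeddings(chunks: list[str]) -> list[list[float]]:
    return [[float(len(c))] for c in chunks]      # stub: swap in a real model

def indexed_count(chunks: list[str], embeddings: list[list[float]]) -> int:
    # stub: here you would upsert (chunk, vector) pairs into a vector DB
    return len(embeddings)

module = ad_hoc_utils.create_temporary_module(chunks, embeddings, indexed_count)
dr = driver.Builder().with_modules(module).build()
print(dr.execute(["indexed_count"], inputs={"raw_text": "some long document " * 50}))
```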
To illustrate that, let's take Cloud SQL from the Google Cloud Platform, a "Fully managed relational database service for MySQL, PostgreSQL, and SQL Server." It looks like this when you want to create an instance. You are starting to be an operations- or technology-centric data team.
But this article is not about the pricing, which can be very subjective depending on the context: what is $1,200 for dev tooling when you pay someone more than $150k per year? (Yes, that's US-centric, but relevant.) But before sending your code to production you still want to validate some stuff, static or not, in the CI/CD pipelines.
At the same time, Maxime Beauchemin wrote a post about Entity-Centric data modeling. Today, Microsoft announces new low-code capabilities for Power Query in order to do "data preparation" from multiple sources. I hope he will fill the gaps. In the first part, he covers the history of modeling and the main concepts.
Structured data can be defined as data that can be stored in relational databases, and unstructured data as everything else. Examples of unstructured data include media (video, images, audio), text files (email, tweets), and business productivity files (Microsoft Office documents, GitHub code repositories, etc.).
Data Engineers are skilled professionals who lay the foundation of databases and architecture. Using database tools, they create a robust architecture and then implement the process to develop the database from scratch. Data engineers who focus on databases work with data warehouses and develop different table schemas.
For a data engineer who has already built their Spark code on their laptop, we have made deployment of jobs one click away. Airflow allows defining pipelines as Python code, represented as entities called DAGs. Job Deployment Made Simple. Automation APIs.
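For reference, a minimal Airflow 2.x DAG sketch (task names and schedule are hypothetical):

```python
# A two-task Airflow DAG defined entirely in Python.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling data")

def load():
    print("writing data")

with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # dependencies are just Python expressions
```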
Ranorex Webtestit: a lightweight IDE optimized for building UI web tests with Selenium or Protractor. It generates native Selenium and Protractor code in Java and TypeScript, respectively. Despite the technical coding knowledge and relevant experience it requires, around 20% of professionals use this automation testing tool.
When it comes to writing a connector, there are two things you need to know: how to write the code itself, and how to let the world know about your new connector. This documentation is brand new and represents some of the most informative, developer-centric documentation on writing a connector to date.
Data engineers who previously worked only with relational database management systems and SQL queries need training to take advantage of Hadoop. They have to know Java to go deep into Hadoop coding and effectively use the features available via Java APIs. Spark SQL creates a communication layer between RDDs and relational databases.
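A small PySpark sketch of that communication layer: an RDD becomes a DataFrame, which can then be queried with plain SQL:

```python
# Bridge an RDD into Spark SQL: RDD -> DataFrame -> SQL-queryable view.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-sql-bridge").getOrCreate()

rdd = spark.sparkContext.parallelize([("alice", 34), ("bob", 29)])
df = rdd.toDF(["name", "age"])         # RDD -> DataFrame
df.createOrReplaceTempView("people")   # DataFrame -> SQL view

spark.sql("SELECT name FROM people WHERE age > 30").show()
```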
At DareData Engineering, we believe in a human-centric approach, where AI agents work together with humans to achieve faster and more efficient results. At its core, RAG harnesses the power of large language models and vector databases to augment pre-trained models (such as GPT 3.5 ).
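Schematically, a RAG loop looks like the following sketch; every function body here is a stub placeholder, not a specific vendor's API:

```python
# Retrieve-then-generate skeleton: embed the question, fetch the most
# similar documents, and ground the LLM prompt in that context.
def embed(text: str) -> list[float]:
    return [float(ord(c)) for c in text[:8]]          # stub embedding

def top_k(query_vec: list[float], store: list[str], k: int = 2) -> list[str]:
    # stub similarity; real systems use cosine distance in a vector DB
    return sorted(
        store,
        key=lambda d: sum(abs(a - b) for a, b in zip(embed(d), query_vec)),
    )[:k]

def llm(prompt: str) -> str:
    return f"[answer grounded in: {prompt[:60]}...]"  # stub LLM call

documents = ["Picnic delivers groceries.", "RAG augments pre-trained models."]
question = "What does RAG do?"
context = top_k(embed(question), documents)
print(llm(f"Context: {context}\nQuestion: {question}"))
```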
The National Association of REALTORS® clearly understands this challenge, which is why it built RPR (Realtors Property Resource), the nation's largest parcel-centric database, exclusively for REALTORS®. Plus, things change: ZIP Codes are added, neighborhoods are constructed, so RPR is constantly looking to improve its match rates.
SQL: with strong SQL skills, a database may be used to build data warehouses, combine data with other technologies, and analyze it for commercial purposes. Pipeline-centric: pipeline-centric data engineers collaborate with data researchers to maximize the use of the information they gather.
All you need to know for a quick start with Domain-Driven Design. (Image created using DALL·E.) In today's fast-paced development environment, organising code effectively is critical for building scalable, maintainable, and testable applications. At its core, Hexagonal Architecture is a domain-centric approach.
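A tiny Python sketch of the ports-and-adapters idea (names are invented): the domain depends only on a port, and storage details live in adapters at the edges:

```python
# Hexagonal architecture in miniature: a domain-defined port (Protocol)
# and one interchangeable adapter that implements it.
from typing import Protocol

class UserRepository(Protocol):      # port: defined by the domain
    def save(self, name: str) -> None: ...

class InMemoryUserRepository:        # adapter: one interchangeable edge
    def __init__(self) -> None:
        self.users: list[str] = []

    def save(self, name: str) -> None:
        self.users.append(name)

def register_user(name: str, repo: UserRepository) -> None:
    # domain logic stays ignorant of databases, frameworks, and I/O
    if not name:
        raise ValueError("name required")
    repo.save(name)

repo = InMemoryUserRepository()
register_user("ada", repo)
print(repo.users)  # ['ada']
```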
Monolithic structure: since Reloaded modules were often co-located in the same repository, it was easy to overlook code-isolation rules, and there was quite a bit of unintended reuse of code across what should have been strong boundaries. The results are saved to a database so they can be reused (H.264, AV1, etc.).
With OneLake serving as a primary multi-cloud repository, Fabric is designed with an open, lake-centric architecture. Mirroring (a data replication capability): access and manage any database or warehouse from Fabric without switching database clients; Mirroring will be available for Azure Cosmos DB, Azure SQL DB, Snowflake, and MongoDB.
Looking for a position to test my skills in implementing data-centric solutions for complicated business challenges. Sound knowledge of web portal development, e-commerce applications, and code authoring. Seeking to provide coding and scripting competencies to the company's IT dept. An entry-level graduate with a B.S.
In the fast-paced world of software development, the efficiency of build processes plays a crucial role in maintaining productivity and code quality. The parser used advanced regular expressions and parsing techniques to extract critical data, such as build duration, failure points, and related code changes.
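A hedged sketch of that extraction approach; the log format and field names here are invented for illustration:

```python
# Pull build duration, status, and failure cause out of CI logs with
# named regex groups (hypothetical log format).
import re

log = """\
[build #4312] step=compile duration=183s status=ok
[build #4312] step=test duration=402s status=failed cause=flaky_io
"""

pattern = re.compile(
    r"step=(?P<step>\w+)\s+duration=(?P<duration>\d+)s\s+status=(?P<status>\w+)"
)
for m in pattern.finditer(log):
    print(m.group("step"), int(m.group("duration")), m.group("status"))
# compile 183 ok
# test 402 failed
```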
typically represents several objects and functions accessible to JavaScript code. JavaScript code can now execute outside of the browser, thanks to Node.js. The API frequently changes, which causes challenges for developers because they'll have to make adjustments to their existing code base to stay compatible. What is Node.js?
He compared the SQL + Jinja approach to the early PHP era… […] “If you take the dataframe-centric approach, you have much more “proper” objects, and programmatic abstractions and semantics around datasets, columns, and transformations.”
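To illustrate the contrast, here is a small pandas sketch (invented columns) of what programmatic abstractions around datasets, columns, and transformations buy you: each step is a testable Python function rather than a templated SQL string.

```python
# Dataframe-centric style: transformations are first-class, composable
# Python objects instead of Jinja-rendered SQL text.
import pandas as pd

def only_paid(df: pd.DataFrame) -> pd.DataFrame:
    return df[df["status"] == "paid"]

def add_revenue(df: pd.DataFrame) -> pd.DataFrame:
    return df.assign(revenue=df["price"] * df["qty"])

orders = pd.DataFrame({"price": [10.0, 5.0], "qty": [2, 3], "status": ["paid", "open"]})
result = orders.pipe(only_paid).pipe(add_revenue)  # composable, inspectable steps
print(result)
```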
In large organizations, data engineers concentrate on analytical databases, operate data warehouses that span multiple databases, and are responsible for developing table schemas. Data engineering builds data pipelines for core professionals like data scientists, consumers, and data-centric applications.
New revenue streams: a persona-based database can be monetized through co-marketing efforts. Season Pass Holder Database. Demographic-centric marketing. QR code app on smartphone. Rationalization of marketing and advertising spend, producing the highest ROI. New Profit Streams. Pricing Optimization.
Over the last three geospatial-centric blog posts, we've covered the basics of what geospatial data is, how it works in the broader world of data, and how it specifically works in Snowflake, based on our native support for GEOGRAPHY, GEOMETRY and H3. Let's dig into one way that you can use that geocoded data.
We have come a long way, but have we been able to harness the full power of Big Data analytics in healthcare? Big Trends in the Healthcare Industry: 50 years back, healthcare services were mostly physician-centric.
Immediate Execution: Python code runs directly through the interpreter, eliminating the need for a separate compilation step. Platform Independence: With an interpreter for a specific platform, Python code can typically run without changes. It's specialized for database querying. Compiled, targeting the JVM.
In this post, we’ll look at the historical reasons for the 191 character limit as a default in most relational databases. The first question you might ask is why limit the length of the strings you can store in a database at all? Why varchar and not text? 255 makes a lot more sense than 191. How did we get to 191?
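The arithmetic behind 191 is worth spelling out: older InnoDB defaults capped single-column index keys at 767 bytes, and utf8mb4 reserves up to 4 bytes per character, so the largest safely indexable varchar is floor(767 / 4) = 191.

```python
# Where 191 comes from (well-known InnoDB arithmetic).
max_index_bytes = 767    # InnoDB index key prefix limit under older defaults
bytes_per_char = 4       # worst case for utf8mb4
print(max_index_bytes // bytes_per_char)  # 191
```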
These backend tools cover a wide range of features, such as deployment utilities, frameworks, libraries, and databases. Better Data Management: Database management solutions offered by backend tools enable developers to quickly store, retrieve, and alter data.
Making decisions in the database space requires deciding between RDBMS (Relational Database Management System) and NoSQL, each of which has unique features. Come with me on this adventure to learn the main differences and parallels between two well-known database solutions, i.e., RDBMS vs NoSQL. What is RDBMS? What is NoSQL?
Code-free Data Flow Mapping Data Flows in Azure Data Factory allows non-developers to build complex data transformations, plus clean, filter, and manipulate the data on the fly without writing a single line of code. For online sources, ADF offers numerous built-in connectors for APIs, cloud services, and databases.
It offers a wide range of services, including computing, storage, databases, machine learning, and analytics, making it a versatile choice for businesses looking to harness the power of the cloud. This cloud-centric approach ensures scalability, flexibility, and cost-efficiency for your data workloads.
Data modernization is an umbrella term for the many ways businesses upgrade their data infrastructure, typically with cloud-centric solutions like the Snowflake Data Cloud. The cloud also democratizes access to data, whereas on-premises databases tend to restrict access and create silos.
One paper suggests that there is a need for a re-orientation of the healthcare industry to be more "patient-centric". Furthermore, clean and accessible data, along with data driven automations, can assist medical professionals in taking this patient-centric approach by freeing them from some time-consuming processes.
Having a GitHub pull request template is one of the most important and frequently overlooked aspects of creating an efficient and scalable dbt-centric analytics workflow. For the reviewer, it lets them know what it is they are reviewing before laying eyes on any code. Let's explore how to use each section and its benefits.