Modern IT environments require comprehensive data for successful AIOps, which includes incorporating data from legacy systems such as IBM i and IBM Z into ITOps platforms. AIOps holds enormous promise, but many organizations face hurdles in its implementation: complex ecosystems made up of multiple, fragmented systems that lack interoperability.
Data fabric is a unified approach to data management, creating a consistent way to manage, access, and share data across distributed environments. With data volumes skyrocketing, and complexities increasing in variety and platforms, traditional centralized data management systems often struggle to keep up.
The world we live in today presents larger datasets, more complex data, and diverse needs, all of which call for efficient, scalable data systems. These systems are built on open standards and offer immense analytical and transactional processing flexibility. These formats are transforming how organizations manage large datasets.
This is crucial for applications that require up-to-date information, such as fraud detection systems or recommendation engines. Data Integration: By capturing changes, CDC facilitates seamless data integration between different systems. Finally, the control plane emits enriched metrics to enable effective monitoring of the system.
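The mechanics of CDC can be sketched with a toy example: a stream of insert/update/delete events keyed by primary key, applied in order to keep a downstream replica current. The event shape here is hypothetical, not any particular tool's format:

```python
# Minimal sketch of change data capture (CDC): a source emits change
# events (insert/update/delete), and a consumer applies them in order to
# keep a downstream replica in sync without full reloads.

def apply_change(replica: dict, event: dict) -> None:
    """Apply a single CDC event to the replica, keyed by primary key."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        replica[key] = event["row"]
    elif op == "delete":
        replica.pop(key, None)

replica = {}
events = [
    {"op": "insert", "key": 1, "row": {"name": "alice", "score": 10}},
    {"op": "update", "key": 1, "row": {"name": "alice", "score": 12}},
    {"op": "insert", "key": 2, "row": {"name": "bob", "score": 7}},
    {"op": "delete", "key": 2},
]
for e in events:
    apply_change(replica, e)

print(replica)  # {1: {'name': 'alice', 'score': 12}}
```

Because only the deltas travel, the replica stays fresh without rescanning the source table — the property the teaser's fraud-detection and recommendation examples depend on.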
Think your customers will pay more for data visualizations in your application? Five years ago they may have. But today, dashboards and visualizations have become table stakes. Discover which features will differentiate your application and maximize the ROI of your embedded analytics. Brought to you by Logi Analytics.
It provides a simplified, intuitive interface where users can explore AI/BI Dashboards, ask questions using natural language via Genie, and access custom Databricks Apps. This feature allows tables governed in Unity Catalog to be accessed by Microsoft Fabric, enabling interoperability via Unity Catalog Open APIs.
However, this category requires near-immediate access to the current count at low latencies, all while keeping infrastructure costs to a minimum. Failures in a distributed system are a given, and having the ability to safely retry requests enhances the reliability of the service.
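A common way to make retries safe is an idempotency key: the service records the outcome of each request ID and replays it on duplicates instead of re-applying the side effect. A minimal in-memory sketch (the `Counter` API is hypothetical):

```python
import uuid

# Sketch of safe retries via idempotency keys: the server remembers the
# result of each request ID, so retrying a timed-out request cannot
# apply the same side effect twice.

class Counter:
    def __init__(self):
        self.value = 0
        self._seen = {}  # request_id -> previously returned result

    def increment(self, request_id: str, amount: int = 1) -> int:
        if request_id in self._seen:        # duplicate: replay stored result
            return self._seen[request_id]
        self.value += amount
        self._seen[request_id] = self.value
        return self.value

c = Counter()
req = str(uuid.uuid4())
c.increment(req)
c.increment(req)  # retried request: no double count
print(c.value)  # 1
```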
These are all big questions about the accessibility, quality, and governance of data being used by AI solutions today. The simple idea was, hey how can we get more value from the transactional data in our operational systems spanning finance, sales, customer relationship management, and other siloed functions.
It is a critical and powerful tool for scalable discovery of relevant data and data flows, which supports privacy controls across Meta's systems. It enhances the traceability of data flows within systems, ultimately empowering developers to swiftly implement privacy controls and create innovative products. (Hack, C++, Python, etc.)
User code and data transformation are abstracted so they can be easily moved to any other data processing systems. Cross-Platform Abstraction: Abstracted data transformations can be run in serving systems or any other ML data processing framework such as Spark, PyTorch, Huggingface, etc. CUSTOM: For customized joining.
But as technology speeds forward, organizations of all sizes are realizing that generative AI isn’t just aspirational: It’s accessible and applicable now. Alberta Health Services ER doctors automate note-taking to treat 15% more patients. The integrated health system of Alberta, Canada’s third-most-populous province, with 4.5
Many of these projects are under constant development by dedicated teams with their own business goals and development best practices, such as the system that supports our content decision makers, or the system that ranks which language subtitles are most valuable for a specific piece of content.
To put it simply, it is a system that collects data from various sources, transforms, enriches, and optimizes it, and then delivers it to one or more target destinations. Its key goals are to store data in a format that supports fast querying and scalability and to enable real-time or near-real-time access for decision-making.
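The collect, transform/enrich, and deliver shape described here can be sketched in a few lines; the source, enrichment rule, and sink below are all hypothetical in-memory stand-ins for real systems:

```python
# Minimal sketch of the collect -> transform/enrich -> deliver shape of
# a data pipeline, with in-memory stand-ins for real sources and sinks.

def extract() -> list:
    # stand-in for reading from APIs, databases, or files
    return [{"user": "a", "amount": "19.99"}, {"user": "b", "amount": "5.00"}]

def transform(rows: list) -> list:
    # normalize and enrich: parse amounts, add a derived field
    out = []
    for r in rows:
        amount = float(r["amount"])
        out.append({**r, "amount": amount, "is_large": amount > 10})
    return out

def load(rows: list, target: list) -> None:
    # stand-in for writing to a warehouse or serving store
    target.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse[0]["is_large"])  # True
```

Real pipelines add scheduling, retries, and monitoring around this skeleton, but the three stages and their ordering are the invariant part.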
APIs facilitate communication between different systems, allowing data to flow seamlessly across platforms. Encrypting data both at rest and in transit ensures that sensitive information remains protected from unauthorized access. Access Controls Access controls are another critical component of data pipeline security.
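As one illustration of the access-control component, a minimal role-based check might look like the following; the roles and actions are hypothetical, not drawn from any specific product:

```python
# Sketch of role-based access control (RBAC) for a pipeline: each role
# maps to a set of permitted actions, and every access is checked
# against that mapping before it runs.

PERMISSIONS = {
    "analyst": {"read"},
    "engineer": {"read", "write"},
    "admin": {"read", "write", "delete"},
}

def is_allowed(role: str, action: str) -> bool:
    """Return True if the role's permission set includes the action."""
    return action in PERMISSIONS.get(role, set())

print(is_allowed("analyst", "read"))    # True
print(is_allowed("analyst", "delete"))  # False
```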
Thus, securing suitable data is crucial for any data professional, and data pipelines are the systems designed for this purpose. Data pipelines are systems designed to move and transform data from one source to another. Load data into an accessible storage location. Transform data into a valid format.
On average, engineers spend over half of their time maintaining existing systems rather than developing new solutions. Are your tools simple to implement and accessible to users with diverse skill sets? Create a Plan for Integration: Automation tools need to work seamlessly with existing systems to be effective.
How can a system that continuously updates decisions consider these constantly changing and uncertain factors? The answer lies in building a dynamic inventory optimisation system. This requires a scalable and efficient forecasting system.
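At its simplest, such a system can be sketched as a moving-average demand forecast feeding a reorder-point rule; the window size, lead time, and safety stock below are illustrative assumptions, not a production policy:

```python
# Toy dynamic reorder decision: forecast daily demand with a moving
# average over recent sales, then reorder when stock falls below the
# expected demand over the lead time plus a safety buffer.

def moving_average(history: list, window: int = 3) -> float:
    recent = history[-window:]
    return sum(recent) / len(recent)

def should_reorder(stock: int, history: list,
                   lead_time_days: int, safety_stock: int) -> bool:
    forecast_daily = moving_average(history)
    reorder_point = forecast_daily * lead_time_days + safety_stock
    return stock < reorder_point

sales = [10, 12, 9, 11, 13]             # units sold per day
print(should_reorder(30, sales, 2, 5))  # forecast 11/day -> point 27 -> False
print(should_reorder(20, sales, 2, 5))  # True
```

Because the forecast is recomputed from the latest window each time, the reorder point shifts automatically as demand drifts — the "constantly changing factors" the teaser refers to.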
Last year, we unveiled data intelligence – AI that can reason on your enterprise data – with the arrival of the Databricks Mosaic AI stack for building and deploying agent systems. Agents deployed on AWS, GCP, or even on-premise systems can now be connected to MLflow 3 for agent observability.
Data ingestion systems such as Kafka , for example, offer a seamless and quick data ingestion process while also allowing data engineers to locate appropriate data sources, analyze them, and ingest data for further processing. It can also access structured and unstructured data from various sources.
By Ko-Jen Hsiao, Yesu Feng and Sudarshan Lamkhede. Motivation: Netflix’s personalized recommender system is a complex system, boasting a variety of specialized machine-learned models, each catering to distinct needs including Continue Watching and Today’s Top Picks for You. (Refer to our recent overview for more details.)
Several LLMs are publicly available through APIs from OpenAI , Anthropic , AWS , and others, which give developers instant access to industry-leading models that are capable of performing most generalized tasks.
Booking.com: Unlocking the Power of Customization: How Our Enrichment System Transforms Recommendation Data Enrichments. Booking.com shares how it revamped its Recommendation Platform’s enrichment layer to tackle issues of tight coupling and low reusability when attaching data like prices, wishlist counts, and images to recommendations.
Last year, the promise of data intelligence – building AI that can reason over your data – arrived with Mosaic AI, a comprehensive platform for building, evaluating, monitoring, and securing AI systems. Too many knobs: Agents are complex AI systems with many components, each with its own knobs.
Sync audience and guest data to email platforms, customer relationship management (CRM) systems, advertising platforms or any other marketing tool that drives personalized travel experiences. A Composable CDP benefits from Snowflake’s built-in governance to help customers manage how data is accessed.
With the surge of new tools, platforms, and data types, managing these systems effectively is an ongoing challenge. “Ultimately, they are trying to serve data in their marketplace and make it accessible to business and data consumers,” Yoğurtçu says. However, they require a strong data foundation to be effective.
This technique is vital for ensuring consistency and accuracy across datasets, especially in organizations that rely on multiple data systems. Integration facilitates seamless data flow and accessibility, which is crucial for real-time analytics and decision-making.
Introduction: Encouraged by its growing popularity and increasing adoption in the Big Data community, we explored Kubernetes (K8s)-based systems as the most likely replacement for Hadoop 2.x. Built-in Container Support: Unlike Hadoop, Kubernetes was built as a container orchestration system, first and foremost.
Meta’s vast and diverse systems make it particularly challenging to comprehend their structure, meaning, and context at scale. We discovered that a flexible and incremental approach was necessary to onboard the wide variety of systems and languages used in building Meta’s products. We believe that privacy drives product innovation.
ThoughtSpot prioritizes the high availability and minimal downtime of our systems to ensure a seamless user experience. In the realm of modern analytics platforms, where rapid and efficient processing of large datasets is essential, swift metadata access and management are critical for optimal system performance.
AWS CloudWatch With the help of AWS CloudWatch , you can consolidate all of your system, application, and AWS service logs into a single, highly scalable service. Amazon IAM AWS Identity and Access Management (IAM) is another popular AWS service that enables you to control access to AWS resources.
Analytics Engineers deliver these insights by establishing deep business and product partnerships; translating business challenges into solutions that unblock critical decisions; and designing, building, and maintaining end-to-end analytical systems. Enter DataJunction (DJ).
With a beautiful and streamlined user interface as well as access to curated AI/BI Genie spaces, Dashboards and Databricks Apps, Databricks One is designed to help business teams make smarter decisions without needing to be expert technical practitioners. The full Databricks One experience will enter beta later this summer.
This architecture is valuable for organizations dealing with large volumes of diverse data sources, where maintaining accuracy and accessibility at every stage is a priority. This foundational layer is a repository for various data types, from transaction logs and sensor data to social media feeds and system logs.
To eliminate data redundancy, data modeling brings together data from diverse systems. It makes data more accessible. A primary key is a column or set of columns in a relational database management system table that uniquely identifies each record. What is a hierarchical database management system (DBMS)?
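Primary-key uniqueness is easy to demonstrate with the standard-library sqlite3 module: the database itself rejects a second record with the same key, so each row stays uniquely identifiable.

```python
import sqlite3

# Demonstrate primary-key uniqueness in a relational DBMS using the
# stdlib sqlite3 module: inserting a duplicate key raises an error.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
try:
    conn.execute("INSERT INTO users VALUES (1, 'bob')")  # duplicate key
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # UNIQUE constraint failed: users.id

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 1 -- the duplicate insert was refused
```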
Meanwhile, customers are responsible for protecting resources within the cloud, including operating systems, applications, data, and the configuration of security controls such as Identity and Access Management (IAM) and security groups. Shared Controls Responsibilities are split between AWS and the customer.
Building Reliable Foundations for Data + AI Systems: It’s no big revelation that data teams are being challenged to do more with AI. But when it comes to AI systems, the playbook for trust is still largely unwritten. Pipelines can traverse a multitude of systems and teams with limited oversight. But AI ≠ traditional software.
As a big data architect or a big data developer working with microservices-based systems, you might often end up in a dilemma over whether to use Apache Kafka or RabbitMQ for messaging. Apache Kafka and RabbitMQ are messaging systems used in distributed computing to handle big data streams – reading, writing, processing, and so on.
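The core difference between the two models can be caricatured in a few lines: Kafka exposes an append-only log that each consumer reads from its own offset (so messages are replayable), while a RabbitMQ-style queue hands each message to one consumer and then drops it. This is a toy illustration of the two semantics, not either system's actual API:

```python
from collections import deque

# Toy contrast of two messaging models:
#   Log   -- Kafka-style: append-only, consumers track their own offsets
#   Queue -- RabbitMQ-style: each message is delivered once, then removed

class Log:
    def __init__(self):
        self.messages = []
    def append(self, msg):
        self.messages.append(msg)
    def read_from(self, offset):
        # any consumer can (re)read from any offset
        return self.messages[offset:]

class Queue:
    def __init__(self):
        self._q = deque()
    def publish(self, msg):
        self._q.append(msg)
    def consume(self):
        # message is gone once one consumer takes it
        return self._q.popleft() if self._q else None

log = Log()
for m in ("a", "b", "c"):
    log.append(m)
print(log.read_from(1))  # ['b', 'c'] -- replayable by any consumer

q = Queue()
q.publish("a")
print(q.consume(), q.consume())  # a None -- gone once consumed
```

Replayability is why Kafka suits stream processing and event sourcing, while the consume-and-remove model fits classic task distribution.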
Data Engineer Jobs: The Demand. Data Scientist was declared the sexiest job of the 21st century about ten years ago. The role of a data engineer is to use tools for interacting with database management systems.
The answer lies in unstructured data processing—a field that powers modern artificial intelligence (AI) systems. To address these challenges, AI Data Engineers have emerged as key players, designing scalable data workflows that fuel the next generation of AI systems. How does a self-driving car understand a chaotic street scene?
It serves as a vital protective measure, ensuring proper data access while managing risks like data breaches and unauthorized use. Strong data governance also lays the foundation for better model performance, cost efficiency, and improved data quality, which directly contributes to regulatory compliance and more secure AI systems.
Data pipelines are crucial in managing the information lifecycle, ensuring its quality, reliability, and accessibility. Check out the following insightful post by Leon Jose , a professional data analyst, shedding light on the pivotal role of data pipelines in ensuring data quality, accessibility, and cost savings for businesses.
Uber stores its data in a combination of Hadoop and Cassandra for high availability and low latency access. Every time you play, skip, or save a song, Spotify notes the behavior and passes it to their recommendation system through Kafka. Flink then gets to work finding the nearest available driver and calculating your fare.
Vector and raster each have their use cases, and sometimes you need to access both; the quickstart tackles that head-on. But really, these are just representative files to showcase how we can access these types of files in Snowflake and ultimately use them in a specific analysis. Load the shapefile. Running Kepler.gl
As organizations adopt more tools and platforms, their data becomes increasingly fragmented across systems. And as the global data integration market is projected to grow from $17.10