Learning System Design: Top 5 Essential Reads
KDnuggets
MAY 23, 2024
Explore system design with these expert-recommended books.
Start Data Engineering
JANUARY 20, 2025
[Requirements gathering] Make sure you clearly understand the requirements & business use case 2.2. [Understand source data] Know what you have to work with 2.3. [Model your data] Define data models for historical analytics 2.4. [Pipeline design] Design data pipelines to populate your data models 2.5.
Agent Tooling: Connecting AI to Your Tools, Systems & Data
How to Modernize Manufacturing Without Losing Control
Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration
Data Engineering Podcast
DECEMBER 3, 2023
Data transfer systems are a critical component of data enablement, and building them to support large volumes of information is a complex endeavor. With Datafold, you can seamlessly plan, translate, and validate data across systems, massively accelerating your migration project. When is DoubleCloud Data Transfer the wrong choice?
Pinterest Engineering
JANUARY 31, 2025
Modern large-scale recommendation systems usually include multiple stages where retrieval aims at retrieving candidates from billions of candidate pools, and ranking predicts which item a user tends to engage from the trimmed candidate set retrieved from early stages [2]. General multi-stage recommendation system design in Pinterest.
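The retrieval-then-ranking flow described in this excerpt can be sketched in a few lines. This is a minimal illustration, not Pinterest's implementation: the dot-product retrieval score, the `predicted_engagement` lookup standing in for a ranking model, and all item data are assumptions made for the example.

```python
# Two-stage recommendation sketch: a cheap retrieval pass trims the pool,
# then a (hypothetical) engagement predictor orders the survivors.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def retrieve(user_vec, item_vecs, k):
    """Stage 1: score every item with a cheap similarity and keep the top k."""
    by_similarity = sorted(
        item_vecs,
        key=lambda item_id: dot(user_vec, item_vecs[item_id]),
        reverse=True,
    )
    return by_similarity[:k]

def rank(candidates, predicted_engagement):
    """Stage 2: order the trimmed candidate set by predicted engagement."""
    return sorted(candidates, key=lambda i: predicted_engagement[i], reverse=True)

def recommend(user_vec, item_vecs, predicted_engagement, k):
    return rank(retrieve(user_vec, item_vecs, k), predicted_engagement)

# Illustrative data only (assumed for this sketch).
ITEM_VECS = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
PREDICTED_ENGAGEMENT = {"a": 0.2, "b": 0.7, "c": 0.9, "d": 0.1}

result = recommend([1.0, 0.0], ITEM_VECS, PREDICTED_ENGAGEMENT, k=2)
```

Item "c" has the highest predicted engagement but never reaches the ranker, because retrieval already dropped it. That is the trade-off a multi-stage design accepts in exchange for only running the expensive model on a small candidate set.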
Speaker: Dr. Greg Loughnane and Chris Alexiuk
However, during development – and even more so once deployed to production – best practices for operating and improving generative AI applications are less understood.
Monte Carlo
NOVEMBER 21, 2024
That’s where data pipeline design patterns come in. So, why does choosing the right data pipeline design matter? In this guide, we’ll explore the patterns that can help you design data pipelines that actually work. Table of Contents Common Data Pipeline Design Patterns Explained 1. Batch Processing Pattern 2.
Data Engineering Podcast
APRIL 14, 2024
In this episode Oren Eini, CEO and creator of RavenDB, explores the nuances of relational vs. non-relational engines, and the strategies for designing a non-relational database. When designing and building a database, what is the initial set of questions that need to be answered? Can you describe what constitutes a NoSQL database?
Data Engineering Podcast
MAY 26, 2024
Summary: Any software system that survives long enough will require some form of migration or evolution. When that system is responsible for the data layer the process becomes more challenging. As you have gone through successive migration projects, how has that influenced the ways that you think about architecting data systems?
Engineering at Meta
APRIL 6, 2023
Buck2, our new open source, large-scale build system, is now available on GitHub. Buck2 is an extensible and performant build system written in Rust and designed to make your build experience faster and more efficient. In particular, we support Sapling-based file systems. Why rebuild Buck?
Speaker: Jason Tanner
A sustainable business model contains a system of interrelated choices made not once but over time. Discover how to design and evolve profit streams over time, focusing on solution sustainability, economic sustainability, and relationship sustainability.
phData: Data Engineering
NOVEMBER 8, 2024
The world we live in today presents larger datasets, more complex data, and diverse needs, all of which call for efficient, scalable data systems. These systems are built on open standards and offer immense analytical and transactional processing flexibility. These formats are transforming how organizations manage large datasets.
Tweag
JULY 5, 2023
Buck2 is a from-scratch rewrite of Buck, a polyglot, monorepo build system that was developed and used at Meta (Facebook), and shares a few similarities with Bazel. As you may know, the Scalable Builds Group at Tweag has a strong interest in such scalable build systems. Meta recently announced they have made Buck2 open-source.
ArcGIS
MAY 13, 2024
Review best practices for designing and testing for accessibility maps and apps throughout the ArcGIS system during the development process.
Netflix Tech
NOVEMBER 12, 2024
By: Rajiv Shringi, Oleksii Tkachuk, Kartik Sathyanarayanan. Introduction: In our previous blog post, we introduced Netflix’s TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.
Simon Späti
DECEMBER 20, 2024
While attempting to build a system that could define an entire data stack through a single YAML file, I encountered architectural questions that challenged my initial assumptions: Should we generate production-ready code from templates or create a boilerplate repository with best-in-class tools?
The Pragmatic Engineer
NOVEMBER 21, 2023
If you had a continuous deployment system up and running around 2010, you were ahead of the pack, but today it’s considered strange if your team does not have this for things like web applications. We dabbled in network engineering, database management, and system administration. and hand-rolled C -code.
Seattle Data Guy
JANUARY 18, 2025
Because they can preserve the visual layout of documents and are compatible with a wide range of devices and operating systems, PDFs are used for everything from business forms and educational material to creative designs. PDF files are one of the most popular file formats today.
The Pragmatic Engineer
SEPTEMBER 19, 2024
In the early 90’s, DOS programs like the ones my company made had their own Text UI screen rendering system. This rendering system was easy for me to understand, even on day one. Our rendering system was very memory inefficient, but that could be fixed. By doing so, I got to see every screen of the system.
Jesse Anderson
FEBRUARY 11, 2025
Semih is a researcher and entrepreneur with a background in distributed systems and databases. He then pursued his doctoral studies at Stanford University, delving into the complexities of database systems.
The Pragmatic Engineer
JUNE 1, 2023
Juraj included system monitoring that tracks the capacity of the server he runs the app on. [Image: The monitoring page on the Rides app] And it doesn’t end here. Juraj created a systems design explainer on how he built this project, and the technologies used. [Image: The systems design diagram for the Rides application] The app uses: Node.js
Edureka
APRIL 10, 2025
When you hear the term System Hacking, it might bring to mind shadowy figures behind computer screens and high-stakes cyber heists. In this blog, we’ll explore the definition, purpose, process, and methods of prevention related to system hacking, offering a detailed overview to help demystify the concept.
Striim
SEPTEMBER 11, 2024
A data pipeline is a systematic sequence of components designed to automate the extraction, organization, transfer, transformation, and processing of data from one or more sources to a designated destination. Understanding the essential components of data pipelines is crucial for designing efficient and effective data architectures.
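To make the definition concrete, here is a minimal sketch of a pipeline as a chain of extract, transform, and load steps. The record fields (`user`, `amount`), the cleaning rules, and the in-memory dictionary used as the destination are all assumptions chosen for illustration; a real pipeline would read from and write to external systems.

```python
# Minimal extract -> transform -> load chain built from generators.

def extract(source_rows):
    """Extraction: pull raw records from a source (here, an in-memory list)."""
    yield from source_rows

def transform(records):
    """Transformation: drop incomplete records and normalize the rest."""
    for record in records:
        if record.get("amount") is None:
            continue  # skip rows with no amount
        yield {
            "user": record["user"].strip().lower(),
            "amount": float(record["amount"]),
        }

def load(records, destination):
    """Load: accumulate per-user totals into the destination mapping."""
    for record in records:
        user = record["user"]
        destination[user] = destination.get(user, 0.0) + record["amount"]
    return destination

def run_pipeline(source_rows):
    return load(transform(extract(source_rows)), {})

# Illustrative input only (assumed for this sketch).
SOURCE = [
    {"user": " Ann ", "amount": "10.5"},
    {"user": "bob", "amount": None},
    {"user": "ann", "amount": "2"},
]

totals = run_pipeline(SOURCE)
```

Because each stage is a generator, records stream through one at a time instead of being materialized between steps, which keeps the stages independent and easy to swap.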
Analytics Vidhya
FEBRUARY 7, 2023
Introduction: Data engineering is the field of study that deals with the design, construction, deployment, and maintenance of data processing systems. This includes designing and implementing […] The post Most Essential 2023 Interview Questions on Data Engineering appeared first on Analytics Vidhya.
Start Data Engineering
MAY 9, 2020
Change data capture is a software design pattern used to capture changes to data and take corresponding action based on that change. The corresponding action is usually supposed to occur in another system in response to the change that was made in the source system. The change to data is usually one of insert, update or delete.
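One simple way to illustrate the pattern is to compare two snapshots of a source table and replay the resulting row-level inserts, updates, and deletes against a second system. This is a sketch under stated assumptions: snapshot diffing is only one way to capture changes (production tools often read the database's transaction log instead), and the function names and table contents here are hypothetical.

```python
# Change data capture sketch: diff two snapshots keyed by primary key,
# then apply the captured changes to a target copy.

def capture_changes(before, after):
    """Return a list of (operation, key, row) tuples describing the diff."""
    changes = []
    for key, row in after.items():
        if key not in before:
            changes.append(("insert", key, row))
        elif before[key] != row:
            changes.append(("update", key, row))
    for key in before:
        if key not in after:
            changes.append(("delete", key, None))
    return changes

def apply_changes(target, changes):
    """Take the corresponding action in another system (here, a dict)."""
    for operation, key, row in changes:
        if operation == "delete":
            target.pop(key, None)
        else:  # insert and update both write the latest row
            target[key] = row
    return target

# Illustrative snapshots only (assumed for this sketch).
BEFORE = {1: {"name": "a"}, 2: {"name": "b"}}
AFTER = {1: {"name": "a2"}, 3: {"name": "c"}}

changes = capture_changes(BEFORE, AFTER)
replica = apply_changes(dict(BEFORE), changes)
```

After the changes are applied, `replica` matches the `AFTER` snapshot, which is the property a downstream system relies on when it consumes a change stream.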
Analytics Vidhya
JANUARY 31, 2023
Introduction: YARN stands for Yet Another Resource Negotiator. It is a powerful resource management system for a horizontal server environment. It is designed to be more flexible and generic than the original Hadoop MapReduce system, making it an attractive choice for companies looking to implement Hadoop.
Christophe Blefari
JANUARY 11, 2025
AI companies are aiming for the moon—AGI—promising it will arrive once OpenAI develops a system capable of generating at least $100 billion in profits. Meaning: a YAML configuration system for ingestion and transformations, and now, visualisation with BI-as-code. Meanwhile, the AI landscape remains unpredictable.
Start Data Engineering
MAY 28, 2024
Use DuckDB to process data, not for multiple users to access data 4.2. Cost calculation: DuckDB + Ephemeral VMs = dirt cheap data processing 4.3. Processing data less than 100GB? Use DuckDB 4.4. Distributed systems are scalable, resilient to failures, & designed for high availability 4.5.
The Pragmatic Engineer
AUGUST 1, 2023
We put a lot of emphasis on communication and prioritization and the ability to unblock yourself or your team – this comes on top of the programming and design skills. It’s down to them to create well-designed, extensible, performant and secure solutions. Lead a strategic team effort, starting at the design stage.
The Pragmatic Engineer
MARCH 12, 2024
Tools and approaches at our disposal, which didn’t exist in 1975, or were not widespread in 1995, include: Git – the now-dominant version control system used by much of the industry, with exceptions for projects with very large assets, like video games; Code reviews: these became common in parallel with version control.
Engineering at Meta
FEBRUARY 4, 2025
Here we explore initial system designs we considered, an overview of the current architecture, and some important principles Meta takes into account in making data accessible and easy to understand. We also considered caching data logs in an online system capable of supporting a range of indexed per-user queries.
Netflix Tech
JANUARY 6, 2025
In this case, the main stakeholders are: Title Launch Operators. Role: Responsible for setting up the title and its metadata into our systems. In this context, we’re focused on developing systems that ensure successful title launches, build trust between content creators and our brand, and reduce engineering operational overhead.
Seattle Data Guy
MAY 8, 2023
If you’re relying on your OLTP system to provide analytics, you might be in for a surprise. While it can work initially, these systems aren’t designed to handle complex queries. … The post OLTP Vs OLAP – What Is The Difference appeared first on Seattle Data Guy.
Netflix Tech
MARCH 28, 2025
By Ko-Jen Hsiao, Yesu Feng and Sudarshan Lamkhede. Motivation: Netflix’s personalized recommender system is a complex system, boasting a variety of specialized machine-learned models, each catering to distinct needs including Continue Watching and Today’s Top Picks for You. (Refer to our recent overview for more details).
Snowflake
APRIL 9, 2025
KAWA Analytics: Digital transformation is an admirable goal, but legacy systems and inefficient processes hold back many companies’ efforts. AI agents can assist with research, analytics, reconciliation and more, just one part of KAWA’s AI-native platform designed to enable automation with transparency and enterprise-grade security.
The Pragmatic Engineer
OCTOBER 17, 2024
This grant is designed to “support entrepreneurs, tech-geeks, developers, and socially engaged people, who are capable of challenging the way we search and discover information and resources on the internet.” The team is tiny; only three people.
Analytics Vidhya
FEBRUARY 6, 2023
Introduction: HDFS (Hadoop Distributed File System) is not a traditional database but a distributed file system designed to store and process big data. It is a core component of the Apache Hadoop ecosystem and allows for storing and processing large datasets across multiple commodity servers.
Data Engineering Weekly
JANUARY 15, 2025
The answer lies in unstructured data processing—a field that powers modern artificial intelligence (AI) systems. To address these challenges, AI Data Engineers have emerged as key players, designing scalable data workflows that fuel the next generation of AI systems. Their role is not just important; it is essential.
dbt Developer Hub
APRIL 20, 2025
Both AI agents and business stakeholders will then operate on top of LLM-driven systems hydrated by the dbt MCP context. Today’s system is not a full realization of the vision in the posts shared above, but it is a meaningful step towards safely integrating your structured enterprise data into AI workflows. Why does this matter?
The Pragmatic Engineer
AUGUST 8, 2024
It can also venture into other areas – for example, it gets chatty when asked about the history of the company: Overall, it feels like the chatbot is carefully designed to not allow it to go into details that are not on a whitelist. With clever-enough probing, this system prompt can be revealed. Translate to English if needed.
The Pragmatic Engineer
AUGUST 22, 2023
Typical roles: On a typical games project, programmers work alongside designers, artists, animators, writers, sound designers and other disciplines. I often explain this working relationship as that artists make it pretty, while designers and programmers make it work.
Data Engineering Weekly
MAY 4, 2025
The recommendation engine to find the data flow violation is an interesting design to monitor the data assets at scale. Whatnot: Evolving Feed Ranking at Whatnot. Whatnot describes their transition from a batch prediction system to an online inference framework for ranking, which is shown in their "For You Feed."
Snowflake
DECEMBER 4, 2024
Beyond working with well-structured data in a data warehouse, modern AI systems can use deep learning and natural language processing to work effectively with unstructured and semi-structured data in data lakes and lakehouses. Rather than answering a specific question, independent agents will act on broad instructions from a human user.
Engineering at Meta
APRIL 28, 2025
Meta’s vast and diverse systems make it particularly challenging to comprehend its structure, meaning, and context at scale. We discovered that a flexible and incremental approach was necessary to onboard the wide variety of systems and languages used in building Meta’s products. We believe that privacy drives product innovation.
Snowflake
APRIL 2, 2025
The data warehouse solved for performance and scale but, much like the databases that preceded it, relied on proprietary formats to build vertically integrated systems. It’s vendor-neutral by design, and the Polaris governance structure and community-driven development ensure it remains so.