Trade-offs in Data Systems Architecture

Understanding that every choice comes with consequences

Introduction
Analytical versus Operational Systems
- 2.1. OLTP vs OLAP
- 2.2. Data Warehousing
- 2.3. Data Lakes
Systems of Record and Derived Data
Cloud versus Self-Hosting
- 4.1. Pros and Cons of Cloud Services
- 4.2. Cloud-Native Architecture
Distributed versus Single-Node Systems
- 5.1. Reasons to Distribute
- 5.2. Problems with Distributed Systems
- 5.3. Microservices and Serverless
Data Systems, Law, and Society
Summary

1. Introduction

In plain English: There are no perfect solutions in data systems—only trade-offs. Choosing one approach means giving up the benefits of another. This chapter helps you understand these trade-offs so you can make informed decisions.

In technical terms: Data-intensive applications face challenges in storing large volumes, managing data changes, ensuring consistency during failures, and maintaining high availability. The right architecture depends on understanding the trade-offs between different approaches.

Why it matters: Modern applications are data-intensive, not compute-intensive. Understanding trade-offs helps you choose the right tools and combine them effectively for your specific use case.

💡 Insight

The quote that opens this chapter—"There are no solutions, there are only trade-offs"—is the fundamental principle of system design. Every architectural decision has pros and cons. Your job is to find the best trade-off for your situation.

🎯 The Core Challenge

How do we build applications that need to:

💾

Store Data

So applications can find it again later

⚡

Cache Results

Speed up reads for expensive operations

🔍

Search & Filter

Allow users to query by keyword

📊

Stream Events

Handle data changes as they occur

2. Analytical versus Operational Systems

In plain English: Think of operational systems as the cash register at a store—handling individual transactions as they happen. Analytical systems are like the accounting department—looking at all transactions together to find patterns.

In technical terms: Operational systems (OLTP) handle real-time transactions for external users. Analytical systems (OLAP) process large datasets to generate insights for internal decision-making.

Why it matters: Using the same system for both purposes often leads to poor performance. Understanding the distinction helps you design appropriate architectures.

Operational vs Analytical Systems

Operational (OLTP)

Web/Mobile Apps

Point-of-Sale

User Transactions

ETL→

Analytical (OLAP)

Data Warehouse

Business Intelligence

ML/AI Systems

2.1. OLTP vs OLAP

Property	Operational (OLTP)	Analytical (OLAP)
Read Pattern	Point queries (fetch by key)	Aggregate over many records
Write Pattern	Create/update/delete individual records	Bulk import (ETL) or event stream
Users	End users via app	Internal analysts
Query Type	Fixed, predefined by app	Ad-hoc, exploratory
Data View	Current state	Historical events over time
Dataset Size	Gigabytes to terabytes	Terabytes to petabytes

💡 Insight

The separation exists for good reasons: OLTP systems prioritize low latency for individual operations, while OLAP systems prioritize throughput for scanning large datasets. Optimizing for one often hurts the other.

2.2. Data Warehousing

In plain English: A data warehouse is like a library that collects copies of all the books from every department in a company, organized in a way that makes it easy to find answers to business questions.

In technical terms: A data warehouse is a separate database that aggregates data from multiple operational systems via ETL (Extract-Transform-Load), optimized for analytical queries.

ETL Pipeline to Data Warehouse

Extract

Pull from operational DBs

→

Transform

Clean and reshape data

→

Load

Write to warehouse

Why separate from operational systems?

Problem	Solution via Warehouse
Data silos across systems	Centralized access
OLTP schemas not suited for analytics	Analysis-friendly schemas
Expensive queries impact users	No impact on production
Security/compliance restrictions	Controlled analyst access

2.3. Data Lakes

In plain English: If a data warehouse is a library with organized books, a data lake is a massive storage facility where you can dump anything—books, videos, sensor data—in their original form.

In technical terms: A data lake stores raw data in any format (Avro, Parquet, JSON, images, etc.) without imposing a schema. It's cheaper than relational storage and more flexible for data science workloads.

Data Lake Architecture

CRM Data

App Logs

IoT Sensors

User Events

↓

Data Lake (Object Storage)

↓

Data Warehouse

ML Training

Analytics

💡 Insight

The "sushi principle" in data engineering: raw data is better. By storing data in its original form, each consumer can transform it to suit their specific needs, rather than being limited to a single transformed view.

3. Systems of Record and Derived Data

In plain English: A system of record is like the official birth certificate—it's the authoritative source. Derived data is like copies or summaries made from that original, which can be recreated if lost.

In technical terms: Systems of record hold canonical data; derived systems (caches, indexes, materialized views) are transformations that can be regenerated from the source.

System of Record vs Derived Data

📝System of Record

derives→

Cache

derives→

Search Index

derives→

Materialized View

derives→

ML Model

Key Principle: If you lose derived data, you can recreate it. If you lose the system of record, the data is gone.

4. Cloud versus Self-Hosting

In plain English: Should you rent infrastructure from a cloud provider, or buy and manage your own servers? It's like choosing between renting an apartment (cloud) and buying a house (self-hosting).

In technical terms: Cloud services outsource infrastructure operations to vendors, while self-hosting gives you full control but requires operational expertise.

Spectrum of Hosting Options

🏠

Bespoke

Write & run in-house

→

🖥️

Self-Host

Run open source/commercial

→

☁️

IaaS

VMs in the cloud

→

🌐

SaaS

Vendor-operated service

4.1. Pros and Cons of Cloud Services

Self-Hosted

Cloud Service

Operational Burden

You manage everything

Provider handles it

Scaling

Provision in advance

Auto-scale on demand

Cost (predictable load)

Often cheaper

Premium pricing

Cost (variable load)

Pay for peak capacity

Pay for actual usage

Customization

Full control

Limited options

Debugging

Full access to logs/metrics

Black box

💡 Insight

The biggest downside of cloud services is loss of control. If a feature is missing, you can only ask politely. If it goes down, you wait. If pricing changes, you pay or migrate.

4.2. Cloud-Native Architecture

In plain English: Cloud-native systems are designed from scratch to take advantage of cloud services, not just self-hosted software running on cloud VMs.

In technical terms: Cloud-native architectures separate storage and compute, use object stores for durability, and treat local disks as ephemeral caches.

Traditional vs Cloud-Native Architecture

Traditional

Application

Database

Local Disk

Cloud-Native

Compute (Ephemeral)

Database Service

Object Storage (S3)

Category	Self-Hosted	Cloud-Native
OLTP	MySQL, PostgreSQL, MongoDB	Aurora, Cloud Spanner
OLAP	Teradata, ClickHouse, Spark	Snowflake, BigQuery

5. Distributed versus Single-Node Systems

In plain English: Should your system run on one powerful computer, or spread across many computers connected by a network? More computers isn't always better.

In technical terms: A distributed system involves multiple processes (nodes) communicating over a network. While necessary for some requirements, it introduces significant complexity.

5.1. Reasons to Distribute

👥

Inherently Distributed

Multi-user apps across devices

🛡️

Fault Tolerance

Redundancy if machines fail

📈

Scalability

Data too big for one machine

🌍

Latency

Servers close to users globally

⚡

Elasticity

Scale up/down with demand

⚖️

Legal Compliance

Data residency requirements

5.2. Problems with Distributed Systems

💡 Insight

"If you can do something on a single machine, this is often much simpler and cheaper compared to setting up a distributed system." Modern CPUs and disks are incredibly powerful—many workloads can run on a single node with tools like DuckDB or SQLite.

The challenges:

Problem	Description
Network Failures	Requests can timeout without knowing if they succeeded
Latency	Network calls are vastly slower than local function calls
Debugging	Where is the problem when the system is slow?
Consistency	Keeping data synchronized across services is hard

5.3. Microservices and Serverless

In plain English: Microservices split a big application into many small services that talk to each other. Serverless goes further—you just write functions, and the cloud handles everything else.

In technical terms: Microservices decompose applications into independent services with their own databases. Serverless (FaaS) abstracts away server management entirely, billing by execution time.

Microservices Architecture

Service A

API

→

Service B

API

→

Service C

API

💡 Insight

Microservices are a technical solution to a people problem: allowing teams to work independently. In small companies with few teams, microservices add unnecessary complexity—keep it simple.

6. Data Systems, Law, and Society

In plain English: Data systems don't exist in a vacuum. We have responsibilities to the people whose data we collect—legally (GDPR, CCPA) and ethically.

In technical terms: Privacy regulations (GDPR, CCPA, EU AI Act) mandate data minimization, purpose limitation, and the right to erasure. These requirements influence system architecture.

Key principles:

Principle	Description
Data Minimization	Only collect what you need
Purpose Limitation	Use data only for stated purposes
Right to Erasure	Delete data on user request
Storage Limitation	Don't keep data longer than necessary

💡 Insight

The cost of storing data isn't just the S3 bill—it includes liability risks if leaked, legal costs if non-compliant, and safety risks to users. Sometimes the best decision is to not store certain data at all.

7. Summary

🎯 Key Trade-offs

Trade-off	When to Choose A	When to Choose B
OLTP vs OLAP	Serving users	Analyzing data
Cloud vs Self-Host	Variable load, fast start	Predictable load, full control
Distributed vs Single	Scale/availability needs	Simplicity matters
Microservices vs Monolith	Large teams	Small teams

📋 Key Concepts

Concept	Definition
OLTP	Online Transaction Processing—serving user requests
OLAP	Online Analytical Processing—business intelligence
ETL	Extract-Transform-Load pipeline to data warehouse
Data Lake	Raw data storage in any format
System of Record	Authoritative source of truth
Derived Data	Data that can be regenerated from source

📝 Key Takeaways

Every architectural decision is a trade-off—understand what you're giving up
Operational and analytical systems have different requirements; keep them separate
Cloud services trade control for convenience—evaluate based on your specific situation
Distributed systems add complexity; prefer single-node when possible
Consider legal and ethical implications of storing personal data

Next: Chapter 2: Nonfunctional Requirements — Understanding reliability, scalability, and maintainability

Table of Contents​

1. Introduction​

🎯 The Core Challenge​

2. Analytical versus Operational Systems​

2.1. OLTP vs OLAP​

2.2. Data Warehousing​

2.3. Data Lakes​

3. Systems of Record and Derived Data​

4. Cloud versus Self-Hosting​

4.1. Pros and Cons of Cloud Services​

4.2. Cloud-Native Architecture​

5. Distributed versus Single-Node Systems​

5.1. Reasons to Distribute​

5.2. Problems with Distributed Systems​

5.3. Microservices and Serverless​

6. Data Systems, Law, and Society​

7. Summary​

🎯 Key Trade-offs​

📋 Key Concepts​

📝 Key Takeaways​

Table of Contents

1. Introduction

🎯 The Core Challenge

2. Analytical versus Operational Systems

2.1. OLTP vs OLAP

2.2. Data Warehousing

2.3. Data Lakes

3. Systems of Record and Derived Data

4. Cloud versus Self-Hosting

4.1. Pros and Cons of Cloud Services

4.2. Cloud-Native Architecture

5. Distributed versus Single-Node Systems

5.1. Reasons to Distribute

5.2. Problems with Distributed Systems

5.3. Microservices and Serverless

6. Data Systems, Law, and Society

7. Summary

🎯 Key Trade-offs

📋 Key Concepts

📝 Key Takeaways