Trade-offs in Data Systems Architecture

Understanding that every choice comes with consequences


Table of Contents

  1. Introduction
  2. Analytical versus Operational Systems
  3. Systems of Record and Derived Data
  4. Cloud versus Self-Hosting
  5. Distributed versus Single-Node Systems
  6. Data Systems, Law, and Society
  7. Summary

1. Introduction

In plain English: There are no perfect solutions in data systems—only trade-offs. Choosing one approach means giving up the benefits of another. This chapter helps you understand these trade-offs so you can make informed decisions.

In technical terms: Data-intensive applications face challenges in storing large volumes, managing data changes, ensuring consistency during failures, and maintaining high availability. The right architecture depends on understanding the trade-offs between different approaches.

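Why it matters: Many modern applications are data-intensive rather than compute-intensive, so the limiting factor is usually how data is stored and processed rather than raw CPU power. Understanding trade-offs helps you choose the right tools and combine them effectively for your specific use case.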

💡 Insight

The quote that opens this chapter—"There are no solutions, there are only trade-offs"—is the fundamental principle of system design. Every architectural decision has pros and cons. Your job is to find the best trade-off for your situation.

🎯 The Core Challenge

How do we build applications that need to:

  • 💾 Store Data: So applications can find it again later
  • Cache Results: Speed up reads for expensive operations
  • 🔍 Search & Filter: Allow users to query by keyword
  • 📊 Stream Events: Handle data changes as they occur

2. Analytical versus Operational Systems

In plain English: Think of operational systems as the cash register at a store—handling individual transactions as they happen. Analytical systems are like the accounting department—looking at all transactions together to find patterns.

In technical terms: Operational systems (OLTP) handle real-time transactions for external users. Analytical systems (OLAP) process large datasets to generate insights for internal decision-making.

Why it matters: Using the same system for both purposes often leads to poor performance. Understanding the distinction helps you design appropriate architectures.

Operational vs Analytical Systems

  • Operational (OLTP): Web/Mobile Apps, Point-of-Sale, User Transactions
  • ETL: moves data from the operational side to the analytical side
  • Analytical (OLAP): Data Warehouse, Business Intelligence, ML/AI Systems

2.1. OLTP vs OLAP

Property | Operational (OLTP) | Analytical (OLAP)
Read Pattern | Point queries (fetch by key) | Aggregate over many records
Write Pattern | Create/update/delete individual records | Bulk import (ETL) or event stream
Users | End users via app | Internal analysts
Query Type | Fixed, predefined by app | Ad-hoc, exploratory
Data View | Current state | Historical events over time
Dataset Size | Gigabytes to terabytes | Terabytes to petabytes

💡 Insight

The separation exists for good reasons: OLTP systems prioritize low latency for individual operations, while OLAP systems prioritize throughput for scanning large datasets. Optimizing for one often hurts the other.
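
To make the two read patterns concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `sales` table, its columns, and the sample rows are hypothetical and exist only for illustration; a real OLAP workload would normally run on a separate system rather than the same database.

```python
import sqlite3

# Hypothetical sales table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, store TEXT, amount_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO sales (id, store, amount_cents) VALUES (?, ?, ?)",
    [(1, "north", 1200), (2, "south", 800), (3, "north", 500), (4, "south", 2500)],
)


def fetch_sale(sale_id):
    """OLTP-style point query: fetch a single record by its key."""
    return conn.execute(
        "SELECT id, store, amount_cents FROM sales WHERE id = ?", (sale_id,)
    ).fetchone()


def revenue_by_store():
    """OLAP-style aggregate: scan all records and summarize them."""
    rows = conn.execute(
        "SELECT store, SUM(amount_cents) FROM sales GROUP BY store ORDER BY store"
    ).fetchall()
    return dict(rows)


print(fetch_sale(3))
print(revenue_by_store())
```

The point query touches one row through the primary key, while the aggregate has to read every row. With four rows the difference is invisible, but it is the reason the two workloads are tuned so differently at scale.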

2.2. Data Warehousing

In plain English: A data warehouse is like a library that collects copies of all the books from every department in a company, organized in a way that makes it easy to find answers to business questions.

In technical terms: A data warehouse is a separate database that aggregates data from multiple operational systems via ETL (Extract-Transform-Load), optimized for analytical queries.

ETL Pipeline to Data Warehouse

  1. Extract: Pull from operational DBs
  2. Transform: Clean and reshape data
  3. Load: Write to warehouse
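
The three ETL steps can be sketched in a few lines of Python. This is an illustrative toy, not a production pipeline: the `orders` and `daily_sales` tables, their columns, and the cleaning rule (trim and lowercase customer names) are all assumptions, and two in-memory SQLite databases stand in for the operational database and the warehouse.

```python
import sqlite3


def extract(oltp):
    """Extract: pull raw order rows from the operational database."""
    return oltp.execute(
        "SELECT id, customer, amount_cents, day FROM orders"
    ).fetchall()


def transform(rows):
    """Transform: clean customer names and total the amounts per (day, customer)."""
    totals = {}
    for _order_id, customer, amount_cents, day in rows:
        key = (day, customer.strip().lower())
        totals[key] = totals.get(key, 0) + amount_cents
    return [(day, customer, total) for (day, customer), total in sorted(totals.items())]


def load(warehouse, facts):
    """Load: write the reshaped rows into the warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales "
        "(day TEXT, customer TEXT, total_cents INTEGER)"
    )
    warehouse.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)", facts)
    warehouse.commit()


# Hypothetical operational data with inconsistent customer names.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, "
    "amount_cents INTEGER, day TEXT)"
)
oltp.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, " Alice ", 1000, "2024-01-01"),
        (2, "alice", 500, "2024-01-01"),
        (3, "Bob", 700, "2024-01-02"),
    ],
)

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(oltp)))
print(warehouse.execute("SELECT * FROM daily_sales ORDER BY day, customer").fetchall())
```

Keeping each step as its own function mirrors the pipeline stages, so any one of them can be swapped out without touching the others.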

Why separate from operational systems?

Problem | Solution via Warehouse
Data silos across systems | Centralized access
OLTP schemas not suited for analytics | Analysis-friendly schemas
Expensive queries impact users | No impact on production
Security/compliance restrictions | Controlled analyst access

2.3. Data Lakes

In plain English: If a data warehouse is a library with organized books, a data lake is a massive storage facility where you can dump anything—books, videos, sensor data—in their original form.

In technical terms: A data lake stores raw data in any format (Avro, Parquet, JSON, images, etc.) without imposing a schema up front. It is typically cheaper than relational storage and more flexible for data science workloads.

Data Lake Architecture

  • Sources: CRM Data, App Logs, IoT Sensors, User Events
  • Storage: Data Lake (Object Storage)
  • Consumers: Data Warehouse, ML Training, Analytics

💡 Insight

The "sushi principle" in data engineering: raw data is better. By storing data in its original form, each consumer can transform it to suit their specific needs, rather than being limited to a single transformed view.
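
As a rough illustration of the sushi principle, the sketch below keeps events as raw JSON lines and lets two consumers derive different views from the same data. The event fields (`user`, `type`, `page`, `ms`) and both consumer functions are hypothetical.

```python
import json

# Raw events kept exactly as they arrived; the field names are hypothetical.
RAW_EVENTS = [
    '{"user": "u1", "type": "click", "page": "/home", "ms": 120}',
    '{"user": "u2", "type": "view", "page": "/pricing", "ms": 340}',
    '{"user": "u1", "type": "click", "page": "/pricing", "ms": 95}',
]


def clicks_per_page(raw_lines):
    """Analytics consumer: count click events per page."""
    counts = {}
    for line in raw_lines:
        event = json.loads(line)
        if event["type"] == "click":
            counts[event["page"]] = counts.get(event["page"], 0) + 1
    return counts


def mean_latency_per_user(raw_lines):
    """ML consumer: derive a per-user average latency feature."""
    per_user = {}
    for line in raw_lines:
        event = json.loads(line)
        per_user.setdefault(event["user"], []).append(event["ms"])
    return {user: sum(values) / len(values) for user, values in per_user.items()}


print(clicks_per_page(RAW_EVENTS))
print(mean_latency_per_user(RAW_EVENTS))
```

If only the click counts had been stored, the latency feature could not have been computed later. Keeping the raw form leaves both options open.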


3. Systems of Record and Derived Data

In plain English: A system of record is like the official birth certificate—it's the authoritative source. Derived data is like copies or summaries made from that original, which can be recreated if lost.

In technical terms: Systems of record hold canonical data; derived systems (caches, indexes, materialized views) are transformations that can be regenerated from the source.

System of Record vs Derived Data

  • 📝 System of Record: the authoritative source
  • Derived from it: Cache, Search Index, Materialized View, ML Model

Key Principle: If you lose derived data, you can recreate it. If you lose the system of record, the data is gone.
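
The key principle can be shown with a small sketch in which a search index is derived from a system of record and then rebuilt after being lost. The `documents` dictionary and the `build_search_index` helper are hypothetical names chosen for this example.

```python
# System of record: the authoritative documents (hypothetical sample data).
documents = {
    1: "cloud storage pricing",
    2: "self hosted storage",
    3: "cloud native databases",
}


def build_search_index(source):
    """Derive an inverted index (word -> set of document ids) from the source."""
    index = {}
    for doc_id, text in source.items():
        for word in text.split():
            index.setdefault(word, set()).add(doc_id)
    return index


search_index = build_search_index(documents)
print(search_index["cloud"])

# Simulate losing the derived data, then regenerate it from the source.
search_index = None
search_index = build_search_index(documents)
print(search_index["storage"])
```

The reverse does not hold: the index records which words appear in which documents but not their order, so the original text could not be reconstructed from it.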


4. Cloud versus Self-Hosting

In plain English: Should you rent infrastructure from a cloud provider, or buy and manage your own servers? It's like choosing between renting an apartment (cloud) and buying a house (self-hosting).

In technical terms: Cloud services outsource infrastructure operations to vendors, while self-hosting gives you full control but requires operational expertise.

Spectrum of Hosting Options

  • 🏠 Bespoke: Write & run in-house
  • 🖥️ Self-Host: Run open source/commercial
  • ☁️ IaaS: VMs in the cloud
  • 🌐 SaaS: Vendor-operated service

4.1. Pros and Cons of Cloud Services

Aspect | Self-Hosted | Cloud Service
Operational Burden | You manage everything | Provider handles it
Scaling | Provision in advance | Auto-scale on demand
Cost (predictable load) | Often cheaper | Premium pricing
Cost (variable load) | Pay for peak capacity | Pay for actual usage
Customization | Full control | Limited options
Debugging | Full access to logs/metrics | Black box

💡 Insight

The biggest downside of cloud services is loss of control. If a feature is missing, you can only ask politely. If it goes down, you wait. If pricing changes, you pay or migrate.
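
The cost rows in the comparison above can be worked through with some simple arithmetic. The sketch below assumes a deliberately simplified model: self-hosting pays for peak capacity every hour, the cloud bills only for server-hours actually used, and the prices (100 and 150 cents per server-hour) are made-up numbers, not real vendor pricing.

```python
def self_hosted_cost(hourly_load, cents_per_server_hour):
    """Provisioned for peak: pay for peak capacity during every hour."""
    return max(hourly_load) * cents_per_server_hour * len(hourly_load)


def cloud_cost(hourly_load, cents_per_server_hour):
    """Pay per use: pay only for the server-hours actually consumed."""
    return sum(hourly_load) * cents_per_server_hour


# Hypothetical prices: the cloud charges a premium per server-hour.
SELF_HOSTED_PRICE = 100
CLOUD_PRICE = 150

# Servers needed in each hour of one day.
steady_load = [10] * 24              # predictable load
spiky_load = [2] * 22 + [20] * 2     # mostly idle with a short peak

for name, load in [("steady", steady_load), ("spiky", spiky_load)]:
    print(
        name,
        "self-hosted:", self_hosted_cost(load, SELF_HOSTED_PRICE),
        "cloud:", cloud_cost(load, CLOUD_PRICE),
    )
```

Under these assumptions the steady workload is cheaper self-hosted, while the spiky one is far cheaper in the cloud. Real comparisons would also need to account for staffing, hardware depreciation, and data transfer fees, which this model ignores.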

4.2. Cloud-Native Architecture

In plain English: Cloud-native systems are designed from scratch to take advantage of cloud services, not just self-hosted software running on cloud VMs.

In technical terms: Cloud-native architectures separate storage and compute, use object stores for durability, and treat local disks as ephemeral caches.

Traditional vs Cloud-Native Architecture

  • Traditional: Application, Database, Local Disk
  • Cloud-Native: Compute (Ephemeral), Database Service, Object Storage (S3)

Category | Self-Hosted | Cloud-Native
OLTP | MySQL, PostgreSQL, MongoDB | Aurora, Cloud Spanner
OLAP | Teradata, ClickHouse, Spark | Snowflake, BigQuery
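
The idea of treating local disk as an ephemeral cache in front of durable object storage can be sketched as follows. `ObjectStore` and `ComputeNode` are hypothetical classes written for this example; plain dictionaries stand in for both the object store and the local disk, so nothing here reflects a real cloud API.

```python
class ObjectStore:
    """Stand-in for a durable object store; a dict keeps the example self-contained."""

    def __init__(self):
        self._objects = {}
        self.reads = 0  # counts how often the (slow) store is actually read

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        self.reads += 1
        return self._objects[key]


class ComputeNode:
    """Ephemeral compute: local state is only a cache and may vanish at any time."""

    def __init__(self, store):
        self.store = store
        self.local_cache = {}

    def write(self, key, data):
        self.store.put(key, data)      # durability comes from the object store
        self.local_cache[key] = data   # the local copy only speeds up later reads

    def read(self, key):
        if key not in self.local_cache:
            self.local_cache[key] = self.store.get(key)  # cache miss: fetch and keep
        return self.local_cache[key]


store = ObjectStore()
node_a = ComputeNode(store)
node_a.write("page-1", b"hello")

# node_a is discarded; a replacement node starts with an empty local cache.
node_b = ComputeNode(store)
print(node_b.read("page-1"))  # first read goes to the object store
print(node_b.read("page-1"))  # second read is served from the local cache
print(store.reads)
```

Because nothing durable lives on the node, compute can be replaced or scaled independently of storage, which is the separation described above.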

5. Distributed versus Single-Node Systems

In plain English: Should your system run on one powerful computer, or spread across many computers connected by a network? More computers aren't always better.

In technical terms: A distributed system involves multiple processes (nodes) communicating over a network. While necessary for some requirements, it introduces significant complexity.

5.1. Reasons to Distribute

  • 👥 Inherently Distributed: Multi-user apps across devices
  • 🛡️ Fault Tolerance: Redundancy if machines fail
  • 📈 Scalability: Data too big for one machine
  • 🌍 Latency: Servers close to users globally
  • Elasticity: Scale up/down with demand
  • ⚖️ Legal Compliance: Data residency requirements

5.2. Problems with Distributed Systems

💡 Insight

"If you can do something on a single machine, this is often much simpler and cheaper compared to setting up a distributed system." Modern CPUs and disks are incredibly powerful—many workloads can run on a single node with tools like DuckDB or SQLite.

The challenges:

Problem | Description
Network Failures | Requests can time out without the caller knowing whether they succeeded
Latency | Network calls are vastly slower than local function calls
Debugging | Where is the problem when the system is slow?
Consistency | Keeping data synchronized across services is hard
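
The network failure problem is easiest to see with a simulation. In the sketch below the server applies a charge but the reply is lost, so the client cannot tell whether the request succeeded. One common mitigation, shown here as an assumption rather than something the chapter prescribes, is to retry with an idempotency key so the server can recognize duplicates. `PaymentServer`, `FlakyNetwork`, and `charge_with_retry` are hypothetical names.

```python
import uuid


class PaymentServer:
    """Hypothetical server that remembers which request keys it has already applied."""

    def __init__(self):
        self.total_charged_cents = 0
        self.results_by_key = {}

    def charge(self, idempotency_key, amount_cents):
        if idempotency_key in self.results_by_key:
            # Duplicate delivery: return the earlier result without charging again.
            return self.results_by_key[idempotency_key]
        self.total_charged_cents += amount_cents
        self.results_by_key[idempotency_key] = self.total_charged_cents
        return self.total_charged_cents


class FlakyNetwork:
    """Delivers every request to the server but drops the first few replies."""

    def __init__(self, server, dropped_replies):
        self.server = server
        self.dropped_replies = dropped_replies

    def call(self, idempotency_key, amount_cents):
        result = self.server.charge(idempotency_key, amount_cents)
        if self.dropped_replies > 0:
            self.dropped_replies -= 1
            # From the client's side this looks the same as a lost request.
            raise TimeoutError("no reply; the request may or may not have been applied")
        return result


def charge_with_retry(network, amount_cents, max_attempts=3):
    """Retry on timeout, reusing one key so repeated deliveries are harmless."""
    key = str(uuid.uuid4())
    for _ in range(max_attempts):
        try:
            return network.call(key, amount_cents)
        except TimeoutError:
            continue
    raise TimeoutError("gave up after %d attempts" % max_attempts)


server = PaymentServer()
network = FlakyNetwork(server, dropped_replies=1)
print(charge_with_retry(network, 500))
print(server.total_charged_cents)
```

Without the key, the retry after the lost reply would charge the customer twice. With it, the second delivery is recognized and the original result is returned.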

5.3. Microservices and Serverless

In plain English: Microservices split a big application into many small services that talk to each other. Serverless goes further—you just write functions, and the cloud handles everything else.

In technical terms: Microservices decompose applications into independent services with their own databases. Serverless (FaaS) abstracts away server management entirely, billing by execution time.

Microservices Architecture

  • Service A: own API and DB
  • Service B: own API and DB
  • Service C: own API and DB

💡 Insight

Microservices are a technical solution to a people problem: allowing teams to work independently. In small companies with few teams, microservices add unnecessary complexity—keep it simple.


6. Data Systems, Law, and Society

In plain English: Data systems don't exist in a vacuum. We have responsibilities to the people whose data we collect—legally (GDPR, CCPA) and ethically.

In technical terms: Regulations such as the GDPR and CCPA establish principles including data minimization, purpose limitation, and the right to erasure, and newer laws such as the EU AI Act add further obligations for AI systems. These requirements influence system architecture.

Key principles:

Principle | Description
Data Minimization | Only collect what you need
Purpose Limitation | Use data only for stated purposes
Right to Erasure | Delete data on user request
Storage Limitation | Don't keep data longer than necessary

💡 Insight

The cost of storing data isn't just the S3 bill—it includes liability risks if leaked, legal costs if non-compliant, and safety risks to users. Sometimes the best decision is to not store certain data at all.
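
Two of the principles above, storage limitation and the right to erasure, translate fairly directly into code. The sketch below is only illustrative: the 30-day retention period, the record fields, and both function names are assumptions, and real compliance would also need to cover backups and derived data such as caches and indexes.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention period; the right value depends on the stated purpose.
RETENTION = timedelta(days=30)


def purge_expired(records, now):
    """Storage limitation: keep only records still inside the retention period."""
    return [r for r in records if now - r["created_at"] <= RETENTION]


def erase_user(records, user_id):
    """Right to erasure: drop every record belonging to the given user."""
    return [r for r in records if r["user_id"] != user_id]


now = datetime(2024, 3, 1, tzinfo=timezone.utc)
records = [
    {"user_id": "u1", "created_at": datetime(2024, 2, 20, tzinfo=timezone.utc)},
    {"user_id": "u2", "created_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"user_id": "u1", "created_at": datetime(2024, 2, 28, tzinfo=timezone.utc)},
]

print(len(purge_expired(records, now)))
print(len(erase_user(records, "u1")))
```

Passing `now` in as a parameter rather than reading the clock inside the function keeps the retention logic deterministic and easy to test.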


7. Summary

🎯 Key Trade-offs

Trade-off (A vs B) | When to Choose A | When to Choose B
OLTP vs OLAP | Serving users | Analyzing data
Cloud vs Self-Host | Variable load, fast start | Predictable load, full control
Distributed vs Single | Scale/availability needs | Simplicity matters
Microservices vs Monolith | Large teams | Small teams

📋 Key Concepts

Concept | Definition
OLTP | Online Transaction Processing: serving user requests
OLAP | Online Analytical Processing: business intelligence
ETL | Extract-Transform-Load pipeline to data warehouse
Data Lake | Raw data storage in any format
System of Record | Authoritative source of truth
Derived Data | Data that can be regenerated from source

📝 Key Takeaways

  • Every architectural decision is a trade-off—understand what you're giving up
  • Operational and analytical systems have different requirements; keep them separate
  • Cloud services trade control for convenience—evaluate based on your specific situation
  • Distributed systems add complexity; prefer single-node when possible
  • Consider legal and ethical implications of storing personal data

Next: Chapter 2: Nonfunctional Requirements — Understanding reliability, scalability, and maintainability