Trade-offs in Data Systems Architecture
Understanding that every choice comes with consequences
Table of Contents
- Introduction
- Analytical versus Operational Systems
- 2.1. OLTP vs OLAP
- 2.2. Data Warehousing
- 2.3. Data Lakes
- Systems of Record and Derived Data
- Cloud versus Self-Hosting
- Distributed versus Single-Node Systems
- Data Systems, Law, and Society
- Summary
1. Introduction
In plain English: There are no perfect solutions in data systems—only trade-offs. Choosing one approach means giving up the benefits of another. This chapter helps you understand these trade-offs so you can make informed decisions.
In technical terms: Data-intensive applications face challenges in storing large volumes, managing data changes, ensuring consistency during failures, and maintaining high availability. The right architecture depends on understanding the trade-offs between different approaches.
Why it matters: Modern applications are data-intensive, not compute-intensive. Understanding trade-offs helps you choose the right tools and combine them effectively for your specific use case.
💡 Insight
The quote that opens this chapter—"There are no solutions, there are only trade-offs"—is the fundamental principle of system design. Every architectural decision has pros and cons. Your job is to find the best trade-off for your situation.
🎯 The Core Challenge
How do we build applications that need to:
2. Analytical versus Operational Systems
In plain English: Think of operational systems as the cash register at a store—handling individual transactions as they happen. Analytical systems are like the accounting department—looking at all transactions together to find patterns.
In technical terms: Operational systems (OLTP) handle real-time transactions for external users. Analytical systems (OLAP) process large datasets to generate insights for internal decision-making.
Why it matters: Using the same system for both purposes often leads to poor performance. Understanding the distinction helps you design appropriate architectures.
2.1. OLTP vs OLAP
| Property | Operational (OLTP) | Analytical (OLAP) |
|---|---|---|
| Read Pattern | Point queries (fetch by key) | Aggregate over many records |
| Write Pattern | Create/update/delete individual records | Bulk import (ETL) or event stream |
| Users | End users via app | Internal analysts |
| Query Type | Fixed, predefined by app | Ad-hoc, exploratory |
| Data View | Current state | Historical events over time |
| Dataset Size | Gigabytes to terabytes | Terabytes to petabytes |
💡 Insight
The separation exists for good reasons: OLTP systems prioritize low latency for individual operations, while OLAP systems prioritize throughput for scanning large datasets. Optimizing for one often hurts the other.
2.2. Data Warehousing
In plain English: A data warehouse is like a library that collects copies of all the books from every department in a company, organized in a way that makes it easy to find answers to business questions.
In technical terms: A data warehouse is a separate database that aggregates data from multiple operational systems via ETL (Extract-Transform-Load), optimized for analytical queries.
Why separate from operational systems?
| Problem | Solution via Warehouse |
|---|---|
| Data silos across systems | Centralized access |
| OLTP schemas not suited for analytics | Analysis-friendly schemas |
| Expensive queries impact users | No impact on production |
| Security/compliance restrictions | Controlled analyst access |
2.3. Data Lakes
In plain English: If a data warehouse is a library with organized books, a data lake is a massive storage facility where you can dump anything—books, videos, sensor data—in their original form.
In technical terms: A data lake stores raw data in any format (Avro, Parquet, JSON, images, etc.) without imposing a schema. It's cheaper than relational storage and more flexible for data science workloads.
💡 Insight
The "sushi principle" in data engineering: raw data is better. By storing data in its original form, each consumer can transform it to suit their specific needs, rather than being limited to a single transformed view.
3. Systems of Record and Derived Data
In plain English: A system of record is like the official birth certificate—it's the authoritative source. Derived data is like copies or summaries made from that original, which can be recreated if lost.
In technical terms: Systems of record hold canonical data; derived systems (caches, indexes, materialized views) are transformations that can be regenerated from the source.
Key Principle: If you lose derived data, you can recreate it. If you lose the system of record, the data is gone.
4. Cloud versus Self-Hosting
In plain English: Should you rent infrastructure from a cloud provider, or buy and manage your own servers? It's like choosing between renting an apartment (cloud) and buying a house (self-hosting).
In technical terms: Cloud services outsource infrastructure operations to vendors, while self-hosting gives you full control but requires operational expertise.
4.1. Pros and Cons of Cloud Services
💡 Insight
The biggest downside of cloud services is loss of control. If a feature is missing, you can only ask politely. If it goes down, you wait. If pricing changes, you pay or migrate.
4.2. Cloud-Native Architecture
In plain English: Cloud-native systems are designed from scratch to take advantage of cloud services, not just self-hosted software running on cloud VMs.
In technical terms: Cloud-native architectures separate storage and compute, use object stores for durability, and treat local disks as ephemeral caches.
| Category | Self-Hosted | Cloud-Native |
|---|---|---|
| OLTP | MySQL, PostgreSQL, MongoDB | Aurora, Cloud Spanner |
| OLAP | Teradata, ClickHouse, Spark | Snowflake, BigQuery |
5. Distributed versus Single-Node Systems
In plain English: Should your system run on one powerful computer, or spread across many computers connected by a network? More computers isn't always better.
In technical terms: A distributed system involves multiple processes (nodes) communicating over a network. While necessary for some requirements, it introduces significant complexity.
5.1. Reasons to Distribute
5.2. Problems with Distributed Systems
💡 Insight
"If you can do something on a single machine, this is often much simpler and cheaper compared to setting up a distributed system." Modern CPUs and disks are incredibly powerful—many workloads can run on a single node with tools like DuckDB or SQLite.
The challenges:
| Problem | Description |
|---|---|
| Network Failures | Requests can timeout without knowing if they succeeded |
| Latency | Network calls are vastly slower than local function calls |
| Debugging | Where is the problem when the system is slow? |
| Consistency | Keeping data synchronized across services is hard |
5.3. Microservices and Serverless
In plain English: Microservices split a big application into many small services that talk to each other. Serverless goes further—you just write functions, and the cloud handles everything else.
In technical terms: Microservices decompose applications into independent services with their own databases. Serverless (FaaS) abstracts away server management entirely, billing by execution time.
💡 Insight
Microservices are a technical solution to a people problem: allowing teams to work independently. In small companies with few teams, microservices add unnecessary complexity—keep it simple.
6. Data Systems, Law, and Society
In plain English: Data systems don't exist in a vacuum. We have responsibilities to the people whose data we collect—legally (GDPR, CCPA) and ethically.
In technical terms: Privacy regulations (GDPR, CCPA, EU AI Act) mandate data minimization, purpose limitation, and the right to erasure. These requirements influence system architecture.
Key principles:
| Principle | Description |
|---|---|
| Data Minimization | Only collect what you need |
| Purpose Limitation | Use data only for stated purposes |
| Right to Erasure | Delete data on user request |
| Storage Limitation | Don't keep data longer than necessary |
💡 Insight
The cost of storing data isn't just the S3 bill—it includes liability risks if leaked, legal costs if non-compliant, and safety risks to users. Sometimes the best decision is to not store certain data at all.
7. Summary
🎯 Key Trade-offs
| Trade-off | When to Choose A | When to Choose B |
|---|---|---|
| OLTP vs OLAP | Serving users | Analyzing data |
| Cloud vs Self-Host | Variable load, fast start | Predictable load, full control |
| Distributed vs Single | Scale/availability needs | Simplicity matters |
| Microservices vs Monolith | Large teams | Small teams |
📋 Key Concepts
| Concept | Definition |
|---|---|
| OLTP | Online Transaction Processing—serving user requests |
| OLAP | Online Analytical Processing—business intelligence |
| ETL | Extract-Transform-Load pipeline to data warehouse |
| Data Lake | Raw data storage in any format |
| System of Record | Authoritative source of truth |
| Derived Data | Data that can be regenerated from source |
📝 Key Takeaways
- Every architectural decision is a trade-off—understand what you're giving up
- Operational and analytical systems have different requirements; keep them separate
- Cloud services trade control for convenience—evaluate based on your specific situation
- Distributed systems add complexity; prefer single-node when possible
- Consider legal and ethical implications of storing personal data
Next: Chapter 2: Nonfunctional Requirements — Understanding reliability, scalability, and maintainability