Chapter 5: Workflow Patterns

Orchestrating complex machine learning systems with reusable architectural patterns


Table of Contents

  1. What Is a Workflow?
  2. Fan-in and Fan-out Patterns: Composing Complex ML Workflows
  3. Synchronous and Asynchronous Patterns: Accelerating Workflows
  4. Step Memoization Pattern: Skipping Redundant Workloads
  5. Summary and Exercises

Think of building ML workflows like conducting a symphony orchestra - you need to coordinate multiple musicians (components), handle different instrument sections (patterns), and ensure everyone plays in harmony while some may have solo performances at different times.


1. What Is a Workflow?

In plain English: A workflow is like a recipe with multiple steps that must happen in a specific order, where each step takes input from previous steps and produces output for the next ones.

In technical terms: A workflow is a directed graph of computational steps with explicit dependencies, where each node represents a discrete operation (data ingestion, training, serving) and edges represent data flow between operations.

Why it matters: Proper workflow design ensures efficient resource utilization, enables parallel execution where possible, and provides clear visibility into system behavior and debugging.

A workflow consists of arbitrary combinations of components commonly seen in real-world ML applications:

  • Data ingestion - Collecting and preprocessing raw data
  • Distributed model training - Building ML models at scale
  • Model serving - Deploying models for inference
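
As a minimal sketch (the three functions below are hypothetical placeholders, not the book's code), a sequential workflow chains these components so each step consumes the previous step's output:

# A minimal sequential-workflow sketch; ingest_data(), train_model(),
# and serve_model() are hypothetical stand-ins for real components.
def ingest_data() -> list:
    return [{"video_id": 1, "features": [0.1, 0.2]}]  # toy dataset

def train_model(dataset: list) -> dict:
    return {"weights": len(dataset)}  # toy "model"

def serve_model(model: dict) -> None:
    print(f"serving model: {model}")

# Each step consumes the previous step's output, in strict order.
dataset = ingest_data()
model = train_model(dataset)
serve_model(model)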

1.1. Sequential Workflows vs Directed Acyclic Graphs (DAGs)

Simple Sequential Workflow:

Data Ingestion → Model Training → Model Serving

More Complex Workflow with Parallel Paths:

                 ┌→ Model Training A → Model Serving A
Data Ingestion ──┤
                 └→ Model Training B → Model Serving B

Insight

Think of workflows like restaurant operations: simple workflows are like a single chef making one dish at a time, while complex workflows are like multiple chefs working different stations simultaneously to serve multiple dishes.

1.2. Understanding Workflow Complexity

Sequential Workflow: Steps execute one after another in strict order.

Step A (done) → Step B (starts after A completes) → Step C (starts after B completes)

Directed Acyclic Graph (DAG): Steps can have dependencies but never form closed loops.

Valid DAG (No Cycles):

Step A → Step B → Step D
Step A → Step C → Step D
(every edge points forward; no path returns to an earlier step)

Invalid DAG (Has Cycles):

Step A → Step B → Step C → Step A  (the edge back to Step A creates a cycle!)
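
As a rough illustration (not from the chapter), a workflow can be represented as an adjacency map and checked for cycles with a depth-first search:

# A hedged sketch: represent a workflow as an adjacency map and detect
# cycles with depth-first search (a cyclic graph is not a valid DAG).
def has_cycle(graph: dict) -> bool:
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node: str) -> bool:
        color[node] = GRAY  # node is on the current DFS path
        for nxt in graph.get(node, []):
            if color[nxt] == GRAY:  # back-edge: a cycle exists
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK  # fully explored, no cycle through here
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)

valid = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
invalid = {"A": ["B"], "B": ["C"], "C": ["A"]}
assert not has_cycle(valid)
assert has_cycle(invalid)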

2. Fan-in and Fan-out Patterns: Composing Complex ML Workflows

2.1. The Problem: Training Multiple Models

Imagine you want to build a video tagging system that uses multiple models to capture different aspects of videos. You need to:

  1. Train different model architectures
  2. Select the top-performing models
  3. Use multiple models for better coverage
  4. Aggregate results for comprehensive tagging

Example Use Case: YouTube-8M video entity tagging with an ensemble approach.

2.2. The Solution: Systematic Pattern Application

Baseline Workflow:

Data Ingestion → Model Training → Model Serving

Enhanced Multi-Model Workflow (Fan-out Pattern):

                 ┌→ Model Training 1
Data Ingestion ──┼→ Model Training 2
                 └→ Model Training 3

Complete Workflow with Model Selection:

Data Ingestion
  → Fan-out (training phase): Training 1 (90% acc) | Training 2 (92% acc) | Training 3 (75% acc)
  → Model Selection (keep top 2: Training 1 and Training 2)
  → Serving phase: Model Serving A | Model Serving B
  → Fan-in: Result Aggregation

Why Multiple Models Work Better:

            Model A Knowledge                Combined Knowledge (A + B)
Entities    Food, Car, Animals, Nature       Food, Car, Animals, Nature, Music,
            (4 entities)                     Sports, Technology (7 entities)
Coverage    Limited domain coverage          Broader domain coverage
Accuracy    Single model perspective         Ensemble benefits from diversity

Fan-out Pattern Structure (one input feeds multiple outputs):

                 ┌→ Training 1
Data Ingestion ──┼→ Training 2
                 └→ Training 3

Fan-in Pattern Structure (multiple inputs combine into one output):

Serving A ──┐
            ├→ Result Aggregation
Serving B ──┘
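
As a rough sketch of how these two patterns compose in code (not from the chapter; train_model and aggregate are hypothetical placeholders), Python's concurrent.futures can fan a single input out to parallel steps and then fan the results back in:

# A minimal fan-out/fan-in sketch using only the standard library.
# train_model() and aggregate() stand in for real pipeline steps.
from concurrent.futures import ThreadPoolExecutor

def train_model(config: str) -> dict:
    # Placeholder: a real step would launch a (possibly distributed) training job.
    return {"config": config, "accuracy": 0.9}

def aggregate(results: list) -> dict:
    # Fan-in: combine the outputs of all parallel branches into one result.
    return {"models": results}

configs = ["training_1", "training_2", "training_3"]

with ThreadPoolExecutor() as executor:
    # Fan-out: one input (the config list) feeds three independent steps.
    results = list(executor.map(train_model, configs))

combined = aggregate(results)  # Fan-in: merge into a single output
print(combined)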

Insight

Fan-out and fan-in patterns are like a river delta: water flows from one source (fan-out) into multiple channels, then these channels may converge again downstream (fan-in). This natural flow pattern applies beautifully to ML workflows.

2.3. Discussion: When to Use These Patterns

Use fan-in/fan-out patterns when:

  1. Multiple steps are independent - Steps can run without waiting for each other
  2. Sequential execution is too slow - Parallel execution provides significant speedup

Avoid these patterns when:

  • Steps have strict dependencies (e.g., ensemble models that need all sub-models first)
  • Steps need specific execution order
  • Resource constraints limit parallel execution

Ensemble Model Challenge:

Training A ──┐
Training B ──┼→ (must wait for ALL to complete) → Ensemble Training
Training C ──┘
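
The constraint shows up directly in a small sketch (hypothetical helpers, not the book's code): the ensemble step blocks until every sub-model future has resolved.

# A hedged sketch of the dependency constraint: wait() blocks until ALL
# sub-model training futures complete before the ensemble step can start.
from concurrent.futures import ThreadPoolExecutor, wait

def train_sub_model(name: str) -> str:
    return f"{name}-weights"  # placeholder for a real training job

with ThreadPoolExecutor() as executor:
    futures = [executor.submit(train_sub_model, n) for n in ("A", "B", "C")]
    done, _ = wait(futures)  # no ensemble work can begin before this returns

sub_models = [f.result() for f in done]
# Only now can a (hypothetical) ensemble training step begin:
# ensemble = train_ensemble(sub_models)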

Insight

Think of dependencies like a cooking recipe: you can chop vegetables in parallel (fan-out), but you can't make the sauce until all ingredients are ready (dependency constraint).

2.4. Exercises

  1. Q: If steps are not independent of each other, can we use fan-in or fan-out patterns?

    A: No, because we would have no guarantee of the order in which the concurrent steps run.

  2. Q: What's the main problem when trying to build ensemble models with the fan-in pattern?

    A: Training an ensemble model depends on completing other model training steps for the sub-models. We cannot use the fan-in pattern because the ensemble model training step will need to wait for other model training to complete before it can start running, which would require extra waiting and delay the entire workflow.


3. Synchronous and Asynchronous Patterns: Accelerating Workflows

3.1. The Problem: Long-Running Step Bottlenecks

Imagine three model training steps with vastly different completion times:

Training 1: 1 week
Training 2: 1 week
Training 3: 2 weeks → downstream steps wait an extra week!

The bottleneck: All subsequent steps (model selection, serving) must wait for the slowest step to complete.

3.2. The Solution: Concurrent Execution Strategies

Naive Approach - Remove Slow Step:

Data Ingestion → Training 1 | Training 2 (fast training only; Training 3 skipped) → Model Selection

Problem: We lose what may be the best model, which could come from the slower, more complex training step.

Better Approach - Asynchronous Execution:

  1. Week 1 - Deploy First Model: Training 1 completes → deploy immediately
  2. Week 2 - Ensemble with Two Models: Training 2 completes → update with a two-model ensemble
  3. Week 3 - Full Ensemble: Training 3 completes → best-quality results
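
A minimal sketch of this asynchronous pattern (sleep seconds stand in for training weeks; train and deploy are hypothetical placeholders) uses as_completed to act on each model the moment it finishes:

# An asynchronous-pattern sketch: deploy each model as soon as its own
# training finishes instead of blocking on the slowest step.
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def train(name: str, duration: int) -> str:
    time.sleep(duration)  # seconds standing in for weeks of training
    return name

def deploy(model: str) -> None:
    print(f"deployed {model}")  # placeholder for a real rollout

jobs = {"training_1": 1, "training_2": 1, "training_3": 2}

with ThreadPoolExecutor() as executor:
    futures = [executor.submit(train, name, t) for name, t in jobs.items()]
    for future in as_completed(futures):  # yields futures in completion order
        deploy(future.result())  # fast models go live while slow ones still run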

Synchronous vs Asynchronous Execution:

                      Synchronous (Traditional)                Asynchronous (Optimized)
Execution             Step A → wait → Step B → wait → Step C   Step A → B, C, D start immediately
Bottlenecks           Each step blocks the next                No waiting between parallel steps
Time to First Result  Must wait for all steps                  Deploy as soon as the first step completes
User Experience       Long wait, then results                  Quick initial results, improving quality

Insight

Asynchronous execution is like a restaurant: as soon as the appetizer is ready, it goes to the table while the main course is still cooking. Customers don't wait for everything to be ready at once.

3.3. Discussion: Speed vs Quality Trade-offs

Early Models vs Final Models:

  1. Week 1 Results: Simple Model A - Food, Car (2 entities) - FAST delivery
  2. Week 2 Results: Models A + B - Food, Car, Music, Sports (4 entities) - GOOD quality
  3. Week 3 Results: Models A + B + C - Food, Car, Music, Sports, Tech, Art, Science (7 entities) - BEST quality

Decision Framework:

  • Speed Priority: Deploy models as soon as available
  • Quality Priority: Wait for better models to complete
  • Balanced: Use progressive deployment with user feedback

Insight

Consider the Uber model: when you request a ride, you see the closest driver immediately (fast result), but the app continues searching for better options and may upgrade your match (quality improvement over time).

3.4. Exercises

  1. Q: What causes each model training step to start?

    A: The completion of the step it depends on: in synchronous execution, each following step starts only when the previous step has completed, so variations in completion time propagate downstream.

  2. Q: Are steps blocking each other if they are running asynchronously?

    A: No, asynchronous steps won't block each other.

  3. Q: What do we need to consider when deciding whether to use any available trained model as early as possible?

    A: We need to consider whether users prioritize seeing results faster or seeing better results. If the goal is early results, users may not get the quality they expect. If delays are acceptable, waiting for better models is preferable.


4. Step Memoization Pattern: Skipping Redundant Workloads

4.1. The Problem: Unnecessary Re-execution

Scenario 1 - Regular Data Updates:

Week 1 (new YouTube videos added): Data Ingestion → Model Training → Model Serving
Week 2 (more videos added):        Data Ingestion → Model Training → Model Serving

Scenario 2 - Model Experimentation:

Experiment 1 (try CNN architecture): Data Ingestion (SLOW!) → Model Training (CNN) → Model Serving
Experiment 2 (try RNN architecture): Data Ingestion (SLOW!) → Model Training (RNN) → Model Serving
                                     (the same data is re-ingested!)

4.2. The Solution: Intelligent Caching Strategies

Time-Based Caching:

  1. Workflow triggered: a new experiment starts
  2. Check cache: last updated 1 week ago; freshness window is 2 weeks
  3. Decision: data is fresh (< 2 weeks old), so SKIP ingestion
  4. Start training directly from the cached data

Content-Based Caching:

  1. Workflow triggered: a new training job starts
  2. Check cache: cached dataset has 1M videos; current source has 2M videos
  3. Decision: significant change (2x larger), so RE-INGEST
  4. Full ingestion: process all the new data

Step Memoization Implementation:

def execute_workflow():
    # Check the cache before executing the expensive ingestion step.
    if should_skip_data_ingestion():
        data_location = get_cached_data_location()
    else:
        data_location = run_data_ingestion()
        cache_data_location(data_location)

    # Continue with the remaining steps.
    model = run_model_training(data_location)
    deploy_model(model)

def should_skip_data_ingestion():
    # Pseudocode: `cache` is assumed to expose the caching strategy in use
    # plus the measurements needed for each decision rule.
    cache = get_cache()
    if cache.type == "time_based":
        # Skip if the cached data is younger than the freshness window.
        return cache.time_since_update < cache.freshness_threshold
    elif cache.type == "content_based":
        # Skip if the record count has not changed significantly.
        return cache.record_count_change < cache.significance_threshold
    return False
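
For the time-based branch, a concrete check might look like the following sketch (assuming the cache stores a POSIX timestamp of the last ingestion; the names are illustrative):

# A hedged, concrete version of the time-based check above, assuming the
# cache stores a POSIX timestamp of the last successful ingestion run.
import time

FRESHNESS_WINDOW = 14 * 24 * 3600  # two weeks, in seconds

def is_cache_fresh(last_ingestion_ts: float) -> bool:
    return (time.time() - last_ingestion_ts) < FRESHNESS_WINDOW

# Data ingested one week ago is still inside the window, so skip ingestion.
one_week_ago = time.time() - 7 * 24 * 3600
assert is_cache_fresh(one_week_ago)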

Insight

Step memoization is like a smart GPS: it remembers which routes you've taken recently and their conditions. If the route hasn't changed significantly, it skips the expensive route calculation and uses the cached path.

4.3. Discussion: Cache Management Considerations

Cache Lifecycle Management:

Cache Growth Over Time

Daily Workflow Execution:
1,000 workflows × 100 cached steps = 100,000 caches/day

Day 1: 100K caches
Day 2: 200K caches
Day 3: 300K caches
Day 7: 700K caches
Day 30: 3M caches

Storage usage grows linearly! Need garbage collection!

Garbage Collection Strategy:

  1. Record timestamp - track when each cache entry was last used
  2. Periodic scan - scan all cache entries regularly
  3. Delete unused - remove entries that have gone unused for longer than a threshold
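
A toy implementation of this strategy (an in-memory dict stands in for a real cache backend; the names and threshold are illustrative) could look like:

# A minimal garbage-collection sketch for the three-step strategy above.
import time

MAX_IDLE_SECONDS = 30 * 24 * 3600  # delete entries unused for > 30 days

cache = {
    "step_a": {"last_used": time.time() - 40 * 24 * 3600, "data": "..."},
    "step_b": {"last_used": time.time(), "data": "..."},
}

def collect_garbage(cache: dict) -> None:
    now = time.time()  # periodic scan: check every entry's last-used time
    stale = [k for k, v in cache.items()
             if now - v["last_used"] > MAX_IDLE_SECONDS]
    for key in stale:
        del cache[key]  # recycle the storage held by the stale entry

collect_garbage(cache)  # "step_a" is deleted, "step_b" survives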

Cache Content Strategy for Different Steps:

Data Ingestion Cache
  • Metadata: record count, last update timestamp
  • Content: data location, schema version
  • Validation: data quality metrics

Model Training Cache
  • Metadata: model architecture, hyperparameters
  • Content: model artifacts, performance metrics
  • Validation: training dataset fingerprint

Model Serving Cache
  • Metadata: model version, deployment config
  • Content: service endpoints, resource usage
  • Validation: performance benchmarks
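
One way to hold these fields is a simple record type; the sketch below covers the data ingestion case, and the field names are assumptions mirroring the metadata/content/validation split above, not part of any specific framework:

# An illustrative cache-entry record for the data ingestion step.
from dataclasses import dataclass, field

@dataclass
class IngestionCacheEntry:
    record_count: int          # metadata
    last_update_ts: float      # metadata
    data_location: str         # content
    schema_version: str        # content
    quality_metrics: dict = field(default_factory=dict)  # validation

entry = IngestionCacheEntry(
    record_count=1_000_000,
    last_update_ts=1700000000.0,
    data_location="s3://bucket/youtube8m/",
    schema_version="v2",
)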

Insight

Cache management is like maintaining a library: you need to periodically remove old, unused books to make space for new ones, while keeping frequently accessed materials easily available.

4.4. Exercises

  1. Q: What type of steps can most benefit from step memoization?

    A: Steps that are time-consuming or require a huge amount of computational resources.

  2. Q: How do we tell whether a step's execution can be skipped?

    A: We can use information stored in the cache, such as when the cache was initially created or metadata collected from the step, to decide whether we should skip the execution of a particular step.

  3. Q: What do we need to manage and maintain once we've applied the pattern at scale?

    A: We need to set up a garbage collection mechanism to recycle and delete created caches automatically.


5. Summary and Exercises

Key Concepts Mastered

🔄 Sequential/DAG Patterns
  • Linear execution flow
  • Step dependencies matter
  • Order is critical
  • No cycles allowed

🌟 Fan-in/Fan-out
  • Parallel task execution
  • Result merging
  • Independent workloads
  • Better resource utilization

⚡ Synchronous/Asynchronous
  • Concurrent execution
  • Non-blocking operations
  • Progressive results
  • Speed vs quality trade-offs

💾 Step Memoization
  • Cache results intelligently
  • Skip redundant work
  • Time & content-based caching
  • Garbage collection needed

Core Principles

  1. Workflow Design: Connect ML components systematically using proven patterns
  2. Parallel Execution: Use fan-in/fan-out for independent, time-consuming tasks
  3. Async Optimization: Don't let slow steps block fast ones
  4. Smart Caching: Avoid redundant computation through intelligent memoization

Real-World Applications

  • Video tagging systems with ensemble models
  • A/B testing frameworks with parallel experiments
  • Model pipeline optimization with cached intermediate results
  • Real-time inference with progressive model deployment

Insight

Master these four workflow patterns and you'll have the building blocks to design efficient, scalable ML systems that can handle the complexity demands of production environments while minimizing computational waste.

Practice Exercises

Pattern Recognition:

  1. Identify which pattern to use when training 5 different models simultaneously
  2. Design a caching strategy for a daily model retraining pipeline
  3. Plan async deployment for models with 1-hour vs 8-hour training times

System Design:

  4. Architect a workflow for A/B testing 3 recommendation algorithms
  5. Design garbage collection for a high-frequency experimentation platform
  6. Create a progressive deployment strategy for improving model quality over time

