```python
from openai import AsyncOpenAI

client = AsyncOpenAI(
    api_key="not-needed",
    base_url="http://127.0.0.1:8000"
)

# This will launch subtasks that run in parallel
response = await client.responses.create(
    model="not-needed",
    input="Look up the largest gorilla in each continent",
    stream=True
)
```
One of the most time-consuming steps in web browsing AI is reading and summarizing content from the web. We parallelize this by chunking the content and running a smaller LLM on each chunk. Our unscientific testing shows a 5x speedup from the parallelization and a 2x speedup from the smaller LLM, without degrading the quality of the results.
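As a rough sketch of this fan-out (the chunk size, model name, and helper names below are illustrative assumptions, not BLAST's actual code):

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()  # assumption: any OpenAI-compatible endpoint

def chunk_text(text: str, chunk_size: int = 4000) -> list[str]:
    """Split page content into fixed-size chunks (character-based for simplicity)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

async def summarize_chunk(chunk: str) -> str:
    """Summarize one chunk with a smaller, faster model."""
    response = await client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: a smaller model than the main agent's
        messages=[{"role": "user", "content": f"Summarize:\n\n{chunk}"}],
    )
    return response.choices[0].message.content

async def summarize_page(text: str) -> str:
    # Fan out: one summarization call per chunk, all running concurrently.
    summaries = await asyncio.gather(*(summarize_chunk(c) for c in chunk_text(text)))
    # Fan in: join the per-chunk summaries for the main agent to consume.
    return "\n".join(summaries)
```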
When enabled, BLAST runs multiple copies of each task in parallel, takes the first result that returns, and early-exits the remaining copies. This helps because browser-augmented LLMs have high variance in latency.
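This first-result-wins pattern can be sketched in a few lines of asyncio; the helper below is illustrative, not BLAST's actual implementation:

```python
import asyncio

async def first_of_n(make_task, n: int = 3):
    """Run n copies of a task; return the first result and cancel the rest."""
    tasks = [asyncio.create_task(make_task()) for _ in range(n)]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # early-exit the slower copies
    return done.pop().result()

# Hypothetical usage, assuming run_browser_task() is a coroutine factory:
# result = await first_of_n(lambda: run_browser_task("Look up ..."), n=3)
```

Racing identical copies trades extra compute for lower tail latency: the expected wall-clock time is driven by the fastest of the n runs rather than the average.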