Enable Parallelism

# config.yaml
constraints:
  allow_parallelism:
    task: true    # Enable parallel subtasks
    data: true    # Enable parallel data extraction
    first_of_n: false  # First-result parallelism
  max_parallelism_nesting_depth: 1  # Maximum nesting depth
blastai serve --config config.yaml

Types of Parallelism

BLAST automates four types of parallelism:

1. Task Parallelism

Run multiple subtasks in parallel:

from openai import OpenAI

client = OpenAI(
    api_key="not-needed",
    base_url="http://127.0.0.1:8000"
)

# This will launch subtasks that run in parallel
response = await client.responses.create(
    model="not-needed",
    input="Look up the largest gorilla in each continent",
    stream=True
)

2. Data Parallelism

One of the most time-consuming steps in web browsing AI is reading and summarizing content from the web. We parallelize this by chunking and and running a smaller LLM on each chunk. Our unscientific testing shows a 5x speedup with the parallelization and 2x speedup with smaller LLM without degrading the quality of the results.

3. First-of-N Parallelism

When enabled, BLAST runs multiple copies of each task in parallel and takes the first result that returns, early exiting the other tasks. This helps because browser-augmented LLMs have high variability in latency.

constraints:
  allow_parallelism:
    first_of_n: true  # Enable first-result parallelism

4. Nested Parallelism

Control how deep parallel tasks can nest:

constraints:
  max_parallelism_nesting_depth: 2  # Allow nested parallel tasks

Example nesting structure:

Parent Task (launched by user)
├── Subtask 1 (parallel)
│   ├── Sub-subtask 1.1 (parallel)
│   └── Sub-subtask 1.2 (parallel)
└── Subtask 2 (parallel)
    ├── Sub-subtask 2.1 (parallel)
    └── Sub-subtask 2.2 (parallel)

Monitor Parallel Execution

metrics = await engine.get_metrics()
print(f"Running tasks: {metrics['tasks']['running']}")
print(f"Active browsers: {metrics['concurrent_browsers']}")
print(f"Memory usage: {metrics['memory_usage_gb']} GB")

Model Selection

BLAST automatically selects between different models for parallel processing:

constraints:
  llm_model: "gpt-4.1"          # Primary model
  llm_model_mini: "gpt-4.1-mini"  # Model for data parallelism

Next Steps