Parallelism
BLAST parallelizes work automatically
Enable Parallelism
Types of Parallelism
BLAST automates four types of parallelism:
1. Task Parallelism
Run multiple subtasks in parallel:
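As a rough illustration of the idea (not BLAST's own API), independent subtasks can be run concurrently with Python's `asyncio.gather`. The `run_subtask` and `run_all` helpers and the subtask names are hypothetical stand-ins for real browser work.

```python
import asyncio


async def run_subtask(name: str, delay: float) -> str:
    # Hypothetical stand-in for a browser subtask; the sleep simulates I/O latency.
    await asyncio.sleep(delay)
    return f"{name}: done"


async def run_all(subtasks: dict[str, float]) -> list[str]:
    # gather() runs all subtasks concurrently and returns results in input order.
    return await asyncio.gather(
        *(run_subtask(name, delay) for name, delay in subtasks.items())
    )


results = asyncio.run(run_all({"search flights": 0.02, "search hotels": 0.01}))
print(results)
```

The total wall time is close to that of the slowest subtask rather than the sum of all of them.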
2. Data Parallelism
One of the most time-consuming steps in web-browsing AI is reading and summarizing content from the web. We parallelize this by splitting the content into chunks and running a smaller LLM on each chunk. In our informal testing, parallelization gave a 5x speedup and the smaller LLM a 2x speedup, without degrading the quality of the results.
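The chunk-and-summarize pattern can be sketched as follows. This is a minimal illustration under stated assumptions, not BLAST's implementation: `chunk_text`, `summarize_chunk`, and `summarize_page` are hypothetical names, and `summarize_chunk` is a placeholder where a call to a smaller LLM would go.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_text(text: str, chunk_size: int) -> list[str]:
    # Split the text into fixed-size character chunks (the last one may be shorter).
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def summarize_chunk(chunk: str) -> str:
    # Placeholder for a smaller-LLM call: keeps the text before the first period.
    return chunk.split(".")[0].strip()


def summarize_page(text: str, chunk_size: int = 40, workers: int = 4) -> list[str]:
    chunks = chunk_text(text, chunk_size)
    # pool.map() processes chunks in parallel and preserves chunk order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(summarize_chunk, chunks))
```

Threads suit this case because each chunk would spend most of its time waiting on a network call to the model.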
3. First-of-N Parallelism
When enabled, BLAST runs multiple copies of each task in parallel, takes the first result that returns, and exits the remaining copies early. This helps because browser-augmented LLMs have highly variable latency.
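A minimal sketch of the first-of-N pattern using `asyncio.wait` with `FIRST_COMPLETED`. The `attempt` and `first_of_n` helpers are hypothetical, and the fixed delays stand in for the variable latency of real attempts. This is not BLAST's internal code.

```python
import asyncio


async def attempt(task: str, copy_id: int, delay: float) -> str:
    # Hypothetical stand-in for one copy of a task with its own latency.
    await asyncio.sleep(delay)
    return f"{task} (copy {copy_id})"


async def first_of_n(task: str, delays: list[float]) -> str:
    copies = {
        asyncio.create_task(attempt(task, i, d)) for i, d in enumerate(delays)
    }
    # Return as soon as any copy finishes.
    done, pending = await asyncio.wait(copies, return_when=asyncio.FIRST_COMPLETED)
    # Early-exit the slower copies and wait for their cancellation to settle.
    for t in pending:
        t.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result()


winner = asyncio.run(first_of_n("open page", [0.2, 0.01, 0.3]))
print(winner)
```

The trade-off is extra compute for the discarded copies in exchange for lower latency.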
4. Nested Parallelism
Control how deep parallel tasks can nest:
Example nesting structure:
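To make the idea of a nesting limit concrete, here is a small sketch in which each task may spawn parallel subtasks until a maximum depth is reached. The `run_task` and `render` helpers, the `max_depth` and `fanout` parameters, and the task names are all assumptions made for illustration. They are not BLAST settings.

```python
import asyncio


async def run_task(name: str, depth: int, max_depth: int, fanout: int = 2) -> dict:
    node = {"name": name, "children": []}
    # Spawn parallel subtasks only while below the nesting limit.
    if depth < max_depth:
        children = [
            run_task(f"{name}.{i}", depth + 1, max_depth, fanout)
            for i in range(1, fanout + 1)
        ]
        node["children"] = await asyncio.gather(*children)
    return node


def render(node: dict, indent: int = 0) -> list[str]:
    # Flatten the task tree into indented lines, one per task.
    lines = [" " * indent + node["name"]]
    for child in node["children"]:
        lines.extend(render(child, indent + 2))
    return lines


tree = asyncio.run(run_task("task", depth=0, max_depth=2))
print("\n".join(render(tree)))
```

With `max_depth=2` and `fanout=2`, the root task has two children and each child has two children of its own. Lowering `max_depth` caps how many tasks can be running at once.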
Monitor Parallel Execution
Model Selection
BLAST automatically selects between different models for parallel processing:
Next Steps
- Configure Settings for optimal performance
- Learn about Caching
- Understand Constraints for resource management