```python
from openai import AsyncOpenAI

client = AsyncOpenAI(
    api_key="not-needed",
    base_url="http://127.0.0.1:8000"
)

# This will launch subtasks that run in parallel
response = await client.responses.create(
    model="not-needed",
    input="Look up the largest gorilla in each continent",
    stream=True
)
```
One of the most time-consuming steps in web browsing AI is reading and summarizing content from the web. We parallelize this by chunking the content and running a smaller LLM on each chunk. Our unscientific testing shows a 5x speedup from the parallelization and a 2x speedup from the smaller LLM, without degrading the quality of the results.
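As a rough sketch of this fan-out (the chunk size, model name, and helper names below are illustrative assumptions, not BLAST's actual code):

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()  # assumption: any OpenAI-compatible endpoint

def chunk_text(text: str, chunk_size: int = 4000) -> list[str]:
    """Split page content into fixed-size chunks (character-based for simplicity)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

async def summarize_chunk(chunk: str) -> str:
    """Summarize one chunk with a smaller, faster model."""
    response = await client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: a smaller model than the main agent's
        messages=[{"role": "user", "content": f"Summarize:\n\n{chunk}"}],
    )
    return response.choices[0].message.content

async def summarize_page(text: str) -> str:
    # Fan out: one summarization call per chunk, all running concurrently.
    summaries = await asyncio.gather(*(summarize_chunk(c) for c in chunk_text(text)))
    # Fan in: join the per-chunk summaries for the main agent to consume.
    return "\n".join(summaries)
```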
When enabled, BLAST runs multiple copies of each task in parallel, takes the first result that returns, and early-exits the remaining copies. This helps because browser-augmented LLMs have high variance in latency.
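This first-result-wins pattern can be sketched in a few lines of asyncio; the helper below is illustrative, not BLAST's actual implementation:

```python
import asyncio

async def first_of_n(make_task, n: int = 3):
    """Run n copies of a task; return the first result and cancel the rest."""
    tasks = [asyncio.create_task(make_task()) for _ in range(n)]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # early-exit the slower copies
    return done.pop().result()

# Hypothetical usage, assuming run_browser_task() is a coroutine factory:
# result = await first_of_n(lambda: run_browser_task("Look up ..."), n=3)
```

Racing identical copies trades extra compute for lower tail latency: the expected wall-clock time is driven by the fastest of the n runs rather than the average.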