Start the Server

# Run the engine + web UI
blastai serve

# Run only the engine
blastai serve engine

# Run only the web UI
blastai serve web

# Run the CLI
blastai serve cli

You can then use any OpenAI API client:

from openai import OpenAI

client = OpenAI(
    api_key="not-needed",
    base_url="http://127.0.0.1:8000"
)

Chat Completions API

1. Basic Usage

response = client.chat.completions.create(
    model="not-needed",
    messages=[
        {"role": "user", "content": "Find the 10 heaviest gorillas"}
    ]
)
print(response.choices[0].message.content)

2. Streaming

Enable streaming to receive real-time updates:

response = client.chat.completions.create(
    model="not-needed",
    messages=[
        {"role": "user", "content": "Find the 10 heaviest gorillas"}
    ],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

The streaming response includes the following (a small filtering sketch follows the list):

  1. An initial role message (role: "assistant")
  2. Chunks whose delta.content is one of:
    • A thought (the content contains a space)
    • A screenshot (the content contains no spaces)
    • The final result
  3. A final chunk with finish_reason: "stop"
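For example, a minimal sketch (reusing the client created above) that applies the space check to print only thoughts and the final result while dropping screenshot chunks:

stream = client.chat.completions.create(
    model="not-needed",
    messages=[
        {"role": "user", "content": "Find the 10 heaviest gorillas"}
    ],
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    # Thoughts and the final result contain spaces; screenshots do not
    if delta and " " in delta:
        print(delta, end="", flush=True)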

3. Conversation

BLAST lets you run multi-turn conversations. The engine’s prefix caching ensures that already-computed browser actions are not repeated unless needed.

response = client.chat.completions.create(
    model="not-needed",
    messages=[
        {"role": "user", "content": "Go to python.org"},
        {"role": "assistant", "content": "I've navigated to python.org"},
        {"role": "user", "content": "Click on Documentation"}
    ]
)
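In practice, the assistant turn is usually taken from a previous call rather than written by hand. A sketch of the same conversation built from the first response:

first = client.chat.completions.create(
    model="not-needed",
    messages=[{"role": "user", "content": "Go to python.org"}]
)

# Feed the assistant's reply back as conversation history
followup = client.chat.completions.create(
    model="not-needed",
    messages=[
        {"role": "user", "content": "Go to python.org"},
        {"role": "assistant", "content": first.choices[0].message.content},
        {"role": "user", "content": "Click on Documentation"}
    ]
)
print(followup.choices[0].message.content)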

4. Caching

By default, BLAST caches both results and the LLM-generated plans used to produce them. Plan caching helps with queries whose result changes over time but whose steps for retrieving the new value stay the same.

Control caching behavior with cache_control options:

response = client.chat.completions.create(
    model="not-needed",
    messages=[
        {
            "role": "user",
            "content": "Find the 10 heaviest gorillas",
            "cache_control": "no-cache,no-store"
        }
    ]
)

Available cache control options:

  • no-cache - Skip results cache lookup
  • no-store - Don’t store in results cache
  • no-cache-plan - Skip plan cache lookup
  • no-store-plan - Don’t store plan in cache
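Options are combined as a comma-separated string, as in the example above. For instance, a sketch that forces BLAST to re-derive the plan without persisting the new one (assuming the plan options combine the same way):

response = client.chat.completions.create(
    model="not-needed",
    messages=[
        {
            "role": "user",
            "content": "Find the 10 heaviest gorillas",
            # Skip the plan cache and don't store the newly generated plan
            "cache_control": "no-cache-plan,no-store-plan"
        }
    ]
)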

Responses API

1. Basic Usage

response = client.responses.create(
    model="not-needed",
    input="Find the 10 heaviest gorillas"
)
print(response.output[0].content[0].text)

2. Streaming

Enable streaming to receive detailed event updates:

stream = client.responses.create(
    model="not-needed",
    input="Find the 10 heaviest gorillas",
    stream=True
)

for event in stream:
    if event.type == "response.output_text.delta":
        # Skip screenshots (no spaces in delta)
        if ' ' in event.delta:
            print(event.delta, end="", flush=True)

BLAST emits a sequence of events during streaming (a small handler sketch follows the list):

  1. response.created - Initial event when the response is created
  2. response.in_progress - Task processing has started
  3. response.output_text.delta - Each streaming event’s delta is either:
    • A thought (the content contains a space)
    • A screenshot (the content contains no spaces)
  4. response.output_text.done - An output text is complete
  5. response.completed - The response is finished and all events have been sent
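For example, a sketch of a loop that reacts to more than one event type (only the fields shown above are used; the completion marker printed at the end is purely illustrative):

stream = client.responses.create(
    model="not-needed",
    input="Find the 10 heaviest gorillas",
    stream=True
)

for event in stream:
    if event.type == "response.output_text.delta":
        # Thoughts contain spaces; screenshots do not
        if " " in event.delta:
            print(event.delta, end="", flush=True)
    elif event.type == "response.completed":
        print("\n[done]")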

3. Conversation

The Responses API supports stateful conversations using previous response IDs:

# First response
response1 = client.responses.create(
    model="not-needed",
    input="Go to python.org"
)

# Follow-up using previous response ID
response2 = client.responses.create(
    model="not-needed",
    input="Click on Documentation",
    previous_response_id=response1.id
)

4. Caching

The same cache_control options apply to the Responses API:

response = client.responses.create(
    model="not-needed",
    input="Find the 10 heaviest gorillas",
    cache_control="no-cache,no-store"
)

Next Steps