Why is this needed?

BLAST automatically maintains a prefix cache, similar to most LLM serving engines. The difference is that a browser-augmented LLM's prefix cache must also be aware of the underlying browser resources required to reuse cached state and continue execution.

Caching Options

""              # Cache everything (default)
"no-cache"      # Skip results cache lookup
"no-store"      # Don't store results in cache
"no-cache-plan" # Skip plan cache lookup
"no-store-plan" # Don't store plan in cache

Options can be combined:

"no-cache,no-store"           # Skip all caching
"no-cache-plan,no-store-plan" # Skip plan cache but use result cache
"no-store,no-store-plan"      # Skip storing but check cache

Using Cache Control

1. OpenAI-Compatible API

Using /chat/completions:

from openai import OpenAI

client = OpenAI(
    api_key="not-needed",
    base_url="http://127.0.0.1:8000"
)

# With cache control
response = client.chat.completions.create(
    model="not-needed",
    messages=[{
        "role": "user",
        "content": "Search Python docs",
        "cache_control": "no-cache,no-store"  # Skip all caching
    }]
)

# Default caching
response = client.chat.completions.create(
    model="not-needed",
    messages=[{
        "role": "user",
        "content": "Search Python docs"
    }]
)

Using /responses:

# With cache control
response = client.responses.create(
    model="not-needed",
    input="Search Python docs",
    cache_control="no-cache-plan"  # Skip plan cache
)

# Clear specific response cache
client.responses.delete("resp_<task_id>")

# Clear by task description
client.responses.delete("Search Python docs")

2. Engine API

from blastai import Engine

async with Engine() as engine:
    # With cache control
    result = await engine.run(
        "Search Python docs",
        cache_control="no-cache,no-store"
    )

    # Multiple tasks with different cache settings (one entry per task)
    results = await engine.run([
        "Search Python docs",  # "no-cache": skip results cache lookup
        "Click Documentation"  # "": default caching
    ], cache_control=["no-cache", ""])

Cache Persistence

Enable cache persistence in settings:

# config.yaml
settings:
  persist_cache: true  # Keep cache between engine sessions

When persistence is enabled:

  • Results are stored in <appdata>/cache/results/
  • Plans are stored in <appdata>/cache/plans/
  • Cache survives between engine restarts
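To verify what has been persisted, you can inspect these directories directly. Below is a minimal sketch using only the Python standard library; it assumes the default Linux data directory (~/.local/share/blast) used in the manual-clearing commands later on this page, so substitute your platform's <appdata> location as needed.

from pathlib import Path

# Assumed default <appdata> location on Linux; adjust for your platform.
cache_root = Path.home() / ".local" / "share" / "blast" / "cache"

for kind in ("results", "plans"):
    cache_dir = cache_root / kind
    entries = list(cache_dir.iterdir()) if cache_dir.exists() else []
    print(f"{kind}: {len(entries)} cached entries")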

Clearing Cache

1. Through API

# Clear by response ID
client.responses.delete("resp_<task_id>")
client.responses.delete("chatcmpl-<task_id>")

# Clear by task description
client.responses.delete("Search Python docs")

2. Through Engine

Clear all caches:

engine = await Engine.create()
await engine.cache_manager.clear()

3. Manually

Remove cache directories:

# Remove all cache
rm -rf ~/.local/share/blast/cache/*

# Remove specific caches
rm -rf ~/.local/share/blast/cache/results/*  # Clear results
rm -rf ~/.local/share/blast/cache/plans/*    # Clear plans
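
The same cleanup can be done from Python with the standard library. This is a minimal sketch assuming the same default cache location as the shell commands above:

import shutil
from pathlib import Path

# Assumed default <appdata> location on Linux; adjust for your platform.
cache_root = Path.home() / ".local" / "share" / "blast" / "cache"

# Remove the results and plans caches, keeping the cache root itself.
for kind in ("results", "plans"):
    shutil.rmtree(cache_root / kind, ignore_errors=True)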

Next Steps