# Inference Routing
Akshi routes each LLM inference request to either a local model or a cloud provider. The routing decision is automatic, per-request, and based on a lightweight scoring model called Akshi Route.
## Providers
| Provider | Type | Notes |
|---|---|---|
| Ollama | Local | Runs on the same machine or local network |
| Anthropic | Cloud | Direct API access |
| OpenRouter | Cloud gateway | Access to multiple model families |
Agents do not choose their provider. They call the infer host capability with a prompt, and the router picks the best destination.
## How routing works
Akshi Route is a logistic scoring model with 7 input features (prompt length, tool complexity, context size, and others). It produces a score between 0 and 1.
- Score >= threshold → route to cloud provider.
- Score < threshold → route to local Ollama.
- Default threshold: 0.55 (configurable in route profile).
The idea: simple requests stay local (fast, free, private), while complex requests go to more capable cloud models.
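The routing decision above can be sketched as a plain logistic model. The feature names, weights, and bias below are hypothetical stand-ins (the actual Akshi Route features and coefficients are not specified here); only the score-vs-threshold comparison mirrors the documented behavior.

```python
import math

# Hypothetical weights and bias -- illustrative only, not Akshi's real values.
WEIGHTS = {
    "prompt_length": 0.002,    # longer prompts push toward cloud
    "tool_complexity": 0.4,    # more complex tool use pushes toward cloud
    "context_size": 0.0005,    # larger contexts push toward cloud
}
BIAS = -1.2
THRESHOLD = 0.55  # default threshold from the route profile


def route_score(features: dict[str, float]) -> float:
    """Logistic score in (0, 1); higher means 'send to cloud'."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def pick_destination(features: dict[str, float]) -> str:
    """Score >= threshold routes to cloud; otherwise stay local."""
    return "cloud" if route_score(features) >= THRESHOLD else "local"
```

A trivial request (all features near zero) scores below the threshold and stays on local Ollama, while a long, tool-heavy request crosses it and goes to a cloud provider.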
## Fallback chain
The router uses a three-tier fallback to determine routing parameters when a full route profile is not configured:
- TinyLocal — hardcoded minimal profile for local inference.
- Profile — user-defined route profile from runtime.toml.
- Heuristic — feature-based scoring via the logistic model.
If local inference fails (Ollama unavailable, model not loaded), the router automatically falls back to a cloud provider.
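One way to picture the tiers is a chain of resolvers tried in order, with a hardcoded floor. The resolver names, the consultation order, and the profile keys below are assumptions for illustration; the source only names the three tiers.

```python
from typing import Callable, Optional

Profile = dict  # routing parameters: threshold, providers, etc.


def from_runtime_toml() -> Optional[Profile]:
    """Tier: Profile -- user-defined route profile from runtime.toml."""
    return None  # pretend nothing is configured in this sketch


def heuristic_profile() -> Optional[Profile]:
    """Tier: Heuristic -- parameters derived from the logistic scoring model."""
    return {"threshold": 0.55, "providers": ["ollama", "anthropic"]}


# Tier: TinyLocal -- hardcoded minimal profile for local inference.
TINY_LOCAL: Profile = {"threshold": 1.0, "providers": ["ollama"]}


def resolve_profile(chain: list[Callable[[], Optional[Profile]]]) -> Profile:
    """Return the first profile a resolver yields; TinyLocal is the floor."""
    for resolver in chain:
        profile = resolver()
        if profile is not None:
            return profile
    return TINY_LOCAL
```

With no user profile and no heuristic available, the router still has the TinyLocal floor, so local inference parameters always exist.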
## Circuit breaker
Cloud providers can go down. The router includes a circuit breaker:
- Trigger: 3 consecutive failures to a cloud provider.
- Open duration: 30 seconds (requests skip the failed provider).
- Recovery: after the open period, one probe request tests the provider.
This prevents cascading failures when a cloud API has an outage.
## Configuration
Route behavior is controlled through route profiles in runtime.toml. You can set the scoring threshold, preferred providers, model overrides, and fallback behavior per profile. See Router Configuration for details.
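A route profile might look like the fragment below. The table path and every key name are illustrative guesses at the shape of such a profile; consult the Router Configuration reference for the exact schema.

```toml
# Hypothetical runtime.toml fragment -- key names are illustrative only.
[route.profiles.default]
threshold = 0.55                              # logistic score cutoff
local_provider = "ollama"
cloud_providers = ["anthropic", "openrouter"] # tried in order
fallback_to_cloud = true                      # if local inference fails
```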