Inference Routing

Akshi routes each LLM inference request to either a local model or a cloud provider. The routing decision is automatic, per-request, and based on a lightweight scoring model called Akshi Route.

Providers

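| Provider   | Type          | Notes                                      |
|------------|---------------|--------------------------------------------|
| Ollama     | Local         | Runs on the same machine or local network  |
| Anthropic  | Cloud         | Direct API access                          |
| OpenRouter | Cloud gateway | Access to multiple model families          |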

Agents do not choose their provider. They call the infer host capability with a prompt, and the router picks the best destination.

How routing works

Akshi Route is a logistic scoring model with 7 input features (prompt length, tool complexity, context size, and others). It produces a score between 0 and 1.

  • Score >= threshold → route to cloud provider.
  • Score < threshold → route to local Ollama.
  • Default threshold: 0.55 (configurable in the route profile).
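The scoring rule above can be sketched as a sigmoid over a weighted sum of features. This is a minimal illustration, not Akshi Route itself: the docs name only three of the seven features, so the sketch models just those three, and the `RequestFeatures` type, the weights, and the bias are hypothetical values chosen for the example.

```python
import math
from dataclasses import dataclass


# Hypothetical feature vector. The docs name prompt length, tool complexity,
# and context size; the other four features are unspecified, so they are omitted.
@dataclass
class RequestFeatures:
    prompt_tokens: int
    tool_count: int
    context_tokens: int


# Illustrative weights and bias, NOT Akshi Route's trained parameters.
WEIGHTS = {"prompt_tokens": 0.002, "tool_count": 0.6, "context_tokens": 0.0003}
BIAS = -3.0
DEFAULT_THRESHOLD = 0.55  # default from the docs; configurable per profile


def score(f: RequestFeatures) -> float:
    """Logistic score in (0, 1): the sigmoid of a weighted feature sum."""
    z = BIAS + (
        WEIGHTS["prompt_tokens"] * f.prompt_tokens
        + WEIGHTS["tool_count"] * f.tool_count
        + WEIGHTS["context_tokens"] * f.context_tokens
    )
    return 1.0 / (1.0 + math.exp(-z))


def route(f: RequestFeatures, threshold: float = DEFAULT_THRESHOLD) -> str:
    """Score >= threshold goes to the cloud; anything lower stays on local Ollama."""
    return "cloud" if score(f) >= threshold else "local"
```

With these made-up weights, a short prompt with no tools or context scores about 0.05 and stays local, while a long, tool-heavy request with a large context scores about 0.96 and goes to the cloud.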

The idea: simple requests stay local (fast, free, private), while complex requests go to more capable cloud models.

Fallback chain

The router uses a three-tier fallback to determine routing parameters when a full route profile is not configured:

  1. TinyLocal — hardcoded minimal profile for local inference.
  2. Profile — user-defined route profile from runtime.toml.
  3. Heuristic — feature-based scoring via the logistic model.

If local inference fails (Ollama unavailable, model not loaded), the router automatically falls back to a cloud provider.
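The local-to-cloud fallback amounts to a try/except around the local call. In this sketch the backends are plain callables, and `LocalUnavailable` and `infer_with_fallback` are hypothetical names: the exception stands in for errors such as Ollama being unreachable or the model not being loaded. The real router's types and error handling are not described on this page.

```python
from typing import Callable, Tuple

# Hypothetical backend shape: takes a prompt, returns completion text,
# raises on failure.
Backend = Callable[[str], str]


class LocalUnavailable(Exception):
    """Hypothetical stand-in for 'Ollama unavailable' or 'model not loaded'."""


def infer_with_fallback(prompt: str, local: Backend, cloud: Backend) -> Tuple[str, str]:
    """Try local inference first; on a local failure, send the same prompt to the cloud.

    Returns (destination, completion) so the caller can see where the request ran.
    """
    try:
        return "local", local(prompt)
    except LocalUnavailable:
        # Local path failed: retry on the cloud provider instead of surfacing the error.
        return "cloud", cloud(prompt)
```

This only covers requests that were routed locally in the first place; requests that score above the threshold go straight to the cloud.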

Circuit breaker

Cloud providers can go down. The router includes a circuit breaker:

  • Trigger: 3 consecutive failures to a cloud provider.
  • Open duration: 30 seconds (requests skip the failed provider).
  • Recovery: after the open period, one probe request tests the provider.

This prevents cascading failures when a cloud API has an outage.
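A circuit breaker with the parameters above (3 consecutive failures, a 30-second open period, one probe request) might look like the following. The `CircuitBreaker` class and its method names are hypothetical. The behavior after the probe (close on success, reopen for another full period on failure) is an assumption, since this page does not specify it. The clock is injectable so the timing can be tested without sleeping.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Per-provider breaker: N consecutive failures open it for a fixed period,
    after which a single probe request is let through."""

    def __init__(self, failure_threshold: int = 3, open_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.clock = clock  # injectable time source (seconds)
        self.consecutive_failures = 0
        self.opened_at: Optional[float] = None
        self.probe_in_flight = False

    def allow_request(self) -> bool:
        """Return True if a request may be sent to this provider now."""
        if self.opened_at is None:
            return True  # closed: normal traffic
        if self.clock() - self.opened_at < self.open_seconds:
            return False  # open: skip this provider
        if self.probe_in_flight:
            return False  # open period over: allow only one probe at a time
        self.probe_in_flight = True
        return True

    def record_success(self) -> None:
        """Any success (including the probe) closes the breaker."""
        self.consecutive_failures = 0
        self.opened_at = None
        self.probe_in_flight = False

    def record_failure(self) -> None:
        if self.probe_in_flight:
            # Assumption: a failed probe reopens the breaker for another full period.
            self.opened_at = self.clock()
            self.probe_in_flight = False
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

A router would keep one breaker per cloud provider, call `allow_request()` before dispatching, and report each outcome with `record_success()` or `record_failure()`.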

Configuration

Route behavior is controlled through route profiles in runtime.toml. You can set the scoring threshold, preferred providers, model overrides, and fallback behavior per profile. See Router Configuration for details.