Provider Routing

Advanced Multi-Provider Management

Optimize request distribution across providers

LLMAPI.dev intelligently routes your requests to the most suitable providers for your selected models. By default, requests are distributed using a smart load balancing algorithm that optimizes for reliability and performance.

You can customize provider routing behavior by including the provider object in your request body for both Chat Completions and Completions endpoints.
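
The same provider object applies to both endpoint styles. As a minimal sketch, assuming the legacy Completions endpoint takes a prompt string in place of a messages array, a routed Completions request might look like the following (the available provider fields are described in the sections below):

Completions Routing Example
{
  "provider": {
    "sort": "price"
  },
  "model": "openai/gpt-4",
  "prompt": "Explain quantum computing"
}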

Smart Load Balancing (Default Strategy)

LLMAPI.dev's default routing strategy employs intelligent load balancing across providers, with a focus on maintaining optimal performance. Our algorithm:

  1. Continuously monitors provider health and availability
  2. Prioritizes providers with consistent uptime over the past minute
  3. Distributes requests among stable providers using a weighted approach that considers both price and performance
  4. Maintains fallback options for seamless recovery if primary providers experience issues
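
Because this is the default, no configuration is needed: a request that omits the provider object entirely is routed by the load balancer.

Default Routing Example
{
  "model": "openai/gpt-4",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}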

Provider Sorting Options

While our default load balancing strategy works well for most use cases, you can explicitly prioritize specific attributes using the sort field in your provider preferences:

price

Prioritize the most cost-effective providers

throughput

Prioritize providers with the highest processing speed

latency

Prioritize providers with the lowest time-to-first-token

When you specify a sort preference, the system will try providers in sequence based on your chosen attribute rather than using the default load balancing algorithm.

Throughput Priority Example
{
  "provider": {
    "sort": "throughput"
  },
  "model": "openai/gpt-4",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}
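
For instance, sorting by latency routes requests to the provider with the lowest time-to-first-token first:

Latency Priority Example
{
  "provider": {
    "sort": "latency"
  },
  "model": "openai/gpt-4",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}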

Convenience Shortcuts

LLMAPI.dev offers convenient model suffixes as shortcuts for common routing preferences:

:speed

Append :speed to any model slug to prioritize throughput (equivalent to provider.sort: "throughput")

:economy

Append :economy to any model slug to prioritize lowest price (equivalent to provider.sort: "price")

Speed Shortcut Example
{
  "model": "openai/gpt-4:speed",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}
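
Likewise, the :economy suffix is shorthand for sorting by price:

Economy Shortcut Example
{
  "model": "openai/gpt-4:economy",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}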

Provider Ordering

For granular control over provider selection, you can specify an ordered list of preferred providers using the order field. The system will attempt to use providers in the exact sequence you specify.

By default, if all specified providers are unavailable, the system will fall back to other compatible providers. You can disable this behavior by setting allow_fallbacks to false.

Provider Ordering Example
{
  "provider": {
    "order": ["anthropic", "openai", "mistral"],
    "allow_fallbacks": true
  },
  "model": "claude-3-opus",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}
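
To fail the request instead of falling back when none of the listed providers are available, disable fallbacks:

Strict Ordering Example
{
  "provider": {
    "order": ["anthropic", "openai"],
    "allow_fallbacks": false
  },
  "model": "anthropic/claude-3-opus",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}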

Data Privacy Controls

LLMAPI.dev gives you control over how your data is handled by providers through the data_collection field:

allow (default)

Permits routing to providers that may store data or use it for training

deny

Restricts routing to providers that do not store your data or use it for training

Data Privacy Example
{
  "provider": {
    "data_collection": "deny"
  },
  "model": "openai/gpt-4",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}

Additional Provider Filters

LLMAPI.dev offers several additional filtering options:

require_parameters

Ensures requests only go to providers that support all parameters in your request

only

Explicitly whitelist specific providers for your request

ignore

Exclude specific providers from consideration

Provider Filters Example
{
  "provider": {
    "only": ["openai", "anthropic"],
    "ignore": ["deepinfra"],
    "require_parameters": true
  },
  "model": "openai/gpt-4",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}
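
require_parameters is most useful when a request depends on an optional feature. As a sketch, assuming the endpoint accepts the standard response_format parameter, the following restricts routing to providers that support structured JSON output:

Require Parameters Example
{
  "provider": {
    "require_parameters": true
  },
  "model": "openai/gpt-4",
  "response_format": {"type": "json_object"},
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}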

Quantization Control

For open-weight models that may be deployed with different quantization levels, you can specify your preference using the quantization field:

none

Only use non-quantized versions (highest quality, higher cost)

int8

Allow models quantized to 8-bit precision

int4

Allow models quantized to 4-bit precision (most efficient, potential quality tradeoff)

Quantization Control Example
{
  "provider": {
    "quantization": "none"
  },
  "model": "mistral/mistral-large",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}

Provider Object Reference

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| sort | string | load_balanced | Prioritization strategy: 'price', 'throughput', 'latency' |
| order | string[] | null | Ordered list of preferred providers |
| allow_fallbacks | boolean | true | Allow fallback to other providers if specified ones fail |
| data_collection | string | allow | Data privacy preference: 'allow' or 'deny' |
| only | string[] | null | Whitelist specific providers |
| ignore | string[] | null | Exclude specific providers |
| require_parameters | boolean | false | Only use providers that support all request parameters |
| quantization | string | balanced | Quantization preference: 'none', 'int8', 'int4' |
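
These fields can be combined in a single provider object. One possible combination, pairing price-based sorting with a provider exclusion and strict data privacy:

Combined Routing Example
{
  "provider": {
    "sort": "price",
    "ignore": ["deepinfra"],
    "data_collection": "deny"
  },
  "model": "openai/gpt-4",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}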