Provider Routing: Advanced Multi-Provider Management

Optimize request distribution across providers.
LLMAPI.dev intelligently routes your requests to the most suitable providers for your selected models. By default, requests are distributed using a smart load balancing algorithm that optimizes for reliability and performance.
You can customize provider routing behavior by including the `provider` object in your request body for both Chat Completions and Completions endpoints.
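As a minimal sketch, the request body can be assembled with a small helper before sending it to either endpoint (`build_request` is a hypothetical client-side convenience, not part of any SDK):

```python
import json

def build_request(model, messages, provider=None):
    """Assemble a Chat Completions request body, attaching an
    optional provider-routing preferences object when given."""
    body = {"model": model, "messages": messages}
    if provider:
        body["provider"] = provider
    return body

body = build_request(
    "openai/gpt-4",
    [{"role": "user", "content": "Explain quantum computing"}],
    provider={"sort": "price"},
)
print(json.dumps(body, indent=2))
```

Omitting the `provider` argument produces a plain request body, which receives the default routing behavior described below.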
Smart Load Balancing (Default Strategy)
LLMAPI.dev's default routing strategy employs intelligent load balancing across providers, with a focus on maintaining optimal performance. Our algorithm:
- Continuously monitors provider health and availability
- Prioritizes providers with consistent uptime over the past minute
- Distributes requests among stable providers using a weighted approach that considers both price and performance
- Maintains fallback options for seamless recovery if primary providers experience issues
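The health-filter-then-weighted-choice steps above can be pictured with the following sketch. The field names, the 99% uptime cutoff, and the throughput-over-price scoring are illustrative assumptions, not LLMAPI.dev's actual internals:

```python
import random

def pick_provider(providers, rng=random.random):
    """Illustrative weighted selection: drop providers with poor recent
    uptime, then weight the survivors so that cheaper and faster
    providers receive proportionally more traffic."""
    healthy = [p for p in providers if p["uptime_last_minute"] >= 0.99]
    if not healthy:
        raise RuntimeError("no healthy providers available")
    # Hypothetical score: higher throughput and lower price raise the weight.
    weights = [p["throughput"] / p["price"] for p in healthy]
    threshold = rng() * sum(weights)
    cumulative = 0.0
    for provider, weight in zip(healthy, weights):
        cumulative += weight
        if cumulative >= threshold:
            return provider
    return healthy[-1]

providers = [
    {"name": "a", "uptime_last_minute": 1.00, "price": 2.0, "throughput": 80},
    {"name": "b", "uptime_last_minute": 0.95, "price": 1.0, "throughput": 90},
    {"name": "c", "uptime_last_minute": 1.00, "price": 1.0, "throughput": 60},
]
# Provider "b" is filtered out for recent instability; traffic is split
# between "a" (weight 40) and "c" (weight 60).
```

A real router would additionally track error rates and latency over sliding windows; the point here is only the shape of the decision: filter on health first, then distribute by weight.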
If your request includes parameters such as `tools` or specific context length requirements, our system automatically filters for compatible providers.

Provider Sorting Options
While our default load balancing strategy works well for most use cases, you can explicitly prioritize specific attributes using the `sort` field in your `provider` preferences:
- `price`: Prioritize the most cost-effective providers
- `throughput`: Prioritize providers with the highest processing speed
- `latency`: Prioritize providers with the lowest time-to-first-token
When you specify a sort preference, the system will try providers in sequence based on your chosen attribute rather than using the default load balancing algorithm.
```json
{
  "provider": {
    "sort": "throughput"
  },
  "model": "openai/gpt-4",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}
```
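The "try providers in sequence based on your chosen attribute" behavior can be sketched as a sort over per-provider metrics (the metric field names are illustrative):

```python
def try_in_sorted_order(providers, sort):
    """Return candidates in the sequence the router would attempt them.
    'price' and 'latency' sort ascending (lower is better);
    'throughput' sorts descending (higher is better)."""
    reverse = sort == "throughput"
    return sorted(providers, key=lambda p: p[sort], reverse=reverse)

providers = [
    {"name": "a", "price": 2.0, "throughput": 80, "latency": 0.4},
    {"name": "b", "price": 1.0, "throughput": 90, "latency": 0.6},
]
order = [p["name"] for p in try_in_sorted_order(providers, "throughput")]
# "b" (throughput 90) is attempted before "a" (throughput 80)
```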
Convenience Shortcuts
LLMAPI.dev offers convenient model suffixes as shortcuts for common routing preferences:
- Append `:speed` to any model slug to prioritize throughput (equivalent to `provider.sort: "throughput"`)
- Append `:economy` to any model slug to prioritize the lowest price (equivalent to `provider.sort: "price"`)
```json
{
  "model": "openai/gpt-4:speed",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}
```
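The suffix-to-sort mapping above can be expressed as a small client-side expansion (the helper is hypothetical; only the `:speed`/`:economy` equivalences come from the list above):

```python
SUFFIX_TO_SORT = {":speed": "throughput", ":economy": "price"}

def expand_shortcut(model_slug):
    """Split a ':speed' or ':economy' suffix off a model slug and
    return (bare_slug, equivalent_provider_preferences)."""
    for suffix, sort in SUFFIX_TO_SORT.items():
        if model_slug.endswith(suffix):
            return model_slug[: -len(suffix)], {"sort": sort}
    return model_slug, {}

# "openai/gpt-4:speed" expands to ("openai/gpt-4", {"sort": "throughput"})
```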
Provider Ordering
For granular control over provider selection, you can specify an ordered list of preferred providers using the `order` field. The system will attempt to use providers in the exact sequence you specify.

By default, if all specified providers are unavailable, the system will fall back to other compatible providers. You can disable this behavior by setting `allow_fallbacks` to `false`.
```json
{
  "provider": {
    "order": ["anthropic", "openai", "mistral"],
    "allow_fallbacks": true
  },
  "model": "anthropic/claude-3-opus",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}
```
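The ordered-attempt-then-fallback semantics can be sketched as follows (the availability set and `compatible` pool are illustrative stand-ins for server-side state):

```python
def route_with_order(order, available, allow_fallbacks=True, compatible=()):
    """Try providers in the caller's exact order; if none are available
    and fallbacks are allowed, fall back to other compatible providers."""
    for name in order:
        if name in available:
            return name
    if allow_fallbacks:
        for name in compatible:
            if name in available and name not in order:
                return name
    raise RuntimeError("no provider available for this request")

# With "anthropic" and "openai" down, the third choice handles the request:
# route_with_order(["anthropic", "openai", "mistral"], {"mistral"})
```

With `allow_fallbacks=False`, exhausting the ordered list raises instead of silently routing elsewhere, which mirrors the documented behavior.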
Data Privacy Controls
LLMAPI.dev gives you control over how your data is handled by providers through the `data_collection` field:

- `allow` (default): Permits routing to providers that may store data or use it for training
- `deny`: Restricts routing to only providers with strict data privacy policies
```json
{
  "provider": {
    "data_collection": "deny"
  },
  "model": "openai/gpt-4",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}
```
Additional Provider Filters
LLMAPI.dev offers several additional filtering options:
- `require_parameters`: Ensures requests only go to providers that support all parameters in your request
- `only`: Explicitly whitelist specific providers for your request
- `ignore`: Exclude specific providers from consideration
```json
{
  "provider": {
    "only": ["openai", "anthropic"],
    "ignore": ["deepinfra"],
    "require_parameters": true
  },
  "model": "openai/gpt-4",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}
```
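The combined effect of these three filters can be sketched as successive eliminations (the per-provider `supports` sets are illustrative; how LLMAPI.dev tracks parameter support server-side is not specified here):

```python
def filter_providers(candidates, only=None, ignore=None,
                     require_parameters=False, request_params=()):
    """Apply whitelist, blacklist, and parameter-support filters.
    Each candidate carries the set of request parameters it supports."""
    result = []
    for p in candidates:
        if only is not None and p["name"] not in only:
            continue
        if ignore is not None and p["name"] in ignore:
            continue
        if require_parameters and not set(request_params) <= p["supports"]:
            continue
        result.append(p)
    return result

candidates = [
    {"name": "openai", "supports": {"tools", "logprobs"}},
    {"name": "anthropic", "supports": {"tools"}},
    {"name": "deepinfra", "supports": {"tools"}},
]
survivors = filter_providers(
    candidates,
    only=["openai", "anthropic"],
    ignore=["deepinfra"],
    require_parameters=True,
    request_params=["tools", "logprobs"],
)
# Only "openai" passes all three filters for a request using logprobs.
```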
Quantization Control
For open-weight models that may be deployed at different quantization levels, you can specify your preference using the `quantization` field:

- `none`: Only use non-quantized versions (highest quality, higher cost)
- `int8`: Allow models quantized to 8-bit precision
- `int4`: Allow models quantized to 4-bit precision (most efficient, potential quality tradeoff)
```json
{
  "provider": {
    "quantization": "none"
  },
  "model": "mistral/mistral-large",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}
```
Provider Object Reference
| Field | Type | Default | Description |
|---|---|---|---|
| `sort` | string | load_balanced | Prioritization strategy: `price`, `throughput`, or `latency` |
| `order` | string[] | null | Ordered list of preferred providers |
| `allow_fallbacks` | boolean | true | Allow fallback to other providers if the specified ones fail |
| `data_collection` | string | allow | Data privacy preference: `allow` or `deny` |
| `only` | string[] | null | Whitelist specific providers |
| `ignore` | string[] | null | Exclude specific providers |
| `require_parameters` | boolean | false | Only use providers that support all request parameters |
| `quantization` | string | balanced | Quantization preference: `none`, `int8`, or `int4` |
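A `provider` object can be checked against the reference above before sending a request. This validator is a hypothetical client-side sketch: it checks only explicitly supplied values against the table's enumerations (defaults such as `load_balanced` and `balanced` are applied server-side and never sent):

```python
ALLOWED = {
    "sort": {"price", "throughput", "latency"},
    "data_collection": {"allow", "deny"},
    "quantization": {"none", "int8", "int4"},
}
LIST_FIELDS = {"order", "only", "ignore"}
BOOL_FIELDS = {"allow_fallbacks", "require_parameters"}

def validate_provider(provider):
    """Return a list of problems with a provider-preferences object,
    checked against the field reference table. Empty list means valid."""
    problems = []
    for field, value in provider.items():
        if field in ALLOWED:
            if value not in ALLOWED[field]:
                problems.append(f"{field}: unexpected value {value!r}")
        elif field in LIST_FIELDS:
            if not isinstance(value, list):
                problems.append(f"{field}: expected a list of provider names")
        elif field in BOOL_FIELDS:
            if not isinstance(value, bool):
                problems.append(f"{field}: expected a boolean")
        else:
            problems.append(f"unknown field {field!r}")
    return problems
```

Running the check before dispatch turns a malformed preferences object into an immediate local error rather than a rejected or misrouted API call.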