Provider Routing

Advanced Multi-Provider Management

Optimize request distribution across providers

LLMAPI.dev intelligently routes your requests to the most suitable providers for your selected models. By default, requests are distributed using a smart load balancing algorithm that optimizes for reliability and performance.

You can customize provider routing behavior by including the provider object in your request body for both Chat Completions and Completions endpoints.
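
The same provider object applies to both endpoint styles. As a minimal sketch, assuming the legacy Completions endpoint takes a prompt string in place of a messages array, a routed Completions request might look like the following (the available provider fields are described in the sections below):

Completions Routing Example
{
  "provider": {
    "sort": "price"
  },
  "model": "openai/gpt-4",
  "prompt": "Explain quantum computing"
}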

Smart Load Balancing (Default Strategy)

LLMAPI.dev's default routing strategy employs intelligent load balancing across providers, with a focus on maintaining optimal performance. Our algorithm:

  1. Continuously monitors provider health and availability
  2. Prioritizes providers with consistent uptime over the past minute
  3. Distributes requests among stable providers using a weighted approach that considers both price and performance
  4. Maintains fallback options for seamless recovery if primary providers experience issues
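
Because this is the default, no configuration is needed: a request that omits the provider object entirely is routed by the load balancer.

Default Routing Example
{
  "model": "openai/gpt-4",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}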

Provider Sorting Options

While our default load balancing strategy works well for most use cases, you can explicitly prioritize specific attributes using the sort field in your provider preferences:

price

Prioritize the most cost-effective providers

throughput

Prioritize providers with the highest processing speed

latency

Prioritize providers with the lowest time-to-first-token

When you specify a sort preference, the system will try providers in sequence based on your chosen attribute rather than using the default load balancing algorithm.

Throughput Priority Example
{
  "provider": {
    "sort": "throughput"
  },
  "model": "openai/gpt-4",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}
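
For instance, sorting by latency routes requests to the provider with the lowest time-to-first-token first:

Latency Priority Example
{
  "provider": {
    "sort": "latency"
  },
  "model": "openai/gpt-4",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}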

Convenience Shortcuts

LLMAPI.dev offers convenient model suffixes as shortcuts for common routing preferences:

:speed

Append :speed to any model slug to prioritize throughput (equivalent to provider.sort: "throughput")

:economy

Append :economy to any model slug to prioritize lowest price (equivalent to provider.sort: "price")

Speed Shortcut Example
{
  "model": "openai/gpt-4:speed",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}
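
Likewise, the :economy suffix is shorthand for sorting by price:

Economy Shortcut Example
{
  "model": "openai/gpt-4:economy",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}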

Provider Ordering

For granular control over provider selection, you can specify an ordered list of preferred providers using the order field. The system will attempt to use providers in the exact sequence you specify.

By default, if all specified providers are unavailable, the system will fall back to other compatible providers. You can disable this behavior by setting allow_fallbacks to false.

Provider Ordering Example
{
  "provider": {
    "order": ["anthropic", "openai", "mistral"],
    "allow_fallbacks": true
  },
  "model": "claude-3-opus",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}
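
To fail the request instead of falling back when none of the listed providers are available, disable fallbacks:

Strict Ordering Example
{
  "provider": {
    "order": ["anthropic", "openai"],
    "allow_fallbacks": false
  },
  "model": "anthropic/claude-3-opus",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}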

Data Privacy Controls

LLMAPI.dev gives you control over how your data is handled by providers through the data_collection field:

allow (default)

Permits routing to providers that may store data or use it for training

deny

Restricts routing to providers that do not store your data or use it for training

Data Privacy Example
{
  "provider": {
    "data_collection": "deny"
  },
  "model": "openai/gpt-4",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}

Additional Provider Filters

LLMAPI.dev offers several additional filtering options:

require_parameters

Ensures requests only go to providers that support all parameters in your request

only

Explicitly whitelist specific providers for your request

ignore

Exclude specific providers from consideration

Provider Filters Example
{
  "provider": {
    "only": ["openai", "anthropic"],
    "ignore": ["deepinfra"],
    "require_parameters": true
  },
  "model": "openai/gpt-4",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}
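
require_parameters is most useful when a request depends on an optional feature. As a sketch, assuming the endpoint accepts the standard response_format parameter, the following restricts routing to providers that support structured JSON output:

Require Parameters Example
{
  "provider": {
    "require_parameters": true
  },
  "model": "openai/gpt-4",
  "response_format": {"type": "json_object"},
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}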

Quantization Control

For open-weight models that may be deployed with different quantization levels, you can specify your preference using the quantization field:

none

Only use non-quantized versions (highest quality, higher cost)

int8

Allow models quantized to 8-bit precision

int4

Allow models quantized to 4-bit precision (most efficient, potential quality tradeoff)

Quantization Control Example
{
  "provider": {
    "quantization": "none"
  },
  "model": "mistral/mistral-large",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}

Provider Object Reference

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| sort | string | load_balanced | Prioritization strategy: 'price', 'throughput', 'latency' |
| order | string[] | null | Ordered list of preferred providers |
| allow_fallbacks | boolean | true | Allow fallback to other providers if specified ones fail |
| data_collection | string | allow | Data privacy preference: 'allow' or 'deny' |
| only | string[] | null | Whitelist specific providers |
| ignore | string[] | null | Exclude specific providers |
| require_parameters | boolean | false | Only use providers that support all request parameters |
| quantization | string | balanced | Quantization preference: 'none', 'int8', 'int4' |
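
These fields can be combined in a single provider object. One possible combination, pairing price-based sorting with a provider exclusion and strict data privacy:

Combined Routing Example
{
  "provider": {
    "sort": "price",
    "ignore": ["deepinfra"],
    "data_collection": "deny"
  },
  "model": "openai/gpt-4",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}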