Model Providers
Overview
Inspect has support for a wide variety of language model APIs and can be extended to support arbitrary additional ones. Support for the following providers is built in to Inspect:
| | |
|---|---|
| Lab APIs | OpenAI, Anthropic, Google, Grok, Mistral |
| Cloud APIs | AWS Bedrock, Azure AI, Vertex AI |
| Open (Hosted) | Groq, Together AI, Cloudflare, Goodfire |
| Open (Local) | Hugging Face, vLLM, Ollama, Llama-cpp-python |
If the provider you are using is not listed above, you may still be able to use it if:
1. It is available via OpenRouter (see the docs on using OpenRouter with Inspect).

2. It provides an OpenAI compatible API endpoint. In this scenario, use the Inspect OpenAI interface and set the OPENAI_BASE_URL environment variable to the appropriate value for your provider, as sketched below.
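For example, a hypothetical provider exposing an OpenAI compatible endpoint could be used like this (the base URL, API key, and model name below are placeholders, not a real service):
pip install openai
export OPENAI_API_KEY=your-provider-api-key
export OPENAI_BASE_URL=https://api.example-provider.com/v1
inspect eval arc.py --model openai/provider-model-name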
You can also create Model API Extensions to add model providers using their native interface.
OpenAI
To use the OpenAI provider, install the openai
package, set your credentials, and specify a model using the --model
option:
pip install openai
export OPENAI_API_KEY=your-openai-api-key
inspect eval arc.py --model openai/gpt-4o-mini
For the openai
provider, custom model args (-M
) are forwarded to the constructor of the AsyncOpenAI
class.
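For example, you could pass an AsyncOpenAI constructor option such as max_retries via -M (the value shown is purely illustrative):
inspect eval arc.py --model openai/gpt-4o-mini -M max_retries=10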
The following environment variables are supported by the OpenAI provider:

| Variable | Description |
|---|---|
| OPENAI_API_KEY | API key credentials (required). |
| OPENAI_BASE_URL | Base URL for requests (optional, defaults to https://api.openai.com/v1) |
| OPENAI_ORG_ID | OpenAI organization ID (optional) |
| OPENAI_PROJECT_ID | OpenAI project ID (optional) |
OpenAI on Azure
To use OpenAI models on Azure AI, specify an AZUREAI_OPENAI_API_KEY
along with an AZUREAI_OPENAI_BASE_URL
. You can then use the normal openai
provider with the azure
qualifier, specifying a model name that corresponds to the Azure Deployment Name of your model. For example, if your deployed model name was gpt4-1106-preview-ythre:
export AZUREAI_OPENAI_API_KEY=key
export AZUREAI_OPENAI_BASE_URL=https://your-url-at.azure.com
inspect eval --model openai/azure/gpt4-1106-preview-ythre
In addition to these variables, you can also set the OPENAI_API_VERSION
environment variable to specify a specific version of the OpenAI interface on Azure.
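For example (the API version shown is illustrative; substitute the version required by your Azure deployment):
export OPENAI_API_VERSION=2024-06-01
inspect eval --model openai/azure/gpt4-1106-preview-ythre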
Anthropic
To use the Anthropic provider, install the anthropic
package, set your credentials, and specify a model using the --model
option:
pip install anthropic
export ANTHROPIC_API_KEY=your-anthropic-api-key
inspect eval arc.py --model anthropic/claude-3-5-sonnet-latest
For the anthropic
provider, custom model args (-M
) are forwarded to the constructor of the AsyncAnthropic
class.
The following environment variables are supported by the Anthropic provider:

| Variable | Description |
|---|---|
| ANTHROPIC_API_KEY | API key credentials (required). |
| ANTHROPIC_BASE_URL | Base URL for requests (optional, defaults to https://api.anthropic.com) |
Anthropic on AWS Bedrock
To use Anthropic models on Bedrock, use the normal anthropic
provider with the bedrock
qualifier, specifying a model name that corresponds to a model you have access to on Bedrock. For Bedrock, authentication is not handled using an API key but rather your standard AWS credentials (e.g. AWS_ACCESS_KEY_ID
and AWS_SECRET_ACCESS_KEY
). You should also be sure to have specified an AWS region. For example:
export AWS_ACCESS_KEY_ID=your-aws-access-key-id
export AWS_SECRET_ACCESS_KEY=your-aws-secret-access-key
export AWS_DEFAULT_REGION=us-east-1
inspect eval arc.py --model anthropic/bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0
You can also optionally set the ANTHROPIC_BEDROCK_BASE_URL
environment variable to set a custom base URL for Bedrock API requests.
Anthropic on Vertex AI
To use Anthropic models on Vertex, you can use the standard anthropic
model provider with the vertex
qualifier (e.g. anthropic/vertex/claude-3-5-sonnet-v2@20241022
). You should also set two environment variables indicating your project ID and region. Here is a complete example:
export ANTHROPIC_VERTEX_PROJECT_ID=project-12345
export ANTHROPIC_VERTEX_REGION=us-east5
inspect eval ctf.py --model anthropic/vertex/claude-3-5-sonnet-v2@20241022
Authentication is done using the standard Google Cloud CLI (i.e. if you have authorised the CLI then no additional auth is needed for the model API).
Google
To use the Google provider, install the google-generativeai
package, set your credentials, and specify a model using the --model
option:
pip install google-generativeai
export GOOGLE_API_KEY=your-google-api-key
inspect eval arc.py --model google/gemini-1.5-pro
For the google
provider, custom model args (-M
) are forwarded to the genai.configure
function.
The following environment variables are supported by the Google provider:

| Variable | Description |
|---|---|
| GOOGLE_API_KEY | API key credentials (required). |
| GOOGLE_BASE_URL | Base URL for requests (optional) |
Safety Settings
Google models make available safety settings that you can adjust to determine what sorts of requests will be handled (or refused) by the model. The four categories of safety settings are as follows:
| Category | Description |
|---|---|
| sexually_explicit | Contains references to sexual acts or other lewd content. |
| hate_speech | Content that is rude, disrespectful, or profane. |
| harassment | Negative or harmful comments targeting identity and/or protected attributes. |
| dangerous_content | Promotes, facilitates, or encourages harmful acts. |
For each category, the following block thresholds are available:
| Block Threshold | Description |
|---|---|
| none | Always show regardless of probability of unsafe content |
| only_high | Block when high probability of unsafe content |
| medium_and_above | Block when medium or high probability of unsafe content |
| low_and_above | Block when low, medium or high probability of unsafe content |
By default, Inspect sets all four categories to none
(enabling all content). You can override these defaults by using the safety_settings
model argument. For example:
safety_settings = dict(
  dangerous_content = "medium_and_above",
  hate_speech = "low_and_above",
)

eval(
  "eval.py",
  model_args=dict(safety_settings=safety_settings)
)
This also can be done from the command line:
inspect eval eval.py -M "safety_settings={'hate_speech': 'low_and_above'}"
Mistral
To use the Mistral provider, install the mistralai
package, set your credentials, and specify a model using the --model
option:
pip install mistralai
export MISTRAL_API_KEY=your-mistral-api-key
inspect eval arc.py --model mistral/mistral-large-latest
For the mistral
provider, custom model args (-M
) are forwarded to the constructor of the Mistral
class.
The following environment variables are supported by the Mistral provider:

| Variable | Description |
|---|---|
| MISTRAL_API_KEY | API key credentials (required). |
| MISTRAL_BASE_URL | Base URL for requests (optional, defaults to https://api.mistral.ai) |
Mistral on Azure AI
To use Mistral models on Azure AI, specify an AZUREAI_MISTRAL_API_KEY
along with an AZUREAI_MISTRAL_BASE_URL
. You can then use the normal mistral
provider, but you’ll need to specify a model name that corresponds to the Azure Deployment Name of your model. For example, if your deployment model name was mistral-large-ctwi:
export AZUREAI_MISTRAL_API_KEY=key
export AZUREAI_MISTRAL_BASE_URL=https://your-url-at.azure.com
inspect eval --model mistral/mistral-large-ctwi
Grok
To use the Grok provider, install the openai
package (which the Grok service provides a compatible backend for), set your credentials, and specify a model using the --model
option:
pip install openai
export GROK_API_KEY=your-grok-api-key
inspect eval arc.py --model grok/grok-beta
For the grok
provider, custom model args (-M
) are forwarded to the constructor of the AsyncOpenAI
class.
The following environment variables are supported by the Grok provider:

| Variable | Description |
|---|---|
| GROK_API_KEY | API key credentials (required). |
| GROK_BASE_URL | Base URL for requests (optional, defaults to https://api.x.ai/v1) |
AWS Bedrock
To use the AWS Bedrock provider, install the aioboto3
package, set your credentials, and specify a model using the --model
option:
pip install aioboto3
export AWS_ACCESS_KEY_ID=access-key-id
export AWS_SECRET_ACCESS_KEY=secret-access-key
export AWS_DEFAULT_REGION=us-east-1
inspect eval arc.py --model bedrock/meta.llama2-70b-chat-v1
For the bedrock
provider, custom model args (-M
) are forwarded to the client
method of the aioboto3.Session
class.
Note that all models on AWS Bedrock require that you request model access before using them in a deployment (in some cases access is granted immediately, in other cases it could take one or more days).
You should also be sure that you have the appropriate AWS credentials before accessing models on Bedrock. You aren’t likely to need to, but you can also specify a custom base URL for AWS Bedrock using the BEDROCK_BASE_URL
environment variable.
If you are using Anthropic models on Bedrock, you can alternatively use the Anthropic provider as your means of access.
Azure AI
To use the Azure AI provider, install the azure-ai-inference
package, set your credentials and base URL, and specify an Azure Deployment Name as the model name:
pip install azure-ai-inference
export AZUREAI_API_KEY=api-key
export AZUREAI_BASE_URL=https://your-url-at.azure.com
inspect eval --model azureai/llama-2-70b-chat-wnsnw
For the azureai
provider, custom model args (-M
) are forwarded to the constructor of the ChatCompletionsClient
class.
The following environment variables are supported by the Azure AI provider:

| Variable | Description |
|---|---|
| AZUREAI_API_KEY | API key credentials (required). |
| AZUREAI_BASE_URL | Base URL for requests (required) |
If you are using OpenAI or Mistral on Azure AI, you can alternatively use the OpenAI provider or Mistral provider as your means of access.
Tool Emulation
When using the azureai
model provider, tool calling support can be ‘emulated’ for models that Azure AI has not yet implemented tool calling for. This occurs by default for Llama models. For other models, use the emulate_tools
model arg to force tool emulation:
inspect eval ctf.py -M emulate_tools=true
You can also use this option to disable tool emulation for Llama models with emulate_tools=false
.
Vertex AI
Vertex AI is a distinct service from Google AI, see a comparison matrix here. Make sure you are using the appropriate model provider.
To use the Vertex AI provider, install the google-cloud-aiplatform
package, configure your environment for Vertex API access, and specify a model using the --model
option:
pip install google-cloud-aiplatform
inspect eval eval.py --model vertex/gemini-1.5-flash
The core libraries for Vertex AI interact directly with Google Cloud Platform so this provider doesn’t use the standard BASE_URL
/API_KEY
approach that others do. Consequently you don’t need to set these environment variables.
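If you haven’t already configured your environment, a typical setup uses the Google Cloud CLI for application default credentials and project selection (the project ID below is a placeholder):
gcloud auth application-default login
gcloud config set project project-12345
inspect eval eval.py --model vertex/gemini-1.5-flash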
Vertex AI also provides the same safety_settings
outlined in the Google provider.
If you are using Anthropic on Vertex AI, you can alternatively use the Anthropic provider as your means of access.
Together AI
To use the Together AI provider, install the openai
package (which the Together AI service provides a compatible backend for), set your credentials, and specify a model using the --model
option:
pip install openai
export TOGETHER_API_KEY=your-together-api-key
inspect eval arc.py --model together/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
For the together
provider, custom model args (-M
) are forwarded to the constructor of the AsyncOpenAI
class.
The following environment variables are supported by the Together AI provider:

| Variable | Description |
|---|---|
| TOGETHER_API_KEY | API key credentials (required). |
| TOGETHER_BASE_URL | Base URL for requests (optional, defaults to https://api.together.xyz/v1) |
Groq
To use the Groq provider, install the groq
package, set your credentials, and specify a model using the --model
option:
pip install groq
export GROQ_API_KEY=your-groq-api-key
inspect eval arc.py --model groq/llama-3.1-70b-versatile
For the groq
provider, custom model args (-M
) are forwarded to the constructor of the AsyncGroq
class.
The following environment variables are supported by the Groq provider:

| Variable | Description |
|---|---|
| GROQ_API_KEY | API key credentials (required). |
| GROQ_BASE_URL | Base URL for requests (optional, defaults to https://api.groq.com) |
Cloudflare
To use the Cloudflare provider, set your account id and access token, and specify a model using the --model
option:
export CLOUDFLARE_ACCOUNT_ID=account-id
export CLOUDFLARE_API_TOKEN=api-token
inspect eval arc.py --model cf/meta/llama-3.1-70b-instruct
For the cloudflare
provider, custom model args (-M
) are included as fields in the post body of the chat request.
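For example, you could pass a generation field such as max_tokens straight through to the request body (the field name is an assumption about what your target Cloudflare model accepts, shown only as an illustration):
inspect eval arc.py --model cf/meta/llama-3.1-70b-instruct -M max_tokens=1024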
The following environment variables are supported by the Cloudflare provider:
| Variable | Description |
|---|---|
| CLOUDFLARE_ACCOUNT_ID | Account id (required). |
| CLOUDFLARE_API_TOKEN | API key credentials (required). |
| CLOUDFLARE_BASE_URL | Base URL for requests (optional, defaults to https://api.cloudflare.com/client/v4/accounts) |
Goodfire
To use the Goodfire provider, install the goodfire
package, set your credentials, and specify a model using the --model
option:
pip install goodfire
export GOODFIRE_API_KEY=your-goodfire-api-key
inspect eval arc.py --model goodfire/meta-llama/Meta-Llama-3.1-8B-Instruct
For the goodfire
provider, custom model args (-M
) are forwarded to the chat.completions.create
method of the AsyncClient
class.
The following environment variables are supported by the Goodfire provider:

| Variable | Description |
|---|---|
| GOODFIRE_API_KEY | API key credentials (required). |
| GOODFIRE_BASE_URL | Base URL for requests (optional, defaults to https://api.goodfire.ai) |
Hugging Face
The Hugging Face provider implements support for local models using the transformers package. To use the Hugging Face provider, install the torch
, transformers
, and accelerate
packages and specify a model using the --model
option:
pip install torch transformers accelerate
inspect eval arc.py --model hf/openai-community/gpt2
Batching
Concurrency for REST API based models is managed using the max_connections
option. The same option is used for transformers
inference—up to max_connections
calls to generate()
will be batched together (note that batches will proceed at a smaller size if no new calls to generate()
have occurred in the last 2 seconds).
The default batch size for Hugging Face is 32, but you should tune your max_connections
to maximise performance and ensure that batches don’t exceed available GPU memory. The Pipeline Batching section of the transformers documentation is a helpful guide to the ways batch size and performance interact.
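For example, to batch up to 16 calls to generate() at a time (a sketch; the right value depends on your model and available GPU memory):
inspect eval arc.py --model hf/openai-community/gpt2 --max-connections 16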
Device
The PyTorch cuda
device will be used automatically if CUDA is available (as will the Mac OS mps
device). If you want to override the device used, use the device
model argument. For example:
$ inspect eval arc.py --model hf/openai-community/gpt2 -M device=cuda:0
This also works in calls to eval()
:
eval("arc.py", model="hf/openai-community/gpt2", model_args=dict(device="cuda:0"))
Or in a call to get_model():
model = get_model("hf/openai-community/gpt2", device="cuda:0")
Local Models
In addition to using models from the Hugging Face Hub, the Hugging Face provider can also use local model weights and tokenizers (e.g. for a locally fine tuned model). Use hf/local
along with the model_path
, and (optionally) tokenizer_path
arguments to select a local model. For example, from the command line, use the -M
flag to pass the model arguments:
$ inspect eval arc.py --model hf/local -M model_path=./my-model
Or using the eval()
function:
eval("arc.py", model="hf/local", model_args=dict(model_path="./my-model"))
Or in a call to get_model():
model = get_model("hf/local", model_path="./my-model")
vLLM
The vLLM provider also implements support for Hugging Face models using the vllm package. To use the vLLM provider, install the vllm
package and specify a model using the --model
option:
pip install vllm
inspect eval arc.py --model vllm/openai-community/gpt2
You can also access models from ModelScope rather than Hugging Face, see the vLLM documentation for details on this.
vLLM is generally much faster than the Hugging Face provider as the library is designed entirely for inference speed whereas the Hugging Face library is more general purpose.
Rather than doing inference locally, you can also connect to a remote vLLM server (see the section below on vLLM Server for details).
Batching
vLLM automatically handles batching, so you generally don’t have to worry about selecting the optimal batch size. However, you can still use the max_connections
option to control the number of concurrent requests which defaults to 32.
Device
The device
option is also available for vLLM models, and you can use it to specify the device(s) to run the model on. For example:
$ inspect eval arc.py --model vllm/meta-llama/Meta-Llama-3-8B-Instruct -M device='0,1,2,3'
Local Models
Similar to the Hugging Face provider, you can also use local models with the vLLM provider. Use vllm/local
along with the model_path
, and (optionally) tokenizer_path
arguments to select a local model. For example, from the command line, use the -M
flag to pass the model arguments:
$ inspect eval arc.py --model vllm/local -M model_path=./my-model
vLLM Server
vLLM provides an HTTP server that implements OpenAI’s Chat API. To use this with Inspect, use the openai
provider rather than the vllm
provider, setting the model base URL to point to the vLLM server rather than OpenAI. For example:
$ export OPENAI_BASE_URL=http://localhost:8080/v1
$ export OPENAI_API_KEY=<your-server-api-key>
$ inspect eval arc.py --model openai/meta-llama/Meta-Llama-3-8B-Instruct
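To start the vLLM server itself, you might use something like the following (a sketch; flags vary by vLLM version, so consult the vLLM docs for the options appropriate to your deployment):
$ vllm serve meta-llama/Meta-Llama-3-8B-Instruct --port 8080 --api-key <your-server-api-key>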
See the vLLM documentation on Server Mode for additional details.
Ollama
To use the Ollama provider, install the openai
package (which Ollama provides a compatible backend for) and specify a model using the --model
option:
pip install openai
inspect eval arc.py --model ollama/llama3.1
Note that you should be sure that Ollama is running on your system before using it with Inspect.
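For example, you might pull the model and start the Ollama server before running the eval (a sketch; on many systems the Ollama service will already be running in the background):
ollama pull llama3.1
ollama serve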
The following environment variables are supported by the Ollama provider:

| Variable | Description |
|---|---|
| OLLAMA_BASE_URL | Base URL for requests (optional, defaults to http://localhost:11434/v1) |
Llama-cpp-python
To use the Llama-cpp-python provider, install the openai
package (which llama-cpp-python provides a compatible backend for) and specify a model using the --model
option:
pip install openai
inspect eval arc.py --model llama-cpp-python/llama3
Note that you should be sure that the llama-cpp-python server is running on your system before using it with Inspect.
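For example, you might start the server as follows (a sketch; the model path is a placeholder for your local GGUF weights):
pip install 'llama-cpp-python[server]'
python -m llama_cpp.server --model ./models/llama-3-8b.gguf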
The following environment variables are supported by the llama-cpp-python provider:

| Variable | Description |
|---|---|
| LLAMA_CPP_PYTHON_BASE_URL | Base URL for requests (optional, defaults to http://localhost:8000/v1) |
OpenRouter
The OpenRouter provider described below is currently available only in the development version of Inspect. To install the development version from GitHub:
pip install git+https://github.com/UKGovernmentBEIS/inspect_ai
To use the OpenRouter provider, install the openai
package (which the OpenRouter service provides a compatible backend for), set your credentials, and specify a model using the --model
option:
pip install openai
export OPENROUTER_API_KEY=your-openrouter-api-key
inspect eval arc.py --model openrouter/gryphe/mythomax-l2-13b
For the openrouter
provider, the following custom model args (-M
) are supported (click the argument name to see its docs on the OpenRouter site):
| Argument | Example |
|---|---|
| models | -M "models=anthropic/claude-3.5-sonnet, gryphe/mythomax-l2-13b" |
| provider | -M "provider={ 'quantizations': ['int8'] }" |
| transforms | -M "transforms=['middle-out']" |
The following environment variables are supported by the OpenRouter provider:

| Variable | Description |
|---|---|
| OPENROUTER_API_KEY | API key credentials (required). |
| OPENROUTER_BASE_URL | Base URL for requests (optional, defaults to https://openrouter.ai/api/v1) |
Custom Models
If you want to support another model hosting service or local model source, you can add a custom model API. See the documentation on Model API Extensions for additional details.