inspect_ai.model
Generation
get_model
Get an instance of a model.
Calls to get_model() are memoized (i.e. a call with the same arguments will return an existing instance of the model rather than creating a new one). You can disable this with memoize=False.
If you prefer to immediately close models after use (as well as prevent caching) you can employ the async context manager built in to the Model class. For example:
async with get_model("openai/gpt-4o") as model:
    response = await model.generate("Say hello")
In this case, the model client will be closed at the end of the context manager and will not be available in the get_model() cache.
def get_model(
    model: str | Model | None = None,
    config: GenerateConfig = GenerateConfig(),
    base_url: str | None = None,
    api_key: str | None = None,
    memoize: bool = True,
    **model_args: Any,
) -> Model
model
str | Model | None-
Model specification. If Model is passed it is returned unmodified; if None is passed then the model currently being evaluated is returned (or if there is no evaluation then the model referred to by INSPECT_EVAL_MODEL).
config
GenerateConfig-
Configuration for model.
base_url
str | None-
Optional. Alternate base URL for model.
api_key
str | None-
Optional. API key for model.
memoize
bool-
Use/store a cached version of the model based on the parameters to get_model()
**model_args
Any-
Additional args to pass to model constructor.
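For example, here is a brief sketch of typical get_model() usage (the hf/ model name and the device model arg are illustrative; valid model_args depend on the provider):
from inspect_ai.model import GenerateConfig, get_model

# Resolve the model currently being evaluated (or INSPECT_EVAL_MODEL).
model = get_model()

# Resolve a specific model with generation defaults and provider-specific args.
model = get_model(
    "hf/openai-community/gpt2",
    config=GenerateConfig(temperature=0.7, max_tokens=1024),
    device="cuda:0",  # illustrative model_arg; supported args vary by provider
)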
Model
Model interface.
Use get_model() to get an instance of a model. Model provides an async context manager for closing the connection to it after use. For example:
async with get_model("openai/gpt-4o") as model:
    response = await model.generate("Say hello")
class Model
Attributes
api
ModelAPI-
Model API.
config
GenerateConfig-
Generation config.
name
str-
Model name.
Methods
- __init__
-
Create a model.
def __init__(self, api: ModelAPI, config: GenerateConfig) -> None
api
ModelAPI-
Model API provider.
config
GenerateConfig-
Model configuration.
- generate
-
Generate output from the model.
async def generate(
    self,
    input: str | list[ChatMessage],
    tools: list[Tool] | list[ToolDef] | list[ToolInfo] | list[Tool | ToolDef | ToolInfo] = [],
    tool_choice: ToolChoice | None = None,
    config: GenerateConfig = GenerateConfig(),
    cache: bool | CachePolicy = False,
) -> ModelOutput
input
str | list[ChatMessage]-
Chat message input (if a str is passed it is converted to a ChatMessageUser).
tools
list[Tool] | list[ToolDef] | list[ToolInfo] | list[Tool | ToolDef | ToolInfo]-
Tools available for the model to call.
tool_choice
ToolChoice | None-
Directives to the model as to which tools to prefer.
config
GenerateConfig-
Model configuration.
cache
bool | CachePolicy-
Caching behavior for generate responses (defaults to no caching).
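For example, a minimal sketch (assuming the OpenAI provider is configured) that passes a plain string input and opts in to response caching:
from inspect_ai.model import get_model

async def say_hello() -> str:
    model = get_model("openai/gpt-4o")
    # A str input is converted to a ChatMessageUser; caching is off by default.
    output = await model.generate("Say hello", cache=True)
    return output.completion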
GenerateConfig
Model generation options.
class GenerateConfig(BaseModel)
Attributes
max_retries
int | None-
Maximum number of times to retry request (defaults to 5).
timeout
int | None-
Request timeout (in seconds).
max_connections
int | None-
Maximum number of concurrent connections to Model API (default is model specific).
system_message
str | None-
Override the default system message.
max_tokens
int | None-
The maximum number of tokens that can be generated in the completion (default is model specific).
top_p
float | None-
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
temperature
float | None-
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
stop_seqs
list[str] | None-
Sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
best_of
int | None-
Generates best_of completions server-side and returns the ‘best’ (the one with the highest log probability per token). vLLM only.
frequency_penalty
float | None-
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim. OpenAI, Google, Grok, Groq, and vLLM only.
presence_penalty
float | None-
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics. OpenAI, Google, Grok, Groq, and vLLM only.
logit_bias
dict[int, float] | None-
Map token IDs to an associated bias value from -100 to 100 (e.g. “42=10,43=-10”). OpenAI and Grok only.
seed
int | None-
Random seed. OpenAI, Google, Mistral, Groq, HuggingFace, and vLLM only.
top_k
int | None-
Randomly sample the next word from the top_k most likely next words. Anthropic, Google, HuggingFace, and vLLM only.
num_choices
int | None-
How many chat completion choices to generate for each input message. OpenAI, Grok, Google, TogetherAI, and vLLM only.
logprobs
bool | None-
Return log probabilities of the output tokens. OpenAI, Grok, TogetherAI, Huggingface, llama-cpp-python, and vLLM only.
top_logprobs
int | None-
Number of most likely tokens (0-20) to return at each token position, each with an associated log probability. OpenAI, Grok, Huggingface, and vLLM only.
parallel_tool_calls
bool | None-
Whether to enable parallel function calling during tool use (defaults to True). OpenAI and Groq only.
internal_tools
bool | None-
Whether to automatically map tools to model internal implementations (e.g. ‘computer’ for anthropic).
max_tool_output
int | None-
Maximum tool output (in bytes). Defaults to 16 * 1024.
cache_prompt
Literal['auto'] | bool | None-
Whether to cache the prompt prefix. Defaults to “auto”, which will enable caching for requests with tools. Anthropic only.
reasoning_effort
Literal['low', 'medium', 'high'] | None-
Constrains effort on reasoning for reasoning models. OpenAI o1 models only.
reasoning_history
bool | None-
Include reasoning in chat message history sent to generate.
Methods
- merge
-
Merge another model configuration into this one.
def merge(
    self, other: Union["GenerateConfig", GenerateConfigArgs]
) -> "GenerateConfig"
other
Union[GenerateConfig, GenerateConfigArgs]-
Configuration to merge.
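For example, a sketch of merging two configurations (assuming fields set on the argument take precedence over fields set on the receiver):
from inspect_ai.model import GenerateConfig

base = GenerateConfig(temperature=0.0, max_tokens=2048)
overrides = GenerateConfig(temperature=0.8, top_p=0.9)

# Assumed precedence: fields set on `overrides` win; unset fields fall back to `base`.
merged = base.merge(overrides)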
GenerateConfigArgs
Type for kwargs that selectively override GenerateConfig.
class GenerateConfigArgs(TypedDict, total=False)
Attributes
max_retries
int | None-
Maximum number of times to retry request (defaults to 5).
timeout
int | None-
Request timeout (in seconds).
max_connections
int | None-
Maximum number of concurrent connections to Model API (default is model specific).
system_message
str | None-
Override the default system message.
max_tokens
int | None-
The maximum number of tokens that can be generated in the completion (default is model specific).
top_p
float | None-
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
temperature
float | None-
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
stop_seqs
list[str] | None-
Sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
best_of
int | None-
Generates best_of completions server-side and returns the ‘best’ (the one with the highest log probability per token). vLLM only.
frequency_penalty
float | None-
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim. OpenAI, Google, Grok, Groq, and vLLM only.
presence_penalty
float | None-
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics. OpenAI, Google, Grok, Groq, and vLLM only.
logit_bias
dict[int, float] | None-
Map token IDs to an associated bias value from -100 to 100 (e.g. “42=10,43=-10”). OpenAI and Grok only.
seed
int | None-
Random seed. OpenAI, Google, Mistral, Groq, HuggingFace, and vLLM only.
top_k
int | None-
Randomly sample the next word from the top_k most likely next words. Anthropic, Google, and HuggingFace only.
num_choices
int | None-
How many chat completion choices to generate for each input message. OpenAI, Grok, Google, and TogetherAI only.
logprobs
bool | None-
Return log probabilities of the output tokens. OpenAI, Grok, TogetherAI, Huggingface, llama-cpp-python, and vLLM only.
top_logprobs
int | None-
Number of most likely tokens (0-20) to return at each token position, each with an associated log probability. OpenAI, Grok, and Huggingface only.
parallel_tool_calls
bool | None-
Whether to enable parallel function calling during tool use (defaults to True). OpenAI and Groq only.
internal_tools
bool | None-
Whether to automatically map tools to model internal implementations (e.g. ‘computer’ for anthropic).
max_tool_output
int | None-
Maximum tool output (in bytes). Defaults to 16 * 1024.
cache_prompt
Literal['auto'] | bool | None-
Whether to cache the prompt prefix. Defaults to “auto”, which will enable caching for requests with tools. Anthropic only.
reasoning_effort
Literal['low', 'medium', 'high'] | None-
Constrains effort on reasoning for reasoning models. OpenAI o1 models only.
reasoning_history
bool | None-
Include reasoning in chat message history sent to generate.
ModelOutput
Output from model generation.
class ModelOutput(BaseModel)
Attributes
model
str-
Model used for generation.
choices
list[ChatCompletionChoice]-
Completion choices.
usage
ModelUsage | None-
Model token usage.
time
float | None-
Time elapsed (in seconds) for call to generate.
metadata
dict[str, Any] | None-
Additional metadata associated with model output.
error
str | None-
Error message in the case of content moderation refusals.
stop_reason
StopReason-
First message stop reason.
message
ChatMessageAssistant-
First message choice.
completion
str-
Text of first message choice.
Methods
- from_content
-
Create ModelOutput from simple text content.
@staticmethod
def from_content(
    model: str,
    content: str,
    stop_reason: StopReason = "stop",
    error: str | None = None,
) -> "ModelOutput"
model
str-
Model name.
content
str-
Text content from generation.
stop_reason
StopReason-
Stop reason for generation.
error
str | None-
Error message.
- for_tool_call
-
Returns a ModelOutput for requesting a tool call.
@staticmethod
def for_tool_call(
    model: str,
    tool_name: str,
    tool_arguments: dict[str, Any],
    tool_call_id: str | None = None,
    content: str | None = None,
) -> "ModelOutput"
model
str-
Model name.
tool_name
str-
The name of the tool.
tool_arguments
dict[str, Any]-
The arguments passed to the tool.
tool_call_id
str | None-
Optional ID for the tool call. Defaults to a random UUID.
content
str | None-
Optional content to include in the message. Defaults to “tool call for tool {tool_name}”.
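These constructors are handy for scripting canned responses, for example with the mockllm provider (the custom_outputs model arg shown here is assumed to be specific to that provider):
from inspect_ai.model import ModelOutput, get_model

# Scripted outputs consumed in order by the mock model.
outputs = [
    ModelOutput.from_content(model="mockllm/model", content="4"),
    ModelOutput.for_tool_call(
        model="mockllm/model",
        tool_name="addition",
        tool_arguments={"x": 1, "y": 2},
    ),
]
model = get_model("mockllm/model", custom_outputs=outputs)  # custom_outputs is an assumption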
ModelUsage
Token usage for completion.
class ModelUsage(BaseModel)
Attributes
input_tokens
int-
Total input tokens used.
output_tokens
int-
Total output tokens used.
total_tokens
int-
Total tokens used.
input_tokens_cache_write
int | None-
Number of tokens written to the cache.
input_tokens_cache_read
int | None-
Number of tokens retrieved from the cache.
StopReason
Reason that the model stopped or failed to generate.
StopReason = Literal[
    "stop",
    "max_tokens",
    "model_length",
    "tool_calls",
    "content_filter",
    "unknown",
]
ChatCompletionChoice
Choice generated for completion.
class ChatCompletionChoice(BaseModel)
Attributes
message
ChatMessageAssistant-
Assistant message.
stop_reason
StopReason-
Reason that the model stopped generating.
logprobs
Logprobs | None-
Logprobs.
Messages
ChatMessage
Message in a chat conversation.
ChatMessage = Union[
    ChatMessageSystem, ChatMessageUser, ChatMessageAssistant, ChatMessageTool
]
ChatMessageBase
Base class for chat messages.
class ChatMessageBase(BaseModel)
Attributes
role
Literal['system', 'user', 'assistant', 'tool']-
Conversation role.
content
str | list[Content]-
Content (simple string or list of content objects).
source
Literal['input', 'generate'] | None-
Source of message.
text
str-
Get the text content of this message.
ChatMessage content is very general and can contain either a simple text value or a list of content parts (each of which can be either text or an image). Solvers (e.g. for prompt engineering) often need to interact with chat messages with the assumption that they are a simple string. The text property returns either the plain str content, or, if the content is a list of text and images, the text items concatenated together (separated by newlines).
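For example, a sketch that builds an explicit message list rather than passing a plain string:
from inspect_ai.model import ChatMessageSystem, ChatMessageUser, get_model

async def ask(question: str) -> str:
    messages = [
        ChatMessageSystem(content="You are a terse assistant."),
        ChatMessageUser(content=question),
    ]
    output = await get_model().generate(messages)
    # text collapses content parts into a plain string
    return output.message.text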
ChatMessageSystem
System chat message.
class ChatMessageSystem(ChatMessageBase)
Attributes
role
Literal['system']-
Conversation role.
ChatMessageUser
User chat message.
class ChatMessageUser(ChatMessageBase)
Attributes
role
Literal['user']-
Conversation role.
tool_call_id
list[str] | None-
ID(s) of tool call(s) this message has the content payload for.
ChatMessageAssistant
Assistant chat message.
class ChatMessageAssistant(ChatMessageBase)
Attributes
role
Literal['assistant']-
Conversation role.
tool_calls
list[ToolCall] | None-
Tool calls made by the model.
reasoning
str | None-
Reasoning content.
ChatMessageTool
Tool chat message.
class ChatMessageTool(ChatMessageBase)
Attributes
role
Literal['tool']-
Conversation role.
tool_call_id
str | None-
ID of tool call.
function
str | None-
Name of function called.
error
ToolCallError | None-
Error which occurred during tool call.
Content
Content
Content sent to or received from a model.
Content = Union[ContentText, ContentImage, ContentAudio, ContentVideo]
ContentText
Text content.
class ContentText(BaseModel)
Attributes
type
Literal['text']-
Type.
text
str-
Text content.
ContentImage
Image content.
class ContentImage(BaseModel)
Attributes
type
Literal['image']-
Type.
image
str-
Either a URL of the image or the base64 encoded image data.
detail
Literal['auto', 'low', 'high']-
Specifies the detail level of the image.
Currently only supported for OpenAI. Learn more in the Vision guide.
ContentAudio
Audio content.
class ContentAudio(BaseModel)
Attributes
type
Literal['audio']-
Type.
audio
str-
Audio file path or base64 encoded data URL.
format
Literal['wav', 'mp3']-
Format of audio data (‘mp3’ or ‘wav’).
ContentVideo
Video content.
class ContentVideo(BaseModel)
Attributes
type
Literal['video']-
Type.
video
str-
Video file path or base64 encoded data URL.
format
Literal['mp4', 'mpeg', 'mov']-
Format of video data (‘mp4’, ‘mpeg’, or ‘mov’).
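For example, a sketch (assuming a vision-capable model is configured) that combines text and image content parts in a single user message:
from inspect_ai.model import ChatMessageUser, ContentImage, ContentText, get_model

async def describe(image_path: str) -> str:
    message = ChatMessageUser(
        content=[
            ContentText(text="Describe this image in one sentence."),
            ContentImage(image=image_path, detail="low"),
        ]
    )
    output = await get_model().generate([message])
    return output.completion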
Logprobs
Logprob
Log probability for a token.
class Logprob(BaseModel)
Attributes
token
str-
The predicted token represented as a string.
logprob
float-
The log probability value of the model for the predicted token.
bytes
list[int] | None-
The predicted token represented as a byte array (a list of integers).
top_logprobs
list[TopLogprob] | None-
If the top_logprobs argument is greater than 0, this will contain an ordered list of the top K most likely tokens and their log probabilities.
Logprobs
Log probability information for a completion choice.
class Logprobs(BaseModel)
Attributes
content
list[Logprob]-
A list of length num_generated_tokens containing the individual log probabilities for each generated token.
TopLogprob
List of the most likely tokens and their log probability, at this token position.
class TopLogprob(BaseModel)
Attributes
token
str-
The top-kth token represented as a string.
logprob
float-
The log probability value of the model for the top-kth token.
bytes
list[int] | None-
The top-kth token represented as a byte array (a list of integers).
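For example, a sketch (assuming a provider that supports logprobs, such as OpenAI) that requests and reads the top log probabilities for the first generated token:
from inspect_ai.model import GenerateConfig, get_model

async def first_token_logprobs(prompt: str) -> list[tuple[str, float]]:
    model = get_model("openai/gpt-4o")
    output = await model.generate(
        prompt, config=GenerateConfig(logprobs=True, top_logprobs=5)
    )
    logprobs = output.choices[0].logprobs
    if logprobs is None:
        return []
    first = logprobs.content[0]
    return [(lp.token, lp.logprob) for lp in (first.top_logprobs or [])]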
Caching
CachePolicy
The CachePolicy is used to define various criteria that impact how model calls are cached.
expiry
: Default “24h”. The expiry time for the cache entry. This is a string of the format “12h” for 12 hours or “1W” for a week, etc. This is how long we will keep the cache entry; if we access it after this point we’ll clear it. Setting to None will cache indefinitely.
per_epoch
: Default True. By default we cache responses separately for different epochs. The general use case is that if there are multiple epochs, we should cache each response separately because scorers will aggregate across epochs. However, sometimes a response can be cached regardless of epoch if the call being made isn’t under test as part of the evaluation. If False, this option allows you to bypass that and cache independently of the epoch.
scopes
: A dictionary of additional metadata that should be included in the cache key. This allows for more fine-grained control over the cache key generation.
class CachePolicy
Methods
- __init__
-
Create a CachePolicy.
def __init__(
    self,
    expiry: str | None = "1W",
    per_epoch: bool = True,
    scopes: dict[str, str] = {},
) -> None
expiry
str | None-
Expiry.
per_epoch
bool-
Per epoch.
scopes
dict[str, str]-
Scopes.
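For example, a sketch of passing a policy to generate() (the scope values shown are arbitrary labels):
from inspect_ai.model import CachePolicy, get_model

async def cached_answer(prompt: str) -> str:
    model = get_model("openai/gpt-4o")
    # Cache for 24 hours, shared across epochs, additionally keyed by a run label.
    policy = CachePolicy(expiry="24h", per_epoch=False, scopes={"run": "baseline"})
    output = await model.generate(prompt, cache=policy)
    return output.completion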
cache_size
Calculate the size of various cached directories and files.
If neither subdirs nor files are provided, the entire cache directory will be calculated.
def cache_size(
    subdirs: list[str] = [], files: list[Path] = []
) -> list[tuple[str, int]]
subdirs
list[str]-
List of folders to filter by, which are generally model names. Empty directories will be ignored.
files
list[Path]-
List of files to filter by explicitly. Note that the return value groups these by their parent directory.
cache_clear
Clear the cache directory.
def cache_clear(model: str = "") -> bool
model
str-
Model to clear cache for.
cache_list_expired
Returns a list of all the cached files that have passed their expiry time.
def cache_list_expired(filter_by: list[str] = []) -> list[Path]
filter_by
list[str]-
Default []. List of model names to filter by. If an empty list, this will search the entire cache.
cache_prune
Delete all expired cache entries.
def cache_prune(files: list[Path] = []) -> None
files
list[Path]-
List of files to prune. If empty, this will search the entire cache.
cache_path
Path to cache directory.
def cache_path(model: str = "") -> Path
model
str-
Path to cache directory for specific model.
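For example, a sketch that reports per-model cache usage and then prunes expired entries:
from inspect_ai.model import cache_list_expired, cache_prune, cache_size

# Report cache usage grouped by model (sizes are in bytes).
for name, size in cache_size():
    print(f"{name}: {size / 1024:.1f} KB")

# Remove entries that have passed their expiry time.
cache_prune(cache_list_expired())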
Provider
modelapi
Decorator for registering model APIs.
def modelapi(name: str) -> Callable[..., type[ModelAPI]]
name
str-
Name of API.
ModelAPI
Model API provider.
If you are implementing a custom ModelAPI provider, your __init__() method will also receive a **model_args parameter that will carry any custom model_args (or -M arguments from the CLI) specified by the user. You can then pass these on to the appropriate place in your model initialisation code (for example, here is what many of the built-in providers do with the model_args passed to them: https://inspect.ai-safety-institute.org.uk/models.html#model-args). See the sketch at the end of this section for an example.
class ModelAPI(abc.ABC)
Methods
- __init__
-
Create a model API provider.
def __init__(
    self,
    model_name: str,
    base_url: str | None = None,
    api_key: str | None = None,
    api_key_vars: list[str] = [],
    config: GenerateConfig = GenerateConfig(),
) -> None
model_name
str-
Model name.
base_url
str | None-
Alternate base URL for model.
api_key
str | None-
API key for model.
api_key_vars
list[str]-
Environment variables that may contain keys for this provider (used for override).
config
GenerateConfig-
Model configuration.
- close
-
Close method for closing any client allocated for the model.
async def close(self) -> None
- generate
-
Generate output from the model.
@abc.abstractmethod
async def generate(
    self,
    input: list[ChatMessage],
    tools: list[ToolInfo],
    tool_choice: ToolChoice,
    config: GenerateConfig,
) -> ModelOutput | tuple[ModelOutput | Exception, ModelCall]
input
list[ChatMessage]-
Chat message input (if a str is passed it is converted to a ChatMessageUser).
tools
list[ToolInfo]-
Tools available for the model to call.
tool_choice
ToolChoice-
Directives to the model as to which tools to prefer.
config
GenerateConfig-
Model configuration.
- max_tokens
-
Default max_tokens.
def max_tokens(self) -> int | None
- max_connections
-
Default max_connections.
def max_connections(self) -> int
- connection_key
-
Scope for enforcement of max_connections.
def connection_key(self) -> str
- is_rate_limit
-
Is this exception a rate limit error.
def is_rate_limit(self, ex: BaseException) -> bool
ex
BaseException-
Exception to check for rate limit.
- collapse_user_messages
-
Collapse consecutive user messages into a single message.
def collapse_user_messages(self) -> bool
- collapse_assistant_messages
-
Collapse consecutive assistant messages into a single message.
def collapse_assistant_messages(self) -> bool
- tools_required
-
Any tool use in a message stream means that tools must be passed.
def tools_required(self) -> bool
- tool_result_images
-
Tool results can contain images.
def tool_result_images(self) -> bool
- has_reasoning_history
-
Chat message assistant messages can include reasoning.
def has_reasoning_history(self) -> bool
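Putting this together, here is a sketch of a custom provider (assuming the common pattern of decorating a function that returns the provider class so that provider imports stay lazy):
from typing import Any

from inspect_ai.model import (
    ChatMessage,
    GenerateConfig,
    ModelAPI,
    ModelOutput,
    modelapi,
)
from inspect_ai.tool import ToolChoice, ToolInfo

class CustomAPI(ModelAPI):
    def __init__(
        self,
        model_name: str,
        base_url: str | None = None,
        api_key: str | None = None,
        api_key_vars: list[str] = [],
        config: GenerateConfig = GenerateConfig(),
        **model_args: Any,
    ) -> None:
        super().__init__(model_name, base_url, api_key, api_key_vars, config)
        # Forward model_args (e.g. -M device=cuda:0 from the CLI) to your client.
        self.model_args = model_args

    async def generate(
        self,
        input: list[ChatMessage],
        tools: list[ToolInfo],
        tool_choice: ToolChoice,
        config: GenerateConfig,
    ) -> ModelOutput:
        # Call your backend here and convert its response to a ModelOutput.
        return ModelOutput.from_content(model=self.model_name, content="...")

@modelapi(name="custom")
def custom() -> type[ModelAPI]:
    return CustomAPI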