Agent Bridge
Overview
While Inspect provides facilities for native agent development, you can also very easily integrate agents created with 3rd party frameworks like AutoGen or LangChain, or use fully custom agents you have developed or ported from a research paper. The basic mechanism for integrating external agents works like this:
1. Write an agent function that takes a sample dict as input and returns a result dict with output. This function won't have any dependencies on Inspect; rather, it will depend on whatever agent framework or custom code you are using. This function should use the OpenAI API for model access, however calls to the OpenAI API will be redirected to Inspect (using whatever model is configured for the current task).

2. Use the agent function with Inspect by passing it to the bridge() function, which will turn it into a standard Inspect Solver.
Agent Function
An external agent function is similar to an Inspect Solver but without a dependency on the Inspect TaskState. Rather, it takes a sample dict as input and returns a result dict as output.

Here is a very simple agent function definition (it just calls generate and returns the output). It is structured similarly to an Inspect Solver in that an enclosing function returns the function that handles the sample (this enables you to share initialisation code and pass options to configure the behaviour of the agent):
agent.py
```python
from typing import Any

from openai import AsyncOpenAI


def my_agent():
    async def run(sample: dict[str, Any]) -> dict[str, Any]:
        client = AsyncOpenAI()
        completion = await client.chat.completions.create(
            model="inspect",
            messages=sample["input"],
        )
        return {
            "output": completion.choices[0].message.content
        }

    return run
```
We use the OpenAI API with model="inspect", which enables Inspect to intercept the request and send it to the Inspect model being evaluated for the task. We read the input from sample["input"] (a list of OpenAI-compatible messages) and return output as a string in the result dict.

Here is how you can use the bridge() function to use this agent as a solver:
task.py
```python
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import includes
from inspect_ai.solver import bridge

from agent import my_agent


@task
def hello():
    return Task(
        dataset=[Sample(input="Please print the word 'hello'?", target="hello")],
        solver=bridge(my_agent()),
        scorer=includes(),
    )
```
1. Import the custom agent from the agent.py file (shown above).
2. Adapt the custom agent into an Inspect solver with the bridge() function.
More in-depth examples that make use of popular agent frameworks such as AutoGen and LangChain are also available. We'll walk through the AutoGen example in more depth below.
Example: AutoGen
Here is an agent written with the AutoGen framework. You'll notice that it is structured similarly to an Inspect Solver in that an enclosing function returns the function which handles the sample (this enables you to share initialisation code and pass options to configure the behaviour of the agent):
agent.py
```python
from typing import Any, cast

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.conditions import SourceMatchTermination
from autogen_agentchat.messages import TextMessage
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_core.models import ModelInfo
from autogen_ext.agents.web_surfer import MultimodalWebSurfer
from autogen_ext.models.openai import OpenAIChatCompletionClient


def web_surfer_agent():
    # Use OpenAI interface (redirected to Inspect model)
    model = OpenAIChatCompletionClient(
        model="inspect",
        model_info=ModelInfo(
            vision=True, function_calling=True,
            json_output=False, family="unknown"
        ),
    )

    # Sample handler
    async def run(sample: dict[str, Any]) -> dict[str, Any]:
        # Read input (convert from OpenAI format)
        input = [
            TextMessage(source=msg["role"], content=str(msg["content"]))
            for msg in sample["input"]
        ]

        # Create agents and team
        web_surfer = MultimodalWebSurfer("web_surfer", model)
        assistant = AssistantAgent("assistant", model)
        termination = SourceMatchTermination("assistant")
        team = RoundRobinGroupChat(
            [web_surfer, assistant],
            termination_condition=termination
        )

        # Run team
        result = await team.run(task=input)

        # Extract output from last message and return
        message = cast(TextMessage, result.messages[-1])
        return dict(output=message.content)

    return run
```
1. Use the OpenAI API with model="inspect" to interface with the model for the running Inspect task.
2. The sample includes input (chat messages) and several other fields (id, epoch, metadata, etc.). The result includes model output as a string.
3. Input is passed using OpenAI API-compatible messages; here we convert them to native AutoGen TextMessage objects.
4. Configure and create the AutoGen multi-agent team. This can use any combination of agents and any team structure, including custom ones.
5. Extract content from the final assistant message and return it as output.
To use this agent in an Inspect Task, import it and use the bridge() function:
task.py
```python
from inspect_ai import Task, task
from inspect_ai.dataset import json_dataset
from inspect_ai.scorer import model_graded_fact
from inspect_ai.solver import bridge

from agent import web_surfer_agent


@task
def research() -> Task:
    return Task(
        dataset=json_dataset("dataset.json"),
        solver=bridge(web_surfer_agent()),
        scorer=model_graded_fact(),
    )
```
1. Import the custom agent from the agent.py file (shown above).
2. Adapt the custom agent into an Inspect solver with the bridge() function.
The bridge() function takes the agent function and hooks it up to a standard Inspect Solver, updating the TaskState and providing the means of redirecting OpenAI calls to the current Inspect model.
Bridge Types
In the examples above we reference two dict fields from the agent function interface:

| Field             | Type                             |
|-------------------|----------------------------------|
| sample["input"]   | list[ChatCompletionMessageParam] |
| result["output"]  | str                              |
For many agents these fields will be all you need. In some circumstances other available fields will be useful. Here are the full type declarations for the sample and result:
```python
from typing import Any, NotRequired, TypedDict

from openai.types.chat import ChatCompletionMessageParam


class SampleDict(TypedDict):
    sample_id: str
    epoch: int
    input: list[ChatCompletionMessageParam]
    metadata: dict[str, Any]
    target: str | list[str]


class ResultDict(TypedDict):
    output: str
    messages: NotRequired[list[ChatCompletionMessageParam]]
    scores: NotRequired[dict[str, ScoreDict]]
```
You aren't required to use these types exactly (they merely document the interface) so long as you consume and produce dict values that match their declarations (the result dict is type validated at runtime).

Returning messages is not required, as messages are automatically synced to the task state during generate (return messages only if you want to customise the default behaviour).
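For illustration, here is a minimal sketch (reusing the simple OpenAI-based handler from above) of returning messages explicitly to customise what is recorded:

```python
from typing import Any

from openai import AsyncOpenAI


async def run(sample: dict[str, Any]) -> dict[str, Any]:
    client = AsyncOpenAI()
    completion = await client.chat.completions.create(
        model="inspect", messages=sample["input"]
    )
    content = completion.choices[0].message.content or ""
    return {
        "output": content,
        # Optional: explicitly return the message history rather than
        # relying on the default syncing behaviour
        "messages": sample["input"] + [{"role": "assistant", "content": content}],
    }
```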
Scores
Returning scores is also optional, as most agents will rely on native Inspect scoring (returning scores is an escape hatch for agents that want to do their own scoring). If you do return scores, use this format (which is based on Inspect Score objects):
```python
class ScoreDict(TypedDict):
    value: (
        str
        | int
        | float
        | bool
        | list[str | int | float | bool]
        | dict[str, str | int | float | bool | None]
    )
    answer: NotRequired[str]
    explanation: NotRequired[str]
    metadata: NotRequired[dict[str, Any]]
```
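For illustration, here is a minimal sketch of a handler that does its own exact-match scoring and returns it alongside the output (the "match" score name and the "C"/"I" values are illustrative choices for this sketch, not required conventions):

```python
from typing import Any


async def run(sample: dict[str, Any]) -> dict[str, Any]:
    answer = "hello"  # ...however the agent produced its answer
    target = str(sample["target"])
    return {
        "output": answer,
        "scores": {
            # "match" is an illustrative score name for this sketch
            "match": {
                "value": "C" if answer == target else "I",
                "answer": answer,
                "explanation": "Exact match against the sample target.",
            }
        },
    }
```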
CLI Usage
Above we import the web_surfer_agent() directly as a Python function. It's also possible to reference external agents at the command line using the --solver parameter. For example:
inspect eval task.py --solver agent.py
This also works with solver arguments passed via -S. For example:
inspect eval task.py --solver agent.py -S max_requests=5
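A sketch of how such an argument could be received, assuming -S options are forwarded as keyword arguments to the enclosing agent function (max_requests is illustrative and intentionally unused in the body):

```python
from typing import Any

from openai import AsyncOpenAI


def my_agent(max_requests: int = 10):
    # max_requests arrives from `-S max_requests=5`; it is not used below
    # and only illustrates how options can reach the agent
    async def run(sample: dict[str, Any]) -> dict[str, Any]:
        client = AsyncOpenAI()
        completion = await client.chat.completions.create(
            model="inspect", messages=sample["input"]
        )
        return {"output": completion.choices[0].message.content}

    return run
```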
The agent.py source file will be searched for public top-level functions that include agent in their name. If you want to explicitly reference an agent function, you can do so as follows:
inspect eval task.py --solver agent.py@web_surfer_agent
Models
As demonstrated above, communication with Inspect models is done by using the OpenAI API with model="inspect". You can use the same technique to interface with other Inspect models. To do this, preface the model name with "inspect" followed by the rest of the fully qualified model name.
For example, in a LangChain agent, you would do this to utilise the Inspect interface to Gemini:

```python
model = ChatOpenAI(model="inspect/google/gemini-1.5-pro")
```
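The same prefix also works when calling the OpenAI client directly. Here is a minimal sketch (the prompt text is just an example):

```python
from openai import OpenAI

# Route this request through Inspect's interface to Gemini
client = OpenAI()
completion = client.chat.completions.create(
    model="inspect/google/gemini-1.5-pro",
    messages=[{"role": "user", "content": "Please say hello."}],
)
print(completion.choices[0].message.content)
```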
Sandboxes
If you need to execute untrusted LLM-generated code in your agent, you can still use the Inspect sandbox() within bridged agent functions. Typically, agent tools that can run code are customisable with an executor, and this is where you would plug in the Inspect sandbox().
For example, the AutoGen PythonCodeExecutionTool takes a CodeExecutor in its constructor. AutoGen provides several built-in code executors (e.g. local, Docker, Azure, etc.) and you can create custom ones. For example, you could create an InspectSandboxCodeExecutor which in turn delegates to the sandbox().exec() function.
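Here is a minimal sketch of such an executor, assuming the async CodeExecutor interface from autogen_core.code_executor; the class name, the method signature, and the 60-second timeout are assumptions to adapt to your AutoGen version:

```python
from autogen_core import CancellationToken
from autogen_core.code_executor import CodeBlock, CodeResult

from inspect_ai.util import sandbox


class InspectSandboxCodeExecutor:
    """Hypothetical executor that runs code blocks in the Inspect sandbox.

    Implements only the execute_code_blocks() method used by AutoGen code
    execution tools; adapt to the CodeExecutor protocol in your version.
    """

    async def execute_code_blocks(
        self, code_blocks: list[CodeBlock], cancellation_token: CancellationToken
    ) -> CodeResult:
        outputs: list[str] = []
        exit_code = 0
        for block in code_blocks:
            # Run each block with the Python interpreter inside the sandbox
            result = await sandbox().exec(["python", "-c", block.code], timeout=60)
            outputs.append(result.stdout if result.success else result.stderr)
            if not result.success:
                exit_code = result.returncode
                break
        return CodeResult(exit_code=exit_code, output="".join(outputs))
```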
Transcript
Custom agents run through the bridge() function still get most of the benefits of the Inspect transcript and log viewer. All model calls are captured and produce the same transcript output as when using conventional solvers. The message history is also automatically captured and logged.
Calls to the Python logging module for levels info and above are also handled as normal and show up within sample transcripts.
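For example, ordinary standard-library logging calls like these will show up in the sample transcript:

```python
import logging

logger = logging.getLogger(__name__)

# info-level (and above) messages are captured in the sample transcript
logger.info("starting agent run")
```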
If you want to use additional features of Inspect transcripts (e.g. steps, markdown output, etc.) you can still import and use the transcript function as normal. For example:
```python
from inspect_ai.log import transcript

transcript().info("custom *markdown* content")
```
This code will no-op when running outside of Inspect, so it is safe to include in agents that are also run in other environments.