Agent Bridge

Overview

While Inspect provides facilities for native agent development, you can also very easily integrate agents created with 3rd party frameworks like AutoGen or LangChain, or use fully custom agents you have developed or ported from a research paper. The basic mechanism for integrating external agents works like this:

Write an agent function that takes a sample dict as input and a returns a results dict with output. This function won’t have any dependencies on Inspect, rather it will depend on whatever agent framework or custom code you are using.
This function should use the OpenAI API for model access, however calls to the OpenAI API will be redirected to Inspect (using whatever model is configured for the current task).
Use the agent function with Inspect by passing it to the bridge() function, which will turn it into a standard Inspect Solver.

Agent Function

An external agent function is simillar to an Inspect Solver but without Inspect TaskState. Rather, it takes a sample dict as input and returns a result dict as output.

Here is a very simple agent function definition (it just calls generate and returns the ouptut). It is structured similar to an Inspect Solver where an enclosing function returns the function that handles the sample (this enables you to share initialisation code and pass options to configure the behaviour of the agent):

agent.py

from openai import AsyncOpenAI

def my_agent():

    async def run(sample: dict[str, Any]) -> dict[str, Any]:
        client = AsyncOpenAI()
        completion = await client.chat.completions.create(
            model="inspect",
            messages=sample["input"],
        )
        return {
            "output": completion.choices[0].message.content
        }

    return run

We use the OpenAI API with model="inspect", which enables Inspect to intercept the request and send it to the Inspect model being evaluated for the task.

We read the input from sample["input"] (a list of OpenAI compatible messages) and return output as a string in the result dict.

Here is how you can use the bridge() function to use this agent as a solver:

task.py

from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import includes
from inspect_ai.solver import bridge

from agents import my_agent

@task
def hello():
    return Task(
        dataset=[Sample(input="Please print the word 'hello'?", target="hello")],
        solver=bridge(my_agent()),
        scorer=includes(),
    )

1: Import custom agent from agent.py file (shown above)
2: Adapt custom agent into an Inspect solver with the bridge() function.

For more in-depth examples that make use of popular agent framworks, see:

We’ll walk through the AutoGen example in more depth below.

Example: AutoGen

Here is an agent written with the AutoGen framework. You’ll notice that it is structured similar to an Inspect Solver where an enclosing function returns the function which handles the sample (this enables you to share initialisation code and pass options to configure the behaviour of the agent):

agent.py

from typing import Any, cast

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.conditions import SourceMatchTermination
from autogen_agentchat.messages import TextMessage
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_core.models import ModelInfo
from autogen_ext.agents.web_surfer import MultimodalWebSurfer
from autogen_ext.models.openai import OpenAIChatCompletionClient

def web_surfer_agent():
   
    # Use OpenAI interface (redirected to Inspect model)
    model = OpenAIChatCompletionClient(
        model="inspect",
        model_info=ModelInfo(
            vision=True, function_calling=True,
            json_output=False, family="unknown"
        ),
    )

    # Sample handler
    async def run(sample: dict[str, Any]) -> dict[str, Any]:
        # Read input (convert from OpenAI format)                         
        input = [
            TextMessage(source=msg["role"], content=str(msg["content"]))
            for msg in sample["input"]
        ]

        # Create agents and team
        web_surfer = MultimodalWebSurfer("web_surfer", model)
        assistant = AssistantAgent("assistant", model)
        termination = SourceMatchTermination("assistant")
        team = RoundRobinGroupChat(
            [web_surfer, assistant],
            termination_condition=termination
        )

        # Run team
        result = await team.run(task=input)

        # Extract output from last message and return
        message = cast(TextMessage, result.messages[-1])
        return dict(output=message.content)

    return run

1: Use the OpenAI API with model="inspect" to interface with the model for the running Inspect task.
2: The sample includes input (chat messages) and several other fields (id, epoch, metadata, etc.). The result includes model output as a string.
3: Input is based using OpenAI API compatible messages—here we convert them to native AutoGen TextMessage objects.
4: Configure and create AutoGen multi-agent team. This can use any combination of agents and any team structure including custom ones.
5: Extract content from final assistant message and return it as output.

To use this agent in an Inspect Task, import it and use the bridge() function:

task.py

from inspect_ai import Task, task
from inspect_ai.dataset import json_dataset
from inspect_ai.scorer import model_graded_fact
from inspect_ai.solver import bridge

from agent import web_surfer_agent

@task
def research() -> Task:
    return Task(
        dataset=json_dataset("dataset.json"),
        solver=bridge(web_surfer_agent()),
        scorer=model_graded_fact(),
    )

1: Import custom agent from agent.py file (shown above)
2: Adapt custom agent into an Inspect solver with the bridge() function.

The bridge() function takes the agent function and hooks it up to a standard Inspect Solver, updating the TaskState and providing the means of redirecting OpenAI calls to the current Inspect model.

Bridge Types

In the examples above we reference two dict fields from the agent function interface:

`sample["input"]`	`list[ChatCompletionMessageParam]`
`result["output"]`	`str`

For many agents these fields will be all you need. In some circumstances other available fields will be useful. Here are the full type declarations for the sample and result:

from typing import NotRequired, TypedDict

from openai.types.chat import ChatCompletionMessageParam

class SampleDict(TypedDict):
    sample_id: str
    epoch: int
    input: list[ChatCompletionMessageParam]
    metadata: dict[str, Any]
    target: str | list[str]

class ResultDict(TypedDict):
    output: str
    messages: NotRequired[list[ChatCompletionMessageParam]]
    scores: NotRequired[dict[str, ScoreDict]]

You aren’t required to use these types exactly (they merely document the interface) so long as you consume and produce dict values that match their declarations (the result dict is type validated at runtime).

Returning messages is not required as messages are automatically synced to the task state during generate (return messages only if you want to customise the default behaviour).

Scores

Returning scores is also optional as most agents will rely on native Inspect scoring (returning scores is an escape hatch for agents that want to do their own scoring). If you do return scores use this format (which is based on Inspect Score objects):

class ScoreDict(TypedDict):
    value: (
        str
        | int
        | float
        | bool
        | list[str | int | float | bool]
        | dict[str, str | int | float | bool | None]
    )
    answer: NotRequired[str]
    explanation: NotRequired[str]
    metadata: NotRequired[dict[str, Any]]

CLI Usage

Above we import the web_surfer_agent() directly as a Python function. It’s also possible to reference external agents at the command line using the --solver parameter. For example:

inspect eval task.py --solver agent.py

This also works with --solver arguments passed via -S. For example:

inspect eval task.py --solver agent.py -S max_requests=5

The agent.py source file will be searched for public top level functions that include agent in their name. If you want to explicitly reference an agent function you can do this as follows:

inspect eval task.py --solver agent.py@web_surfer_agent

Models

As demonstrated above, communication with Inspect models is done by using the OpenAI API with model="inspect". You can use the same technique to interface with other Inspect models. To do this, preface the model name with “inspect” followed by the rest of the fully qualified model name.

For example, in a LangChain agent, you would do this to utilise the Inspect interface to Gemini:

model = ChatOpenAI(model="inspect/google/gemini-1.5-pro")

Sandboxes

If you need to execute untrusted LLM generated code in your agent, you can still use the Inspect sandbox() within bridged agent functions. Typically agent tools that can run code are customisable with an executor, and this is where you would plug in the Inspect sandbox().

For example, the AutoGen PythonCodeExecutionTool takes a CodeExecutor in its constructor. AutoGen provides several built in code executors (e.g. local, docker, azure, etc.) and you can create custom ones. For example, you could create an InspectSandboxCodeExecutor which in turn delegates to the sandbox().exec() function.

Transcript

Custom agents run through the bridge() function still get most of the benefit of the Inspect transcript and log viewer. All model calls are captured and produce the same transcript output as when using conventional solvers. The message history is also automatically captured and logged.

Calls to the Python logging module for levels info and above are also handled as normal and show up within sample transcripts.

If you want to use additional features of Inspect transcripts (e.g. steps, markdown output, etc.) you can still import and use the transcript function as normal. For example:

from inspect_ai.log import transcript

transcript().info("custom *markdown* content")

This code will no-op when running outside of Inspect to it is safe to include in agents that are also run in other environments.