Interactivity

Overview

In some cases you may wish to introduce user interaction into the implementation of tasks. For example, you may wish to:

  • Confirm consequential actions like requests made to web services
  • Prompt the model dynamically based on the trajectory of the evaluation
  • Score model output with human judges

The input_screen() function provides a context manager that temporarily clears the task display for user input. Note that prompting the user is a synchronous operation that pauses other activity within the evaluation (pending model requests or subprocesses will continue to execute, but their results won’t be processed until the input is complete).
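For instance, the last of these use cases (human judging) could be implemented as a scorer that collects a verdict from a human. The sketch below is illustrative rather than definitive: the human_judge() name and the yes/no grading scheme are assumptions, and input_screen() itself is covered in detail below.

from inspect_ai.scorer import Score, Target, accuracy, scorer
from inspect_ai.solver import TaskState
from inspect_ai.util import input_screen

@scorer(metrics=[accuracy()])
def human_judge():
    async def score(state: TaskState, target: Target) -> Score:
        # pause the task display and show the output to the judge
        with input_screen() as console:
            console.print(state.output.completion)
            verdict = console.input("Is this answer correct? (y/n): ")
        # record the judge's verdict as a boolean score
        return Score(value=verdict.strip().lower() == "y")
    return score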

Example

Before diving into the details of how to add interactions to your tasks, you might want to check out the Intervention Mode example.

Intervention mode is a prototype of an Inspect agent with human intervention, meant to serve as a starting point for evaluations which need these features (e.g. manual open-ended probing). It implements the following:

  1. Sets up a Linux agent with bash() and python() tools.

  2. Prompts the user for a starting question for the agent (sketched below).

  3. Displays all messages and prompts to approve tool calls.

  4. When the model stops calling tools, prompts the user for the next action (i.e. continue generating, ask a new question, or exit the task).
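For instance, step 2 could be implemented as a solver along the following lines. This is a minimal sketch: the ask_user() name is an illustrative assumption, and the full implementation in the example handles much more.

from inspect_ai.solver import Generate, TaskState, solver
from inspect_ai.util import input_screen

@solver
def ask_user():
    async def solve(state: TaskState, generate: Generate) -> TaskState:
        # clear the task display and collect the starting question
        with input_screen() as console:
            state.user_prompt.text = console.input("Please enter your question: ")
        return await generate(state)
    return solve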

After reviewing the example and the documentation below you’ll be well equipped to write your own custom interactive evaluation tasks.

Input Screen

You can prompt the user for input at any point in an evaluation using the input_screen() context manager, which clears the normal task display and provides access to a Console object for presenting content and asking for user input. For example:

from inspect_ai.util import input_screen

with input_screen() as console:
    console.print("Some preamble text")
    name = console.input("Please enter your name: ")

The console object provided by the context manager comes from the Rich Python library (which Inspect itself uses), and has many capabilities beyond simple text input. Read on to learn more.

Prompts

Rich includes Prompt and Confirm classes that provide additional capabilities such as default values, choice lists, and re-prompting. For example:

from inspect_ai.util import input_screen
from rich.prompt import Prompt

with input_screen() as console:
    name = Prompt.ask(
        "Enter your name", 
        choices=["Paul", "Jessica", "Duncan"], 
        default="Paul"
    )

The Prompt class is designed to be subclassed for more specialized inputs. The IntPrompt and FloatPrompt classes are built-in, but you can also create your own customized prompts (the Confirm class is another example of this). See the prompt.py source code for additional details.
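For instance, here is a sketch of a custom prompt that re-prompts until the user enters a non-empty name (the NamePrompt name is an illustrative assumption; process_response() is the hook Rich calls to validate responses):

from inspect_ai.util import input_screen
from rich.prompt import InvalidResponse, Prompt

class NamePrompt(Prompt):
    def process_response(self, value: str) -> str:
        # raising InvalidResponse makes Rich print the message and re-prompt
        value = value.strip()
        if not value:
            raise InvalidResponse("[prompt.invalid]Please enter a name")
        return value

with input_screen() as console:
    name = NamePrompt.ask("Enter your name", console=console)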

Trace Mode

When introducing interactions it’s often useful to see a trace of message activity for additional context. You can do this via the --trace CLI option (or trace parameter of the eval() function). For example:

$ inspect eval theory.py --trace
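Or equivalently, when running evaluations from Python:

from inspect_ai import eval

eval("theory.py", trace=True)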

In trace mode, all messages exchanged with the model are printed to the terminal (tool output is truncated at 100 lines).

Note that enabling trace mode automatically sets max_tasks and max_samples to 1, as otherwise messages from concurrently running samples would be interleaved together in an incoherent jumble.

If you want to add your own trace content, use the trace_enabled() function to check whether trace mode is currently enabled and the trace_panel() function to output a panel that is visually consistent with other trace mode output. For example:

from inspect_ai.util import trace_enabled, trace_panel

if trace_enabled():
    trace_panel("My Panel", content="Panel content")

Progress

Evaluations with user input alternate between asking for input and displaying task progress. By default, the normal task status display is shown when a user input screen is not active.

However, if your evaluation is dominated by user input with only very short model interactions in between, the task display flashing on and off might prove distracting. For these cases, you can specify the transient=False option to indicate that the input screen should be shown at all times. For example:

with input_screen(transient=False) as console:
    console.print("Some preamble text")
    name = console.input("Please enter your name: ")

This will result in the input screen staying active throughout the evaluation. A small progress indicator will be shown whenever user input isn't being requested, so the user knows the evaluation is still running.

Formatting

The console.print() method supports formatting using simple markup. For example:

with input_screen() as console:
    console.print("[bold red]alert![/bold red] Something happened")

See the documentation on console markup for additional details.

You can also render markdown directly, for example:

from inspect_ai.util import input_screen
from rich.markdown import Markdown

with input_screen() as console:
    console.print(Markdown('The _quick_ brown **fox**'))

Layout

Rich includes Columns, Table, and Panel classes for more advanced layout. For example, here is a simple table:

from inspect_ai.util import input_screen
from rich.table import Table

with input_screen() as console:
    table = Table(title="Tool Calls")
    table.add_column("Function", justify="left", style="cyan")
    table.add_column("Parameters", style="magenta")
    table.add_row("bash", "ls /usr/bin")
    table.add_row("python", "print('foo')")
    console.print(table)
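A Panel can similarly be used to frame a block of content (the panel title below is arbitrary):

from inspect_ai.util import input_screen
from rich.panel import Panel

with input_screen() as console:
    console.print(Panel("ls /usr/bin", title="Tool Call"))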