Tracing

Overview

Inspect includes a runtime tracing tool that can be used to diagnose issues that aren’t readily observable in eval logs and error messages. Trace logs are written in JSON Lines format and by default include log records from level TRACE and up (including HTTP and INFO).

Trace logs also do explicit enter and exit logging around actions that may encounter errors or fail to complete. For example:

Model API generate() calls
Call to subprocess() (e.g. tool calls that run commands in sandboxes)
Control commands sent to Docker Compose.
Writes to log files in remote storage (e.g. S3).
Model tool calls
Subtasks spawned by solvers.

Action logging enables you to observe execution times, errors, and commands that hang and cause evaluation tasks to not terminate. The inspect trace anomalies command enables you to easily scan trace logs for these conditions.

Usage

Trace logging does not need to be explicitly enabled—logs for the last 10 top level evaluations (i.e. CLI commands or scripts that calls eval functions) are preserved and written to a data directory dedicated to trace logs. You can list the last 10 trace logs with the inspect trace list command:

inspect trace list # --json for JSON output

Trace logs are written using JSON Lines format and are gzip compressed, so reading them requires some special handing. The inspect trace dump command encapsulates this and gives you a normal JSON array with the contents of the trace log (note that trace log filenames include the ID of the process that created them):

inspect trace dump trace-86396.log.gz

You can also apply a filter to the trace file using the --filter argument (which will match log message text case insensitively). For example:

inspect trace dump trace-86396.log.gz --filter model

Anomalies

If an evaluation is running and is not terminating, you can execute the following command to list instances of actions (e.g. model API generates, docker compose commands, tool calls, etc.) that are still running:

inspect trace anomalies

You will first see currently running actions (useful mostly for a “live” evaluation). If you have already cancelled an evaluation you’ll see a list of cancelled actions (with the most recently completed cancelled action on top) which will often also tell you which cancelled action was keeping an evaluation from completing.

Passing no arguments shows the most recent trace log, pass a log file name to view another log:

inspect trace anomalies trace-86396.log.gz

Errors and Timeouts

By default, the inspect trace anomalies command prints only currently running or cancelled actions (as these are what is required to diagnose an evaluation that doesn’t complete). You can optionally also display actions that ended with errors or timeouts by passing the --all flag:

inspect trace anomalies --all

Note that errors and timeouts are not by themselves evidence of problems, since both occur in the normal course of running evaluations (e.g. model generate calls can return errors that are retried and Docker or S3 can also return retryable errors or timeout when they are under heavy load).

As with the inspect trace dump command, you can apply a filter when listing anomolies. For example:

inspect trace anomalies --filter model

Tracing API

In addition to the standard set of actions which are trace logged, you can do your own custom trace logging using the trace_action() and trace_message() APIs. Trace logging is a great way to make sure that logging context is always captured (since the last 10 trace logs are always available) without cluttering up the console or eval transcripts.

trace_action()

Use the trace_action() context manager to collect data on the resolution (e.g. succeeded, cancelled, failed, timed out, etc.) and duration of actions. For example, let’s say you are interacting with a remote content database:

from inspect_ai.util import trace_action

from logging import getLogger
logger = getLogger(__name__)

server = "https://contentdb.example.com"
query = "<content-db-query>"

with trace_action(logger, "ContentDB", f"{server}: {query}"):
    # perform content database query

Your custom trace actions will be reported alongside the standard traced actions in inspect trace anomalies, inspect trace dump, etc.

trace_message()

Use the trace_message() function to trace events that don’t fall into enter/exit pattern supported by trace_action(). For example, let’s say you want to track every invocation of a custom tool:

from inspect_ai.util import trace_message

from logging import getLogger
logger = getLogger(__name__)

trace_message(logger, "MyTool", "message related to tool")