# Datasets

## Overview
Inspect has native support for reading datasets in the CSV, JSON, and JSON Lines formats, as well as from Hugging Face. In addition, the core dataset interface for the evaluation pipeline is flexible enough to accept data read from just about any source (see the Custom Reader section below for details).
If your data is already in a format amenable for direct reading as an Inspect `Sample`, reading a dataset is as simple as this:
```python
from inspect_ai.dataset import csv_dataset, json_dataset

dataset1 = csv_dataset("dataset1.csv")
dataset2 = json_dataset("dataset2.json")
```
Of course, many real-world datasets won’t be so trivial to read. Below we’ll discuss the various ways you can adapt your datasets for use with Inspect.
## Dataset Samples
The core data type underlying the use of datasets with Inspect is the `Sample`, which consists of a required `input` field and several other optional fields:
Class `inspect_ai.dataset.Sample`

| Field | Type | Description |
|---|---|---|
| `input` | `str \| list[ChatMessage]` | The input to be submitted to the model. |
| `choices` | `list[str] \| None` | Optional. Multiple choice answer list. |
| `target` | `str \| list[str] \| None` | Optional. Ideal target output. May be a literal value or narrative text to be used by a model grader. |
| `id` | `str \| None` | Optional. Unique identifier for sample. |
| `metadata` | `dict[str, Any] \| None` | Optional. Arbitrary metadata associated with the sample. |
| `sandbox` | `str \| tuple[str, str]` | Optional. Sandbox environment type (or optionally a tuple with type and config file). |
| `files` | `dict[str, str] \| None` | Optional. Files that go along with the sample (copied to sandbox environments). |
| `setup` | `str \| None` | Optional. Setup script to run for sample (executed within default sandbox environment). |
So a CSV dataset with the following structure:
| input | target |
|---|---|
| What cookie attributes should I use for strong security? | secure samesite and httponly |
| How should I store passwords securely for an authentication system database? | strong hashing algorithms with salt like Argon2 or bcrypt |
Can be read directly with:
```python
dataset = csv_dataset("security_guide.csv")
```
Note that samples from datasets without an `id` field will automatically be assigned ids based on an auto-incrementing integer starting with 1.
If your samples include `choices`, then the `target` should be a numeric index into the available `choices` rather than a letter (this is an implicit assumption of the `multiple_choice()` solver).
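For illustration, here is what a multiple-choice record might look like as plain data (the field names follow the `Sample` schema above; the question and choices are hypothetical):

```python
# A hypothetical multiple-choice record: target is the numeric index
# of the correct entry in choices (here 2, i.e. "httponly"), not a letter
record = {
    "input": "Which cookie attribute prevents JavaScript access?",
    "choices": ["secure", "samesite", "httponly"],
    "target": 2,
}

# the target index selects the correct choice
correct = record["choices"][record["target"]]
```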
### Files
The `files` field maps container target file paths to file contents (where contents can be either a filesystem path, a URL, or a string with inline content). For example, to copy a local file named `flag.txt` into the container path `/shared/flag.txt` you would use this:

```python
"/shared/flag.txt": "flag.txt"
```
Files are copied into the default sandbox environment unless their name contains a prefix mapping them into another environment. For example, to copy into the `victim` container:
```python
"victim:/shared/flag.txt": "flag.txt"
```
## Field Mapping
If your dataset contains inputs and targets that don't use `input` and `target` as field names, you can map them into a `Dataset` using a `FieldSpec`. This same mechanism also enables you to collect arbitrary additional fields into the `Sample` `metadata` bucket. For example:
```python
from inspect_ai.dataset import FieldSpec, json_dataset

dataset = json_dataset(
    "popularity.jsonl",
    FieldSpec(
        input="question",
        target="answer_matching_behavior",
        id="question_id",
        metadata=["label_confidence"],
    ),
)
```
If you need to do more than just map field names and actually do custom processing of the data, you can instead pass a function which takes a `record` (represented as a `dict`) from the underlying file and returns a `Sample`. For example:
```python
from inspect_ai.dataset import Sample, json_dataset

def record_to_sample(record):
    return Sample(
        input=record["question"],
        target=record["answer_matching_behavior"].strip(),
        id=record["question_id"],
        metadata={
            "label_confidence": record["label_confidence"]
        }
    )

dataset = json_dataset("popularity.jsonl", record_to_sample)
```
## Filter and Shuffle
The `Dataset` class includes `filter()` and `shuffle()` methods, as well as support for the slice operator.
To select a subset of the dataset, use `filter()`:
```python
dataset = json_dataset("popularity.jsonl", record_to_sample)
dataset = dataset.filter(
    lambda sample: sample.metadata["category"] == "advanced"
)
```
To select a subset of records, use standard Python slicing:
```python
dataset = dataset[0:100]
```
Shuffling is often helpful when you want to vary the samples used during evaluation development. To do this, either use the `shuffle()` method or the `shuffle` parameter of the dataset loading functions:
```python
# shuffle method
dataset = dataset.shuffle()

# shuffle on load
dataset = json_dataset("data.jsonl", shuffle=True)
```
Note that both of these methods optionally support specifying a random seed for shuffling.
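The value of specifying a seed is reproducibility: the same seed always produces the same ordering. The following standalone sketch illustrates this with plain Python lists rather than Inspect APIs:

```python
import random

# Seeded shuffles are reproducible: shuffling two copies of the same
# list with the same seed yields identical orderings.
samples = list(range(10))

ordering_a = samples.copy()
random.Random(42).shuffle(ordering_a)

ordering_b = samples.copy()
random.Random(42).shuffle(ordering_b)

# ordering_a == ordering_b
```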
## Hugging Face
Hugging Face Datasets is a library for easily accessing and sharing datasets for machine learning, and features integration with Hugging Face Hub, a repository with a broad selection of publicly shared datasets. Typically datasets on Hugging Face will require specification of which split within the dataset to use (e.g. train, test, or validation) as well as some field mapping. Use the `hf_dataset()` function to read a dataset and specify the requisite split and field names:
```python
from inspect_ai.dataset import FieldSpec, hf_dataset

dataset = hf_dataset("openai_humaneval",
    split="test",
    sample_fields=FieldSpec(
        id="task_id",
        input="prompt",
        target="canonical_solution",
        metadata=["test", "entry_point"]
    )
)
```
Note that some Hugging Face datasets execute Python code in order to resolve the underlying dataset files. Since this code is run on your local machine, you need to specify `trust=True` in order to perform the download. This option should only be set to `True` for repositories you trust and in which you have read the code. Here's an example of using the `trust` option (note that it defaults to `False` if not specified):
```python
dataset = hf_dataset("openai_humaneval",
    split="test",
    trust=True,
    ...
)
```
Under the hood, the `hf_dataset()` function is calling the `load_dataset()` function in the Hugging Face datasets package. You can additionally pass arbitrary parameters on to `load_dataset()` by including them in the call to `hf_dataset()`. For example, `hf_dataset(..., cache_dir="~/my-cache-dir")`.
## Amazon S3
Inspect has integrated support for storing datasets on Amazon S3. Compared to storing data on the local file-system, using S3 can provide more flexible sharing and access control, as well as more reliable long-term storage.
Using S3 is mostly a matter of substituting S3 URLs (e.g. `s3://my-bucket-name`) for local file-system paths. For example, here is how you load a dataset from S3:
```python
json_dataset("s3://my-bucket/dataset.jsonl")
```
S3 buckets are normally access controlled, so reading from them requires authentication. There are a wide variety of ways to configure your client for AWS authentication, all of which work with Inspect. See the article on Configuring the AWS CLI for additional details.
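For example, one common approach is to supply credentials via the standard AWS environment variables (the values below are AWS's documented example placeholders, not real credentials):

```shell
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRCYEXAMPLEKEY
export AWS_DEFAULT_REGION=us-east-1
```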
## Chat Messages
The most important data structure within `Sample` is the `ChatMessage`. Note that often datasets will contain a simple string as their input (which is then internally converted to a `ChatMessageUser`). However, it is possible to include a full message history as the input via `ChatMessage`. Another useful application of `ChatMessage` is providing multi-modal input (e.g. images).
Class `inspect_ai.model.ChatMessage`

| Field | Type | Description |
|---|---|---|
| `role` | `"system" \| "user" \| "assistant" \| "tool"` | Role of this chat message. |
| `content` | `str \| list[ChatContent]` | The content of the message. Can be a simple string or a list of content parts intermixing text and images. |
An input with chat messages in your dataset will look something like this:
```json
"input": [
  {
    "role": "user",
    "content": "What cookie attributes should I use for strong security?"
  }
]
```
Note that for this example we wouldn’t normally use a full chat message object (rather we’d just provide a simple string). Chat message objects are more useful when you want to include a system prompt or prime the conversation with “assistant” responses.
### Image Input
Image input is currently only supported for OpenAI vision models (e.g. gpt-4-vision-preview), Google Gemini vision models (e.g. gemini-pro-vision), and Anthropic Claude 3 models.
To include an image, your dataset input might look like this:
```json
"input": [
  {
    "role": "user",
    "content": [
      { "type": "text", "text": "What is this a picture of?" },
      { "type": "image", "image": "picture.png" }
    ]
  }
]
```
Where `"picture.png"` is resolved relative to the directory containing the dataset file. The image can be specified either as a URL (accessible to the model), a local file path, or a base64 encoded Data URL.
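As a concrete illustration of the Data URL form, here is how one might build such a URL with the Python standard library (the image bytes below are a stand-in for real PNG data):

```python
import base64

# stand-in bytes representing PNG image data (a real image would be
# read from disk, e.g. with open("picture.png", "rb").read())
image_bytes = b"\x89PNG\r\n\x1a\n"

# a Data URL embeds the MIME type and the base64-encoded payload inline
encoded = base64.b64encode(image_bytes).decode("ascii")
data_url = f"data:image/png;base64,{encoded}"
```

The resulting `data_url` string can then be used anywhere an image is accepted (e.g. in the `"image"` field shown above).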
If you are constructing chat messages programmatically, then the equivalent to the above would be:
```python
from inspect_ai.model import ChatMessageUser, ContentImage, ContentText

ChatMessageUser(content=[
    ContentText(text="What is this a picture of?"),
    ContentImage(image="picture.png")
])
```
If you are using paths or URLs to images and want the full base64 encoded content of images included in log files, use the `--log-images` CLI flag (or the `log_images` argument to `eval`). Note however that you should generally not do this if you have either large images or a large quantity of images, as this can substantially increase the size of the log file, making it difficult to load into Inspect View with reasonable performance.
## Custom Reader
You are not restricted to the built-in dataset functions for reading samples. You can also construct a `MemoryDataset` and pass that to a task. For example:
```python
from inspect_ai import Task, task
from inspect_ai.dataset import MemoryDataset, Sample
from inspect_ai.scorer import model_graded_fact
from inspect_ai.solver import generate, system_message

dataset = MemoryDataset([
    Sample(
        input="What cookie attributes should I use for strong security?",
        target="secure samesite and httponly",
    )
])

@task
def security_guide():
    return Task(
        dataset=dataset,
        solver=[system_message(SYSTEM_MESSAGE), generate()],
        scorer=model_graded_fact(),
    )
```
So if the built-in dataset functions don't meet your needs, you can create a custom function that yields a `MemoryDataset` and pass that directly to your `Task`.