Experiment datasets

Use Datasets to store reusable test cases for experiments. Each dataset can have one or more immutable versions. When you run an experiment, Fetch Hive uses a specific dataset version so the run is repeatable.

How do I create a dataset?

Open Experiments, then choose Datasets from the secondary navigation. Click Add Dataset to open the Add Dataset dialog. Enter a name and an optional description. Upload a CSV file by clicking the upload area or dragging the file into it. Review the column mapping. Fetch Hive automatically maps:

expected_output to expected output
metadata.* columns to row metadata
all other columns to input values

Click Add Dataset to create the dataset and its first version.

How do I view dataset versions?

Open Experiments, then choose Datasets. Click a dataset row to open the dataset detail page. Use the version selector in the page header to switch between versions. Versions are shown as labels like v1, v2, or v4. The page URL includes the selected version:

/experiments/datasets/:datasetId/v/:versionId

Opening a dataset without a version selects the latest version automatically. Runs always store the exact dataset version they used, so old runs remain reproducible even after newer dataset versions exist.

CSV file format

The first row must contain column headers. The CSV importer supports quoted values, commas inside quoted values, escaped quotes, multiline quoted values, and empty cells. Upload limits:

Limit	Value
File type	CSV
Maximum file size	5 MB
Maximum rows	10,000
Preview rows shown in the dialog	50
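
The limits above can be checked locally before uploading. The following is a minimal sketch, not Fetch Hive's own validation: the function name `preflight` is hypothetical, and it assumes that "5 MB" means 5 × 1024 × 1024 bytes, that the row limit counts data rows rather than the header, and that the file is UTF-8 encoded.

```python
import csv
import io

MAX_BYTES = 5 * 1024 * 1024  # assumption: "5 MB" is read as 5 MiB
MAX_ROWS = 10_000            # assumption: the limit counts data rows, not the header


def preflight(csv_bytes: bytes) -> list[str]:
    """Return a list of problems; an empty list means the file passes these checks."""
    problems = []
    if len(csv_bytes) > MAX_BYTES:
        problems.append(f"file is {len(csv_bytes)} bytes, limit is {MAX_BYTES}")

    # newline="" leaves line breaks untouched so the csv module can handle
    # multiline quoted values itself. utf-8-sig drops a leading BOM if present.
    text = io.StringIO(csv_bytes.decode("utf-8-sig"), newline="")
    reader = csv.reader(text)

    headers = next(reader, None)
    if not headers or not any(h.strip() for h in headers):
        problems.append("first row must contain column headers")
        return problems

    # Blank lines come back as empty lists and are not counted as rows here.
    row_count = sum(1 for row in reader if row)
    if row_count > MAX_ROWS:
        problems.append(f"{row_count} rows, limit is {MAX_ROWS}")
    return problems
```

Python's `csv` module reads quoted values, embedded commas, doubled quotes, and multiline cells, so a file that parses here uses the same quoting features the importer is described as supporting.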

CSV columns

An experiment dataset has three kinds of columns:

input columns
one optional expected output column
optional metadata columns

Only input columns are needed to run candidates. Expected output and metadata are optional.
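
To see how a header row falls into these three kinds, the naming rules can be sketched in a few lines. This is an illustration of the rules described on this page, not the importer's actual code, and `classify_headers` is a hypothetical helper name.

```python
def classify_headers(headers: list[str]) -> dict[str, list[str]]:
    """Group CSV headers into input, expected output, and metadata columns."""
    kinds: dict[str, list[str]] = {"input": [], "expected_output": [], "metadata": []}
    for name in headers:
        if name == "expected_output":
            kinds["expected_output"].append(name)
        elif name.startswith("metadata."):
            kinds["metadata"].append(name)
        else:
            # Anything without a reserved name is treated as an input column.
            kinds["input"].append(name)
    return kinds


kinds = classify_headers(
    ["question", "context", "expected_output", "metadata.case_id", "metadata.topic"]
)
print(kinds["input"])     # ['question', 'context']
print(kinds["metadata"])  # ['metadata.case_id', 'metadata.topic']
```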

Input columns

Input columns are the values Fetch Hive passes into the candidate for that row. They do not have fixed names for prompt experiments. Names such as question, context, and customer_message are examples only, not required column names. Use column names that match the candidate you are testing.

Candidate type	What your CSV input columns should match	Example columns
Dashboard Prompt	The prompt variables in the prompt editor	`question`, `context`, `input_url`, `input_scrape`
Deployed Prompt	The prompt variables captured in the selected deployment version	`question`, `context`, `input_url`, `input_scrape`
Agent	The agent user message	`message`

At least one input column is recommended. Without an input column, the run has no row-specific input to send to the candidate.

For prompt experiments, the CSV input columns should match the prompt variables. If your prompt has {{question}} and {{context}}, your CSV should include question and context. If your prompt has {{input_url}} and {{input_scrape}}, your CSV should include input_url and input_scrape instead.

For agent experiments, use a message input column for the user message. If there is no message column, Fetch Hive can use exactly one non-metadata input column as the message. Rows with multiple ambiguous input columns fail so you can fix the dataset.
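
Both matching rules can be checked before a run. The sketch below is an approximation under stated assumptions, not Fetch Hive's implementation: the helper names are hypothetical, prompt variables are assumed to be word characters inside double braces, and a row with no input columns at all is treated as an error in the same way as an ambiguous row.

```python
import re

# Assumption: prompt variables look like {{name}}, optionally padded with spaces.
VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def is_input(column: str) -> bool:
    """True for columns that are neither expected_output nor metadata.*."""
    return column != "expected_output" and not column.startswith("metadata.")


def missing_prompt_columns(prompt: str, headers: list[str]) -> list[str]:
    """Prompt variables with no matching CSV column, in first-seen order."""
    variables = dict.fromkeys(VARIABLE.findall(prompt))  # ordered, de-duplicated
    return [name for name in variables if name not in headers]


def agent_message(row: dict[str, str]) -> str:
    """Pick the user message for an agent row, or raise if the row is ambiguous."""
    if "message" in row:
        return row["message"]
    inputs = [column for column in row if is_input(column)]
    if len(inputs) == 1:
        return row[inputs[0]]
    raise ValueError(
        f"expected a message column or exactly one input column, found {inputs}"
    )


print(missing_prompt_columns("Answer {{question}} using {{context}}.", ["question"]))
# ['context']
print(agent_message({"customer_message": "Where is my order?", "metadata.case_id": "a-1"}))
# Where is my order?
```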

Expected output column

Use expected_output when you have a reference answer for the row.

Column name	Required	Stored as	Notes
`expected_output`	No	`expected_output.value`	Used for manual review today and future evaluator scoring.

Evaluator execution is not enabled yet. This means expected_output does not currently mark a result correct or incorrect automatically.

Metadata columns

Use metadata.* columns for optional row labels. Metadata columns do not trigger built-in behavior today. They do not change run order, model settings, locale, routing, or candidate execution. Fetch Hive stores them with the row so you can identify cases, filter results, compare groups, or connect results back to your own systems.

Column pattern	Required	Stored as	Example
`metadata.case_id`	No	`metadata.case_id` becomes `case_id`	`geo-001`
`metadata.topic`	No	`metadata.topic` becomes `topic`	`geography`
`metadata.priority`	No	`metadata.priority` becomes `priority`	`high`
`metadata.source`	No	`metadata.source` becomes `source`	`support_faq`
`metadata.language`	No	`metadata.language` becomes `language`	`en`

These names are examples only. You can use any metadata.* name that helps your team review results. Do not use meta_ prefixes for new datasets. Use metadata.* so the mapping is clear.
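
Because metadata is stored without the metadata. prefix, it is convenient for grouping rows or results in your own tooling. The following sketch shows one way to strip the prefix and group rows by a label. The helper names are hypothetical, and rows are assumed to be plain dictionaries such as those produced by Python's `csv.DictReader`.

```python
from collections import defaultdict

PREFIX = "metadata."


def extract_metadata(row: dict[str, str]) -> dict[str, str]:
    """Return the row's metadata with the 'metadata.' prefix removed from each key."""
    return {
        column[len(PREFIX):]: value
        for column, value in row.items()
        if column.startswith(PREFIX)
    }


def group_by_metadata(rows: list[dict[str, str]], key: str) -> dict[str, list[dict[str, str]]]:
    """Group rows by one metadata label; rows without the label go under ''."""
    groups: dict[str, list[dict[str, str]]] = defaultdict(list)
    for row in rows:
        groups[extract_metadata(row).get(key, "")].append(row)
    return dict(groups)


rows = [
    {"question": "What is the capital of France?", "metadata.case_id": "geo-001", "metadata.topic": "geography"},
    {"question": "What is 18 multiplied by 7?", "metadata.case_id": "math-001", "metadata.topic": "math"},
]
print(extract_metadata(rows[0]))  # {'case_id': 'geo-001', 'topic': 'geography'}
print(sorted(group_by_metadata(rows, "topic")))  # ['geography', 'math']
```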

Common CSV shapes

Prompt with {{question}}:

question,expected_output,metadata.case_id,metadata.topic
"What is the capital of France?","Paris","geo-001","geography"

Prompt with {{question}} and {{context}}:

question,context,expected_output,metadata.case_id,metadata.topic
"What is the refund window?","Customers can request a refund within 30 days.","30 days","policy-001","support"

Prompt with {{input_url}} and {{input_scrape}}:

input_url,input_scrape,expected_output,metadata.case_id,metadata.topic
"https://example.com/pricing","The pricing page lists Starter, Pro, and Enterprise plans.","Summarize the three available plans.","scrape-001","pricing"

Example CSV

You can download the example CSV from the Add Dataset dialog.

question,context,expected_output,metadata.case_id,metadata.topic
"What is the capital of France?","Use only the provided context. France's capital city is Paris.","Paris","geo-001","geography"
"Who wrote Pride and Prejudice?","Jane Austen published Pride and Prejudice in 1813.","Jane Austen","lit-001","literature"
"What is 18 multiplied by 7?","Calculate the product exactly.","126","math-001","math"
"Which planet is known as the Red Planet?","Mars is often called the Red Planet because of iron oxide on its surface.","Mars","space-001","science"
"What HTTP status code means Not Found?","Common HTTP status codes include 200 OK, 404 Not Found, and 500 Internal Server Error.","404","web-001","web"
"Summarize the refund policy in one sentence.","Customers can request a refund within 30 days of purchase if they provide the original receipt.","Customers can request a refund within 30 days with the original receipt.","policy-001","support"
"Return the country code for Japan.","Use ISO 3166-1 alpha-2 country codes. Japan is JP.","JP","locale-001","localization"
"What color do you get by mixing blue and yellow?","In subtractive color mixing, blue and yellow make green.","Green","art-001","art"
"Extract the invoice total.","Invoice INV-1042 lists subtotal $90, tax $9, and total $99.","$99","invoice-001","finance"
"Classify the sentiment as positive, neutral, or negative.","The customer wrote: The setup was quick and the support team was helpful.","positive","sentiment-001","classification"

Dataset versions

Dataset versions are immutable. If you need to change rows later, create a new dataset version instead of editing a version already used by a run. This keeps old experiment runs reproducible.

Importing rows

Use Import on the dataset detail page to append rows to a dataset. Importing rows does not edit the current version. Fetch Hive creates a new immutable version that contains:

all rows from the latest dataset version
any new rows from the uploaded CSV

Duplicate rows are skipped. Duplicate detection compares only the row’s input values, so two rows with identical inputs count as duplicates even if their position, expected output, or metadata differ. After import, Fetch Hive shows:

imported row count
skipped duplicate count
the new latest dataset version
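
The import behavior described above can be sketched as follows. This is an illustration rather than Fetch Hive's actual implementation: the function names are hypothetical, input values are compared as exact strings, and a row repeated inside the uploaded CSV itself is also treated as a duplicate, which is an assumption.

```python
def input_key(row: dict[str, str]) -> tuple:
    """Identity of a row for duplicate detection: its input columns and values only."""
    return tuple(sorted(
        (column, value)
        for column, value in row.items()
        if column != "expected_output" and not column.startswith("metadata.")
    ))


def import_rows(latest: list[dict[str, str]], uploaded: list[dict[str, str]]):
    """Build the rows of a new version without modifying the latest version.

    Returns (new_version_rows, imported_count, skipped_duplicate_count).
    """
    seen = {input_key(row) for row in latest}
    new_version = list(latest)  # copy, so the existing version stays untouched
    skipped = 0
    for row in uploaded:
        key = input_key(row)
        if key in seen:
            skipped += 1  # same inputs: expected output and metadata are ignored
            continue
        seen.add(key)
        new_version.append(row)
    return new_version, len(new_version) - len(latest), skipped


latest = [{"question": "q1"}, {"question": "q2"}, {"question": "q3"}]
uploaded = [
    {"question": "q3", "expected_output": "changed"},  # duplicate of an existing row
    {"question": "q4"},
]
rows, imported, skipped = import_rows(latest, uploaded)
print(len(rows), imported, skipped)  # 4 1 1
```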

Example: If v3 has 100 rows and you import a CSV with 20 rows where 5 are duplicates, Fetch Hive skips the 5 duplicates, appends the 15 new rows, and creates v4 with 115 rows. Runs created before the import still point to their original dataset version. New runs can use the latest version.

See also: Build an experiment and Review results
