Use Datasets to store reusable test cases for experiments. Each dataset can have one or more immutable versions. When you run an experiment, Fetch Hive uses a specific dataset version so the run is repeatable.

How do I create a dataset?

Open Experiments, then choose Datasets from the secondary navigation. Click Add Dataset. Enter a name and optional description. Upload a CSV file. You can click the upload area or drag the CSV into it. Review the column mapping. Fetch Hive automatically maps:
  • normal columns to input values
  • expected_output to expected output
  • metadata.* columns to row metadata
Click Add Dataset to create the dataset and its first version.

How do I view dataset versions?

Open Experiments, then choose Datasets. Click a dataset row to open the dataset detail page. Use the version selector in the page header to switch between versions. Versions are shown as labels like v1, v2, or v4. The page URL includes the selected version:
/experiments/datasets/:datasetId/v/:versionId
Opening a dataset without a version selects the latest version automatically. Runs always store the exact dataset version they used, so old runs remain reproducible even after newer dataset versions exist.

CSV file format

The first row must contain column headers. The CSV importer supports quoted values, commas inside quoted values, escaped quotes, multiline quoted values, and empty cells. Upload limits:
Limit | Value
File type | CSV
Maximum file size | 5 MB
Maximum rows | 10,000
Preview rows shown in the dialog | 50
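If you generate CSV files with a script, a local check before upload can catch limit problems early. The sketch below is not part of Fetch Hive; check_csv is a hypothetical helper, and it assumes that 5 MB means 5 × 1024 × 1024 bytes, that the row limit counts data rows without the header, and that the file is UTF-8 encoded.

```python
import csv
import io

MAX_BYTES = 5 * 1024 * 1024  # assumption: "5 MB" read as binary megabytes
MAX_ROWS = 10_000            # assumption: limit applies to data rows, not the header


def check_csv(data: bytes) -> list[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append("file is larger than 5 MB")

    # newline="" lets the csv module handle line breaks inside quoted values.
    text = data.decode("utf-8-sig")  # assumption: UTF-8, optional byte order mark
    reader = csv.reader(io.StringIO(text, newline=""))

    header = next(reader, None)
    if not header or all(not name.strip() for name in header):
        problems.append("first row must contain column headers")
        return problems

    # Count non-empty records; a multiline quoted value is still one record.
    row_count = sum(1 for row in reader if row)
    if row_count > MAX_ROWS:
        problems.append("file has more than 10,000 data rows")
    return problems
```

Calling check_csv(open("cases.csv", "rb").read()) returns an empty list when none of these checks fail. The file name is only an example.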

CSV columns

An experiment dataset has three kinds of columns:
  • input columns
  • one optional expected output column
  • optional metadata columns
Only input columns are needed to run candidates. Expected output and metadata are optional.

Input columns

Input columns are the values Fetch Hive passes into the prompt or agent for that row. They do not have fixed names for prompt experiments. Names such as question, context, and customer_message are examples only, not required column names. Use column names that match the candidate you are testing.
Candidate type | What your CSV input columns should match | Example columns
Dashboard Prompt | The prompt variables in the prompt editor | question, context, input_url, input_scrape
Deployed Prompt | The prompt variables captured in the selected deployment version | question, context, input_url, input_scrape
Agent | The agent’s user message input | message
At least one input column is recommended. Without an input column, the run has no row-specific input to send to the candidate.
For prompt experiments, the CSV input columns should match the prompt variables. If your prompt has {{question}} and {{context}}, your CSV should include question and context. If your prompt has {{input_url}} and {{input_scrape}}, your CSV should include input_url and input_scrape instead.
For agent experiments, use message as the main input column. Agents start from a user message, so message is the clearest dataset shape.
message,expected_output,metadata.case_id,metadata.topic
"Find the pricing page for Acme and summarize the plans.","A concise pricing summary","agent-001","research"

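To check that a CSV matches a prompt before uploading, you can compare the prompt's {{variable}} names with the CSV input columns. This is a rough local sketch, not a Fetch Hive feature: the helper names are hypothetical, and the regular expression assumes variables are simple identifiers inside double braces, optionally padded with spaces.

```python
import re

# Assumption: variables look like {{name}} or {{ name }} with identifier-style names.
VARIABLE_PATTERN = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def prompt_variables(template: str) -> set[str]:
    """Collect the variable names used in a prompt template."""
    return set(VARIABLE_PATTERN.findall(template))


def input_columns(headers: list[str]) -> set[str]:
    """Keep only input columns: not expected_output and not metadata.*."""
    return {
        name
        for name in headers
        if name != "expected_output" and not name.startswith("metadata.")
    }


def compare(template: str, headers: list[str]) -> dict[str, list[str]]:
    """Report variables with no column and input columns with no variable."""
    variables = prompt_variables(template)
    inputs = input_columns(headers)
    return {
        "missing": sorted(variables - inputs),
        "unused": sorted(inputs - variables),
    }


report = compare(
    "Answer {{question}} using only this context: {{context}}",
    ["question", "expected_output", "metadata.case_id"],
)
print(report)  # {'missing': ['context'], 'unused': []}
```

A non-empty "missing" list means the prompt has a variable that no CSV column would fill.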
Expected output column

Use expected_output when you have a reference answer for the row.
Column name | Required | Stored as | Notes
expected_output | No | expected_output.value | Used for manual review today and future evaluator scoring.
Evaluator execution is not enabled yet. This means expected_output does not currently mark a result correct or incorrect automatically.

Metadata columns

Use metadata.* columns for optional row labels. Metadata columns do not trigger built-in behavior today. They do not change run order, model settings, locale, routing, or candidate execution. Fetch Hive stores them with the row so you can identify cases, filter results, compare groups, or connect results back to your own systems.
Column pattern | Required | Stored as | Example
metadata.case_id | No | metadata.case_id becomes case_id | geo-001
metadata.topic | No | metadata.topic becomes topic | geography
metadata.priority | No | metadata.priority becomes priority | high
metadata.source | No | metadata.source becomes source | support_faq
metadata.language | No | metadata.language becomes language | en
These names are examples only. You can use any metadata.* name that helps your team review results. Do not use meta_ prefixes for new datasets. Use metadata.* so the mapping is clear.
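The three column kinds can be illustrated by splitting one CSV row into inputs, expected output, and metadata. The record layout below is a hypothetical illustration based on the "Stored as" descriptions on this page (expected_output.value, and metadata.case_id becoming case_id). It is not Fetch Hive's actual storage format, and split_row is an invented name.

```python
import csv
import io

METADATA_PREFIX = "metadata."


def split_row(row: dict[str, str]) -> dict:
    """Sort one CSV row into input values, expected output, and metadata."""
    record = {"input": {}, "expected_output": None, "metadata": {}}
    for column, value in row.items():
        if column == "expected_output":
            record["expected_output"] = {"value": value}
        elif column.startswith(METADATA_PREFIX):
            # metadata.case_id is kept under the shorter key case_id
            record["metadata"][column[len(METADATA_PREFIX):]] = value
        else:
            # every other column is treated as an input column
            record["input"][column] = value
    return record


csv_text = (
    "question,expected_output,metadata.case_id,metadata.topic\n"
    '"What is the capital of France?","Paris","geo-001","geography"\n'
)
records = [split_row(row) for row in csv.DictReader(io.StringIO(csv_text, newline=""))]
print(records[0]["metadata"])  # {'case_id': 'geo-001', 'topic': 'geography'}
```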

Common CSV shapes

Prompt with {{question}}:
question,expected_output,metadata.case_id,metadata.topic
"What is the capital of France?","Paris","geo-001","geography"
Prompt with {{question}} and {{context}}:
question,context,expected_output,metadata.case_id,metadata.topic
"What is the refund window?","Customers can request a refund within 30 days.","30 days","policy-001","support"
Prompt with {{input_url}} and {{input_scrape}}:
input_url,input_scrape,expected_output,metadata.case_id,metadata.topic
"https://example.com/pricing","The pricing page lists Starter, Pro, and Enterprise plans.","Summarize the three available plans.","scrape-001","pricing"
Agent message:
message,expected_output,metadata.case_id,metadata.topic
"Classify this customer message: The setup was quick and support was helpful.","positive","sentiment-001","classification"

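If you build these files from code, a CSV library handles quoting for you, so commas, quotes, and line breaks inside values stay intact. Here is a minimal sketch using Python's standard csv module. The to_csv helper and the sample row are illustrative only. It quotes values only when needed, which is still valid CSV even though the examples on this page quote every value.

```python
import csv
import io


def to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize rows to CSV text, taking column order from the first row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0].keys()),
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)  # values with commas, quotes, or newlines get quoted
    return buffer.getvalue()


rows = [
    {
        "message": 'Reply with "hi", then stop.\nDo not add anything else.',
        "expected_output": "hi",
        "metadata.case_id": "agent-002",
    }
]
csv_text = to_csv(rows)
print(csv_text)
```

Reading csv_text back with csv.DictReader returns the original rows, which is a quick way to confirm the quoting is correct before you upload.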
Example CSV

You can download the example CSV from the Add Dataset dialog.
question,context,expected_output,metadata.case_id,metadata.topic
"What is the capital of France?","Use only the provided context. France's capital city is Paris.","Paris","geo-001","geography"
"Who wrote Pride and Prejudice?","Jane Austen published Pride and Prejudice in 1813.","Jane Austen","lit-001","literature"
"What is 18 multiplied by 7?","Calculate the product exactly.","126","math-001","math"
"Which planet is known as the Red Planet?","Mars is often called the Red Planet because of iron oxide on its surface.","Mars","space-001","science"
"What HTTP status code means Not Found?","Common HTTP status codes include 200 OK, 404 Not Found, and 500 Internal Server Error.","404","web-001","web"
"Summarize the refund policy in one sentence.","Customers can request a refund within 30 days of purchase if they provide the original receipt.","Customers can request a refund within 30 days with the original receipt.","policy-001","support"
"Return the country code for Japan.","Use ISO 3166-1 alpha-2 country codes. Japan is JP.","JP","locale-001","localization"
"What color do you get by mixing blue and yellow?","In subtractive color mixing, blue and yellow make green.","Green","art-001","art"
"Extract the invoice total.","Invoice INV-1042 lists subtotal $90, tax $9, and total $99.","$99","invoice-001","finance"
"Classify the sentiment as positive, neutral, or negative.","The customer wrote: The setup was quick and the support team was helpful.","positive","sentiment-001","classification"

Dataset versions

Dataset versions are immutable. If you need to change rows later, create a new dataset version instead of editing a version already used by a run. This keeps old experiment runs reproducible.

Importing rows

Use Import on the dataset detail page to append rows to a dataset. Importing rows does not edit the current version. Fetch Hive creates a new immutable version that contains:
  • all rows from the latest dataset version
  • any new rows from the uploaded CSV
Duplicate rows are skipped. Duplicate detection compares only the row’s input values. Two rows with the same input values count as duplicates even if their row position, expected output, or metadata differ. After import, Fetch Hive shows:
  • imported row count
  • skipped duplicate count
  • the new latest dataset version
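The import behavior described above can be sketched as a small function. This is an illustration of the documented rules, not Fetch Hive's implementation. The function names are hypothetical, and two details are assumptions: input values are compared exactly, and duplicates inside the uploaded file itself are skipped too.

```python
def input_key(row: dict[str, str]) -> tuple:
    """Build a comparison key from input columns only."""
    return tuple(
        sorted(
            (column, value)
            for column, value in row.items()
            if column != "expected_output" and not column.startswith("metadata.")
        )
    )


def import_rows(latest: list[dict], uploaded: list[dict]) -> tuple[list[dict], int, int]:
    """Return (rows of the new version, imported count, skipped duplicate count)."""
    seen = {input_key(row) for row in latest}
    new_version = list(latest)  # the existing version is never modified
    skipped = 0
    for row in uploaded:
        key = input_key(row)
        if key in seen:
            skipped += 1  # same inputs already present, even if other columns differ
            continue
        seen.add(key)
        new_version.append(row)
    return new_version, len(uploaded) - skipped, skipped


# 100 existing rows, 20 uploaded rows, 5 of which repeat existing inputs
v3 = [{"question": f"q{i}", "expected_output": "a"} for i in range(100)]
upload = [{"question": f"q{i}", "expected_output": "b"} for i in range(95, 115)]
v4, imported, skipped = import_rows(v3, upload)
print(len(v4), imported, skipped)  # 115 15 5
```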
Example: If v3 has 100 rows and you import a CSV with 20 rows where 5 are duplicates, Fetch Hive creates v4 with 115 rows.
Runs created before the import still point to their original dataset version. New runs can use the latest version.
See also: Build an experiment and Review results