Experiment evaluators

Evaluator execution is not enabled in the current experiments release. Datasets can already store expected_output, but Fetch Hive does not yet run automatic scoring or pass/fail checks from it.

Current behavior

When you upload a dataset, expected_output is stored with each row. Use it during manual review and row comparison. No exact-match evaluator runs automatically today.

Planned evaluator types

Future evaluator support may include:

Evaluator type	Use case
Exact match	Strictly compare output with expected output
Contains	Check whether output includes required text
Regex	Check output against a pattern
JSON field match	Compare specific fields in structured output
Schema validation	Confirm output follows a required JSON schema
LLM judge	Score semantic correctness, reasoning quality, instruction following, or task completion
Custom evaluator	Run workspace-defined evaluation logic
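
Until evaluators are available, the simpler checks in the table can be approximated locally during manual review. The sketch below is only an illustration of how such checks might behave, not Fetch Hive's implementation. All function names are hypothetical, and details such as trimming whitespace before an exact match are assumptions made here.

```python
import json
import re
from typing import Iterable


def exact_match(output: str, expected: str) -> bool:
    """Strict comparison; surrounding whitespace is trimmed (an assumption)."""
    return output.strip() == expected.strip()


def contains(output: str, required: str) -> bool:
    """True when the required text appears anywhere in the output."""
    return required in output


def regex_match(output: str, pattern: str) -> bool:
    """True when the pattern matches somewhere in the output."""
    return re.search(pattern, output) is not None


def json_field_match(output: str, expected: str, fields: Iterable[str]) -> bool:
    """Compare only the named top-level fields of two JSON objects."""
    try:
        out_obj = json.loads(output)
        exp_obj = json.loads(expected)
    except json.JSONDecodeError:
        return False
    if not isinstance(out_obj, dict) or not isinstance(exp_obj, dict):
        return False
    return all(
        f in out_obj and f in exp_obj and out_obj[f] == exp_obj[f]
        for f in fields
    )
```

For example, exact_match("JP\n", "JP") returns True under the trimming assumption, while json_field_match ignores any fields that are not listed.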

How should I prepare datasets for evaluators?

Add expected_output when you have a known answer. Use metadata.* columns to group rows by topic, priority, source, language, or case ID. Keep expected outputs concise if you plan to use exact-match or contains checks. Use structured JSON in expected_output if you anticipate field-level checks such as JSON field match.

Example:

question,expected_output,metadata.case_id,metadata.topic
"Return the country code for Japan.","JP","locale-001","localization"
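
When expected_output holds structured JSON, the inner double quotes and commas have to be escaped inside the CSV cell. A minimal sketch using Python's standard csv module is shown below. It reuses the column names from the example above. The second row, its case ID, and the build_dataset_csv helper are hypothetical, and it assumes the upload accepts standard CSV quoting (doubled double quotes), which this page does not confirm.

```python
import csv
import io
import json

FIELDNAMES = ["question", "expected_output", "metadata.case_id", "metadata.topic"]

rows = [
    {
        "question": "Return the country code for Japan.",
        "expected_output": "JP",
        "metadata.case_id": "locale-001",
        "metadata.topic": "localization",
    },
    {
        # Hypothetical row: structured JSON stored as a string in one cell.
        "question": "Return the country name and code for Japan as JSON.",
        "expected_output": json.dumps({"country": "Japan", "code": "JP"}),
        "metadata.case_id": "locale-002",
        "metadata.topic": "localization",
    },
]


def build_dataset_csv(dataset_rows) -> str:
    """Serialize rows to CSV text; the csv module quotes cells as needed."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(dataset_rows)
    return buffer.getvalue()


csv_text = build_dataset_csv(rows)
print(csv_text)
```

Reading the text back with csv.DictReader and passing the second row's expected_output to json.loads recovers the original object, which is a quick way to confirm the quoting survived.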
