Evaluator execution is not enabled in the current experiments release. Datasets can already store expected_output, but Fetch Hive does not yet run automatic scoring or pass/fail checks from it.

Current behavior

When you upload a dataset, expected_output is stored with each row. Use it during manual review and row comparison. No exact-match evaluator runs automatically today.

Planned evaluator types

Future evaluator support may include:
| Evaluator type | Use case |
| --- | --- |
| Exact match | Strictly compare output with expected output |
| Contains | Check whether output includes required text |
| Regex | Check output against a pattern |
| JSON field match | Compare specific fields in structured output |
| Schema validation | Confirm output follows a required JSON schema |
| LLM judge | Score semantic correctness, reasoning quality, instruction following, or task completion |
| Custom evaluator | Run workspace-defined evaluation logic |
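
Until evaluators run inside Fetch Hive, the simpler checks in this list can be approximated locally during manual review. The following is a minimal sketch, not Fetch Hive's implementation: the function names, the whitespace trimming in the exact-match check, and the handling of invalid JSON are all assumptions made for illustration.

```python
import json
import re


def exact_match(output: str, expected: str) -> bool:
    """Strict comparison. Trimming surrounding whitespace is an assumption."""
    return output.strip() == expected.strip()


def contains(output: str, required: str) -> bool:
    """True when the required text appears anywhere in the output."""
    return required in output


def regex_match(output: str, pattern: str) -> bool:
    """True when the pattern matches somewhere in the output."""
    return re.search(pattern, output) is not None


def json_field_match(output: str, expected: str, fields: list[str]) -> bool:
    """Compare only the named top-level fields of two JSON objects.

    Invalid JSON or non-object values count as a mismatch (an assumption).
    """
    try:
        out_obj = json.loads(output)
        exp_obj = json.loads(expected)
    except json.JSONDecodeError:
        return False
    if not isinstance(out_obj, dict) or not isinstance(exp_obj, dict):
        return False
    return all(
        f in out_obj and f in exp_obj and out_obj[f] == exp_obj[f]
        for f in fields
    )
```

For example, `exact_match(model_output, row["expected_output"])` could be applied to each reviewed row, where `model_output` and `row` are placeholders for your own exported results.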

How should I prepare datasets for evaluators?

- Add expected_output when you have a known answer.
- Use metadata.* columns to group rows by topic, priority, source, language, or case id.
- Keep expected outputs concise when you expect exact or contains checks.
- Use structured JSON in expected_output when future field-level checks will be useful.

Example:
question,expected_output,metadata.case_id,metadata.topic
"Return the country code for Japan.","JP","locale-001","localization"
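
A dataset file in this shape can be generated and sanity-checked before upload. The sketch below uses Python's standard csv and json modules. It assumes the uploader accepts standard CSV quoting for JSON values, which this page does not confirm. The second row, including its locale-002 case id, is hypothetical and only illustrates a structured expected_output.

```python
import csv
import io
import json
from collections import defaultdict

rows = [
    {
        # The example row from this page.
        "question": "Return the country code for Japan.",
        "expected_output": "JP",
        "metadata.case_id": "locale-001",
        "metadata.topic": "localization",
    },
    {
        # Hypothetical row with structured JSON as the expected output.
        "question": "Return the country code and name for Japan as JSON.",
        "expected_output": json.dumps({"code": "JP", "name": "Japan"}),
        "metadata.case_id": "locale-002",
        "metadata.topic": "localization",
    },
]

# Write the rows as CSV. The csv module quotes and escapes the JSON field.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# Read the CSV back and group case ids by the metadata.topic column.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
by_topic = defaultdict(list)
for row in parsed:
    by_topic[row["metadata.topic"]].append(row["metadata.case_id"])

# Confirm the structured expected output survives the round trip.
structured = json.loads(parsed[1]["expected_output"])
```

Writing through `csv.DictWriter` rather than concatenating strings avoids hand-escaping the double quotes inside a JSON value.
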
See also: Datasets and Review results