Scenarios

30 scenarios across 4 categories, stored as YAML files on the filesystem (filesystem-first, per ADR-037 Cat A).
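Because scenarios live on the filesystem, the per-category counts can be checked by walking the directory tree. The following is a minimal sketch, assuming the `<category>/<id>.yaml` layout described in the contribution section; the function name `count_scenarios` and the category directory names used in the usage note are hypothetical and not part of BuzzBench.

```python
from collections import Counter
from pathlib import Path


def count_scenarios(root: Path) -> dict:
    """Count scenario YAML files per category directory under `root`.

    Assumes one directory per category, each holding `<id>.yaml` files.
    Files at other depths or with other extensions are ignored.
    """
    counts = Counter()
    for path in root.glob("*/*.yaml"):
        # The parent directory name is treated as the category.
        counts[path.parent.name] += 1
    return dict(counts)
```

For example, `count_scenarios(Path("BuzzBench/scenarios"))` would return a mapping from each category directory name to its number of scenario files, which can be compared against the counts listed in the headings.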

🎯 Neutral (10)

Math, code generation, instruction-following — measures raw model capability

ID                            Difficulty
math-reasoning-001            medium
math-fraction-001             medium
math-percent-001              easy
code-fizzbuzz-001             easy
code-reverse-001              easy
instruction-format-001        easy
instruction-list-001          easy
reasoning-syllogism-001       medium
reasoning-dates-001           medium
reading-comprehension-001     medium

🔧 Tool-use (5)

Single, chained, parallel tool calls — measures agent orchestration

ID                            Difficulty
tool-single-add-001           easy
tool-chained-search-001       medium
tool-parallel-memory-001      medium
tool-cache-stats-001          medium
tool-multi-orchestration-001  hard

⚔️ Adversarial (5)

Prompt injection, schema poisoning, OOM stress — measures robustness

ID                            Difficulty
prompt-injection-001          hard
schema-poisoning-001          hard
oom-stress-001                hard
tool-name-confusion-001       hard
schema-circular-001           hard

🩺 Domain (10)

Medical triage, legal contracts, finance — measures domain knowledge

ID                            Difficulty
medical-triage-001            medium
medical-drug-interaction-001  medium
medical-icd10-001             easy
legal-contract-clause-001     medium
legal-gdpr-001                medium
legal-jurisdiction-001        medium
finance-pe-ratio-001          easy
finance-compound-interest-001 medium
finance-bond-pricing-001      medium
finance-vat-001               easy

Contribute a scenario

Add a YAML file at BuzzBench/scenarios/<category>/<id>.yaml and open a pull request. See the scenarios spec.
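As a rough illustration of the path convention above, the sketch below builds the target file path for a new scenario. The helper `scenario_path` and the ID pattern are assumptions: the pattern (lowercase hyphen-separated words ending in a three-digit suffix) is inferred from the IDs listed on this page, not taken from the scenarios spec, and the category directory name in the usage note is hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumed ID shape, inferred from the listed IDs (e.g. "math-percent-001"):
# lowercase alphanumeric words joined by hyphens, ending in "-NNN".
ID_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*-\d{3}")

SCENARIOS_ROOT = PurePosixPath("BuzzBench/scenarios")


def scenario_path(category: str, scenario_id: str) -> PurePosixPath:
    """Return BuzzBench/scenarios/<category>/<id>.yaml for a scenario."""
    if not ID_PATTERN.fullmatch(scenario_id):
        raise ValueError(f"unexpected scenario ID format: {scenario_id!r}")
    return SCENARIOS_ROOT / category / f"{scenario_id}.yaml"
```

Calling `scenario_path("domain", "finance-vat-001")` yields `BuzzBench/scenarios/domain/finance-vat-001.yaml`, assuming the category directory is named `domain`.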