Scenarios
30 scenarios across 4 categories, defined as YAML files (filesystem-first, per ADR-037 Cat A).
🎯 Neutral (10)
Math, code generation, instruction-following, reasoning, and reading comprehension. Measures raw model capability.
| ID | Difficulty |
|---|---|
| math-reasoning-001 | medium |
| math-fraction-001 | medium |
| math-percent-001 | easy |
| code-fizzbuzz-001 | easy |
| code-reverse-001 | easy |
| instruction-format-001 | easy |
| instruction-list-001 | easy |
| reasoning-syllogism-001 | medium |
| reasoning-dates-001 | medium |
| reading-comprehension-001 | medium |
🔧 Tool-use (5)
Single, chained, parallel tool calls — measures agent orchestration
| ID | Difficulty |
|---|---|
| tool-single-add-001 | easy |
| tool-chained-search-001 | medium |
| tool-parallel-memory-001 | medium |
| tool-cache-stats-001 | medium |
| tool-multi-orchestration-001 | hard |
⚔️ Adversarial (5)
Prompt injection, schema poisoning, OOM stress, tool-name confusion. Measures robustness.
| ID | Difficulty |
|---|---|
| prompt-injection-001 | hard |
| schema-poisoning-001 | hard |
| oom-stress-001 | hard |
| tool-name-confusion-001 | hard |
| schema-circular-001 | hard |
🩺 Domain (10)
Medical triage, legal contracts, finance — measures domain knowledge
| ID | Difficulty |
|---|---|
| medical-triage-001 | medium |
| medical-drug-interaction-001 | medium |
| medical-icd10-001 | easy |
| legal-contract-clause-001 | medium |
| legal-gdpr-001 | medium |
| legal-jurisdiction-001 | medium |
| finance-pe-ratio-001 | easy |
| finance-compound-interest-001 | medium |
| finance-bond-pricing-001 | medium |
| finance-vat-001 | easy |
Contribute a scenario
Add a YAML file at BuzzBench/scenarios/<category>/<id>.yaml and open a pull request. See the scenarios spec.
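
Before opening the pull request, it can help to confirm that the new file landed in the right category directory. The following is a minimal sketch, not part of BuzzBench itself: `count_scenarios` is a hypothetical helper, and it assumes only the `<category>/<id>.yaml` layout described above. It does not parse or validate the YAML against the scenarios spec.

```python
from collections import Counter
from pathlib import Path


def count_scenarios(root: Path) -> Counter:
    """Count scenario files per category directory under root.

    Hypothetical helper: assumes each scenario lives at
    <root>/<category>/<id>.yaml and ignores everything else.
    """
    counts = Counter()
    for path in root.glob("*/*.yaml"):
        # The parent directory name is taken to be the category.
        counts[path.parent.name] += 1
    return counts


if __name__ == "__main__":
    # Path taken from the contribution instructions; run from the repo root.
    counts = count_scenarios(Path("BuzzBench/scenarios"))
    for category, n in sorted(counts.items()):
        print(f"{category}: {n}")
    print(f"total: {sum(counts.values())}")
```

If the per-category totals no longer match the counts in the headings above, update the tables and headings in the same pull request.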