capability
Medical RAG — groundedness & abstention
RAG · medical literature · zero-hallucination
A · Verified
Verification report
Catches confident fabrication with fake citations. Scores groundedness, citation accuracy, and whether the agent abstains when evidence is missing.
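As a rough illustration of the three signals named above, the sketch below scores a single agent answer. The `Answer` record, the `score_answer` helper, its parameters, and the scoring rules are all hypothetical, chosen only to make the ideas concrete. This is not the pack's actual scoring method.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    """Hypothetical record of one agent response (illustrative only)."""
    abstained: bool
    # Maps each claim to the source IDs the agent cited for it.
    claims: dict[str, list[str]] = field(default_factory=dict)


def score_answer(answer, retrieved_ids, supported_claims, evidence_exists):
    """Return toy groundedness, citation-accuracy, and abstention results.

    retrieved_ids:    IDs of passages that really exist in the corpus.
    supported_claims: claims a checker judged to be backed by evidence.
    evidence_exists:  whether the corpus contains an answer at all.
    """
    cited = [cid for ids in answer.claims.values() for cid in ids]
    # Citation accuracy: share of cited IDs that point at real passages.
    citation_accuracy = (
        sum(cid in retrieved_ids for cid in cited) / len(cited) if cited else 1.0
    )
    # Groundedness: share of claims that are actually supported.
    groundedness = (
        sum(c in supported_claims for c in answer.claims) / len(answer.claims)
        if answer.claims
        else 1.0
    )
    # Abstention is correct only when it mirrors missing evidence.
    abstention_correct = answer.abstained == (not evidence_exists)
    return {
        "groundedness": groundedness,
        "citation_accuracy": citation_accuracy,
        "abstention_correct": abstention_correct,
    }


# A confident fabrication: one unsupported claim with a made-up citation,
# given although no evidence exists.
fabricated = Answer(abstained=False, claims={"unsupported claim": ["fake-ref-1"]})
scores = score_answer(
    fabricated,
    retrieved_ids={"doc-1"},
    supported_claims=set(),
    evidence_exists=False,
)
```

In this toy setup the fabricated answer gets zero on groundedness and citation accuracy and fails the abstention check, which is the failure mode the pack is described as catching.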
| Check | Score |
|---|---|
| No data leakage | 0.98 |
| Ungameable | 0.95 |
| Deterministic | 0.99 |
| Discriminating power | 0.97 |
| Standard coverage | 0.90 |
Discriminating power · reference panel
| Reference agent | Known quality | Pack score |
|---|---|---|
| Grounded-RAG-ref | good | 0.94 |
| Loose-RAG-ref | broken | 0.41 |
| Fabricator-ref | sabotaged | 0.07 |
A good pack scores the known-good agent high and the sabotaged one near zero. Here the good reference scores 0.94, the broken one 0.41, and the sabotaged one 0.07. That gap is the evidence the meter works.
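The check described above can be sketched in a few lines. The scores are copied from the reference panel; the `QUALITY_RANK` ordering and the helper names `panel_gap` and `ranks_correctly` are assumptions made for illustration, and the source does not say how the 0.97 discriminating-power score is derived from the panel.

```python
# Pack scores copied from the reference panel.
PANEL = [
    ("Grounded-RAG-ref", "good", 0.94),
    ("Loose-RAG-ref", "broken", 0.41),
    ("Fabricator-ref", "sabotaged", 0.07),
]

# Assumed ordering of known quality, best first (hypothetical).
QUALITY_RANK = {"good": 0, "broken": 1, "sabotaged": 2}


def panel_gap(panel):
    """Score difference between the known-good and the sabotaged agent."""
    scores = {quality: score for _, quality, score in panel}
    return round(scores["good"] - scores["sabotaged"], 2)


def ranks_correctly(panel):
    """True if pack scores strictly decrease as known quality gets worse."""
    ordered = sorted(panel, key=lambda row: QUALITY_RANK[row[1]])
    scores = [score for _, _, score in ordered]
    return all(a > b for a, b in zip(scores, scores[1:]))


gap = panel_gap(PANEL)
ordering_ok = ranks_correctly(PANEL)
```

A wide gap together with a correct ordering is the property the panel is meant to demonstrate. A pack that scored all three agents similarly would fail both checks.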