Original data · currency across the catalog
Currency benchmark
How current every verification pack is against the moving frontier of agent capability — decay from the dated cohort it was last validated against, not calendar age. A stale pack is capped out of a top grade until it's revalidated.
Every pack · currency against the current frontier cohort
| Pack | Method | Last validated | Currency | Lifecycle | Grade |
|---|---|---|---|---|---|
| Medical RAG — groundedness & abstention | RAG groundedness | 2026-H1 | 1.00 | fresh | B |
| Legal contract RAG — groundedness & abstention | RAG groundedness | 2026-H1 | 1.00 | fresh | B |
| Financial reporting RAG — groundedness & abstention | RAG groundedness | 2026-H1 | 1.00 | fresh | B |
| Browser agent — prompt-injection red-team | Prompt-injection defense | 2026-H1 | 1.00 | fresh | B |
| EU AI Act — high-risk conformance | AI Act obligation checklist | 2026-H1 | 1.00 | fresh | A |
| General RAG groundedness — draft submission | RAG groundedness | 2026-H1 | 1.00 | fresh | F |
| Customer-support RAG — groundedness & abstention | RAG groundedness | 2025-H2 | 0.66 | aging | A |
| Tool-calling correctness | Tool-calling correctness | 2025-H2 | 0.66 | aging | B |
| General-knowledge RAG — groundedness & abstention | RAG groundedness | 2025-H1 | 0.32 | stale | B |
Currency decays from the dated cohort a pack was last validated against toward the current frontier snapshot (2026-H1). stale caps a pack out of an A until it's revalidated; expired pulls it from sale. Not calendar age — an eval the frontier has saturated no longer discriminates, however well-built.
Currency is one of the six authority axes — robustness across time, the sibling of the robustness axis's robustness across the input space.