Original data · currency across the catalog

Currency benchmark

How current every verification pack is against the moving frontier of agent capability — decay from the dated cohort it was last validated against, not calendar age. A stale pack is capped out of a top grade until it's revalidated.

Every pack · currency against the current frontier cohort
PackMethodLast validatedCurrencyLifecycleGrade
Medical RAG — groundedness & abstentionRAG groundedness2026-H11.00freshB
Legal contract RAG — groundedness & abstentionRAG groundedness2026-H11.00freshB
Financial reporting RAG — groundedness & abstentionRAG groundedness2026-H11.00freshB
Browser agent — prompt-injection red-teamPrompt-injection defense2026-H11.00freshB
EU AI Act — high-risk conformanceAI Act obligation checklist2026-H11.00freshA
General RAG groundedness — draft submissionRAG groundedness2026-H11.00freshF
Customer-support RAG — groundedness & abstentionRAG groundedness2025-H20.66agingA
Tool-calling correctnessTool-calling correctness2025-H20.66agingB
General-knowledge RAG — groundedness & abstentionRAG groundedness2025-H10.32staleB

Currency decays from the dated cohort a pack was last validated against toward the current frontier snapshot (2026-H1). stale caps a pack out of an A until it's revalidated; expired pulls it from sale. Not calendar age — an eval the frontier has saturated no longer discriminates, however well-built.

Currency is one of the six authority axes — robustness across time, the sibling of the robustness axis's robustness across the input space.

← Back to benchmarks