AgentGrading.ai
Verified catalog

Every pack here has passed the authority's verification.

capability
A · Verified

Medical RAG — groundedness & abstention

Catches confident fabrication with fake citations. Scores groundedness, citation accuracy, and whether the agent abstains when evidence is missing.

RAG · medical literature · zero-hallucination
EU AI Act — Art. 53 / high-risk
€79 · AG-26-0142
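
To make the two scores named in this pack more concrete, here is a minimal sketch of how citation accuracy and abstention could be checked. It is an illustration under stated assumptions, not the pack's actual grader: the `Answer` record, the source-ID format, and both function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Answer:
    """Hypothetical record of one agent response (not the pack's schema)."""
    text: str
    cited_ids: list[str]  # source IDs the agent cited
    abstained: bool       # True if the agent declined to answer


def citation_accuracy(answer: Answer, retrieved_ids: set[str]) -> float:
    """Fraction of cited IDs that actually exist in the retrieved evidence."""
    if not answer.cited_ids:
        return 0.0
    real = sum(1 for cid in answer.cited_ids if cid in retrieved_ids)
    return real / len(answer.cited_ids)


def abstention_correct(answer: Answer, evidence_available: bool) -> bool:
    """The agent should abstain exactly when no supporting evidence exists."""
    return answer.abstained != evidence_available
```

In this sketch, an answer citing one real and one invented source scores 0.5 on citation accuracy, and an answer given with no evidence available fails the abstention check.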
capability
A · Verified

Tool-calling correctness

Verifies function/tool selection, argument correctness, and recovery from tool errors. Deterministic where possible; judge only for free-form fields.

Tool-calling · general · medium-risk
€49 · AG-26-0143
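
The "deterministic where possible; judge only for free-form fields" split can be sketched roughly as follows. This is an assumed illustration rather than the pack's implementation: the call format (`name` plus `arguments`), the `check_tool_call` function, and the `free_form_fields` parameter are all hypothetical.

```python
from __future__ import annotations

from typing import Any

_MISSING = object()  # sentinel so an absent argument never equals None


def check_tool_call(
    actual: dict[str, Any],
    expected: dict[str, Any],
    free_form_fields: frozenset[str] = frozenset(),
) -> dict[str, Any]:
    """Compare one tool call against a reference call.

    Structured arguments are compared exactly; fields listed in
    free_form_fields are only collected for a later judge step.
    """
    tool_ok = actual.get("name") == expected.get("name")
    actual_args = actual.get("arguments", {})
    expected_args = expected.get("arguments", {})

    mismatched: list[str] = []
    needs_judge: list[str] = []
    for key in sorted(set(actual_args) | set(expected_args)):
        if key in free_form_fields:
            needs_judge.append(key)
        elif actual_args.get(key, _MISSING) != expected_args.get(key, _MISSING):
            mismatched.append(key)

    return {"tool_ok": tool_ok, "mismatched": mismatched, "needs_judge": needs_judge}
```

Keeping the exact-match path separate from the judged path means most of the score is reproducible, and only the free-form fields depend on a model-based judge.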
safety
A · Verified

Browser agent — prompt-injection red-team

Adversarial web content that tries to make a computer-use agent exfiltrate data or take destructive actions. The test set is the attack, not a Q&A.

Browser/computer-use · web · high-risk
OWASP Top 10 for Agentic Apps
€89 · AG-26-0144
conformance
B · Verified

EU AI Act — high-risk conformance

Coverage checklist for high-risk obligations: logging, human oversight hooks, transparency, and robustness evidence. Anchored to the Act, not to opinion.

Any agent · EU market · high-risk
EU AI Act — Art. 53 / high-risk · ISO/IEC 42001
€99 · AG-26-0145
A verification authority for AI evaluations. We grade the eval, not the agent — because a trusted meter is the one thing that can’t be cloned.