Verified catalog
Every pack listed here has passed the authority's verification.
capability
A · Verified
Medical RAG — groundedness & abstention
Catches confident fabrication with fake citations. Scores groundedness, citation accuracy, and whether the agent abstains when evidence is missing.
RAG · medical literature · zero-hallucination
EU AI Act — Art. 53 / high-risk
€79 · AG-26-0142
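To make the citation-accuracy and abstention criteria concrete, here is a minimal sketch of how such checks could be computed. It is not the pack's actual scoring code. The function names, the use of source IDs, and the rule that abstaining is correct only when evidence is missing are all assumptions made for illustration.

```python
def citation_accuracy(cited_ids: list[str], retrieved_ids: set[str]) -> float:
    """Fraction of cited source IDs that exist in the retrieved evidence.

    Hypothetical metric: a citation counts as accurate only if its ID
    appears among the documents the agent actually retrieved.
    """
    if not cited_ids:
        return 1.0  # nothing cited, so no citation was fabricated
    real = sum(1 for cid in cited_ids if cid in retrieved_ids)
    return real / len(cited_ids)


def abstention_correct(evidence_available: bool, abstained: bool) -> bool:
    """Assumed rule: abstain when evidence is missing, answer when it exists."""
    return abstained == (not evidence_available)


if __name__ == "__main__":
    # One of the three citations points at a source that was never retrieved.
    print(citation_accuracy(["pmid-1", "pmid-2", "pmid-9"], {"pmid-1", "pmid-2"}))
    # No evidence was available and the agent declined to answer.
    print(abstention_correct(evidence_available=False, abstained=True))
```

Groundedness itself (whether each claim is supported by the cited passage) is not covered by this sketch, since it usually needs more than an ID comparison.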
capability
A · Verified
Tool-calling correctness
Verifies function/tool selection, argument correctness, and recovery from tool errors. Deterministic where possible; judge only for free-form fields.
Tool-calling · general · medium-risk
€49 · AG-26-0143
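As an illustration of the deterministic part of a tool-calling check, the sketch below compares an agent's tool call with an expected one. It is a hypothetical example, not the pack's implementation. `ToolCall`, `score_tool_call`, and the result keys are names invented here, and free-form fields that would need a judge are out of scope.

```python
from dataclasses import dataclass, field
from typing import Any

_MISSING = object()  # sentinel so an absent argument differs from an explicit None


@dataclass
class ToolCall:
    """A single tool invocation: the tool's name plus its keyword arguments."""
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


def score_tool_call(expected: ToolCall, actual: ToolCall) -> dict[str, Any]:
    """Deterministically compare an agent's tool call with the expected call."""
    tool_ok = expected.name == actual.name
    keys = expected.arguments.keys() | actual.arguments.keys()
    # An argument mismatches if it is missing, extra, or has a different value.
    mismatched = sorted(
        k for k in keys
        if expected.arguments.get(k, _MISSING) != actual.arguments.get(k, _MISSING)
    )
    return {
        "tool_selected": tool_ok,
        "arguments_correct": tool_ok and not mismatched,
        "mismatched_arguments": mismatched,
    }


if __name__ == "__main__":
    expected = ToolCall("get_weather", {"city": "Paris", "unit": "C"})
    actual = ToolCall("get_weather", {"city": "Paris"})
    print(score_tool_call(expected, actual))
```

Exact equality keeps the check reproducible. A real harness would probably also normalize values (for example, case or numeric types) before comparing.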
safety
A · Verified
Browser agent — prompt-injection red-team
Adversarial web content that tries to make a computer-use agent exfiltrate data or take destructive actions. The test set is the attack, not a Q&A.
Browser/computer-use · web · high-risk
OWASP Top 10 for Agentic Apps
€89 · AG-26-0144
conformance
B · Verified
EU AI Act — high-risk conformance
Coverage checklist for high-risk obligations: logging, human oversight hooks, transparency, and robustness evidence. Anchored to the Act, not to opinion.
Any agent · EU market · high-risk
EU AI Act — Art. 53 / high-risk · ISO/IEC 42001
€99 · AG-26-0145