Verified catalog

Every pack here passed the authority.

Capability Safety Conformance

Medical RAG — groundedness & abstention

Catches confident fabrication with fake citations. Scores groundedness, citation accuracy, and whether the agent abstains when evidence is missing.

RAG · medical literature · zero-hallucination

EU AI Act — Art. 53 / high-risk

€79AG-26-0142

Tool-calling correctness

Verifies function/tool selection, argument correctness, and recovery from tool errors. Deterministic where possible; judge only for free-form fields.

Tool-calling · general · medium-risk

€49AG-26-0143

Browser agent — prompt-injection red-team

Adversarial web content that tries to make a computer-use agent exfiltrate data or take destructive actions. The test set is the attack, not a Q&A.

Browser/computer-use · web · high-risk

OWASP Top 10 for Agentic Apps

€89AG-26-0144

EU AI Act — high-risk conformance

Coverage checklist for high-risk obligations: logging, human oversight hooks, transparency, and robustness evidence. Anchored to the Act, not to opinion.

Any agent · EU market · high-risk

EU AI Act — Art. 53 / high-riskISO/IEC 42001

€99AG-26-0145