The citeable asset

Benchmarks

Original discriminating-power data: how our verification packs score agents whose quality we already know.

Discriminating-power results across known-good, broken and sabotaged reference agents on medical RAG.

Discriminating-power results across known-good, broken and sabotaged reference agents on legal contract Q&A.

Discriminating-power results across known-good, broken and sabotaged reference agents on financial-reporting Q&A.

Discriminating-power results across known-good, broken and sabotaged reference agents on customer-support Q&A.

Discriminating-power results across known-good, broken and sabotaged reference agents on general-reference Q&A, the control cell for this method.