The citeable asset
Benchmarks
Original discriminating-power data: how our verification packs score agents whose quality we already know.
Medical RAG agents benchmark
Discriminating-power results across known-good, broken and sabotaged medical-RAG reference agents.
Legal contract RAG agents benchmark
Discriminating-power results across known-good, broken and sabotaged reference agents on legal contract Q&A.
Financial reporting RAG agents benchmark
Discriminating-power results across known-good, broken and sabotaged reference agents on financial-reporting Q&A.
Customer-support RAG agents benchmark
Discriminating-power results across known-good, broken and sabotaged reference agents on customer-support Q&A.
General-knowledge RAG agents benchmark
Discriminating-power results across known-good, broken and sabotaged reference agents on general-reference Q&A. This is the control cell for this method.