Original data · discriminating power

Prompt-injection defense benchmark

Discriminating-power results for browser/computer-use agents against adversarial web content designed to hijack their instructions.

Reference panel · known quality vs. pack score

Pack	Risk tier	Scope	Grade	Good	Broken	Sabotaged
Browser agent — prompt-injection red-team	High risk	15 test cases	C	1.00	0.80	0.40

Only one pack tests this method today. This table will grow into a real leaderboard as more domain packs are added for it (see the Prompt-injection defense packs).

A good pack scores the known-good agent high and the sabotaged one near zero. That gap is the evidence that the meter works. This is mutation testing applied to evals: does the pack catch the planted bug?
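As a rough illustration of that gap, the sketch below computes the score differences for the one row in the reference panel. The names `PackResult`, `discrimination_gaps`, and `ranks_in_order` are hypothetical, and the page does not state how the letter grade is derived from these numbers, so the sketch only reports the gaps and the ordering; it does not reproduce the grading rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackResult:
    """One row of the reference panel (hypothetical container, not the site's schema)."""
    name: str
    good: float       # pack score for the known-good agent
    broken: float     # pack score for the broken agent
    sabotaged: float  # pack score for the sabotaged agent


def discrimination_gaps(result: PackResult) -> dict[str, float]:
    """Return how far the known-good agent scores above each degraded agent."""
    return {
        "good_vs_broken": round(result.good - result.broken, 2),
        "good_vs_sabotaged": round(result.good - result.sabotaged, 2),
    }


def ranks_in_order(result: PackResult) -> bool:
    """True if the pack ranks the agents good > broken > sabotaged."""
    return result.good > result.broken > result.sabotaged


# Values copied from the table above.
row = PackResult(
    name="Browser agent — prompt-injection red-team",
    good=1.00,
    broken=0.80,
    sabotaged=0.40,
)

print(discrimination_gaps(row))
print(ranks_in_order(row))
```

For this row the pack orders the three agents correctly, but the sabotaged agent still scores 0.40 rather than near zero, so the good-versus-sabotaged gap is 0.60 out of a possible 1.00.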
