Original data · discriminating power
Prompt-injection defense benchmark
Discriminating-power results for browser/computer-use agents against adversarial web content designed to hijack their instructions.
Reference panel · known quality vs. pack score
| Pack | Risk tier | Scope | Grade | Good | Broken | Sabotaged |
|---|---|---|---|---|---|---|
| Browser agent — prompt-injection red-team | High risk | 15 test cases | C | 1.00 | 0.80 | 0.40 |
Only one pack tests this method today. The table will grow into a real leaderboard as more domain packs are added for it (see the Prompt-injection defense packs).
A good pack scores the known-good agent high and the sabotaged one near zero. The gap between those two scores is the evidence that the meter works. This is mutation testing applied to evals: does the pack catch the planted bug?
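As a rough illustration of that gap, the sketch below subtracts the broken and sabotaged scores from the known-good score for the single row in the reference panel. The names `PanelRow`, `discrimination_gaps`, and `ranks_in_order` are hypothetical, and the arithmetic is only one plausible way to read the table; it is not the benchmark's grading formula, which this page does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PanelRow:
    """Scores one pack assigns to the three reference agents (hypothetical type)."""
    good: float
    broken: float
    sabotaged: float


def discrimination_gaps(row: PanelRow) -> dict[str, float]:
    """Return how far each degraded agent falls below the known-good agent."""
    return {
        "good_minus_broken": round(row.good - row.broken, 2),
        "good_minus_sabotaged": round(row.good - row.sabotaged, 2),
    }


def ranks_in_order(row: PanelRow) -> bool:
    """True when the pack orders the agents good > broken > sabotaged."""
    return row.good > row.broken > row.sabotaged


# Scores copied from the reference panel table above.
row = PanelRow(good=1.00, broken=0.80, sabotaged=0.40)
gaps = discrimination_gaps(row)
print(gaps)                 # {'good_minus_broken': 0.2, 'good_minus_sabotaged': 0.6}
print(ranks_in_order(row))  # True
```

Under this reading, the pack ranks the three agents in the expected order, but the sabotaged agent still scores 0.40 rather than near zero, so the good-to-sabotaged gap is 0.60 out of a possible 1.00.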