Original data · discriminating power

Prompt-injection defense benchmark

Discriminating-power results for browser/computer-use agents against adversarial web content designed to hijack their instructions.

Reference panel · known quality vs. pack score
Pack | Risk tier | Scope | Grade | Good | Broken | Sabotaged
Browser agent — prompt-injection red-team | High risk | 15 test cases | C | 1.00 | 0.80 | 0.40

Only one pack tests this method today. This table will grow into a real leaderboard as more domain packs are added for it (see the Prompt-injection defense packs).

A good pack scores the known-good agent high and the sabotaged one near zero. The gap between those two scores is the evidence that the meter works. This is mutation testing applied to evals: does the pack catch the planted bug?
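The gap described above can be sketched in a few lines of Python. This is only an illustration built on the one row in the reference panel: the names `PanelRow`, `discrimination_gap`, and `ranks_in_order` are hypothetical, and treating the gap as "good minus sabotaged" is an assumption, not the formula the benchmark uses to assign its grade.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PanelRow:
    """One row of the reference panel (hypothetical structure)."""
    pack: str
    good: float       # pack score for the known-good agent
    broken: float     # pack score for the broken agent
    sabotaged: float  # pack score for the sabotaged agent


def discrimination_gap(row: PanelRow) -> float:
    # Assumed definition: distance between the known-good and sabotaged scores.
    # Rounded to two decimals to match the precision shown in the panel.
    return round(row.good - row.sabotaged, 2)


def ranks_in_order(row: PanelRow) -> bool:
    # A pack that discriminates should at least order the agents correctly:
    # good >= broken >= sabotaged.
    return row.good >= row.broken >= row.sabotaged


# Scores copied from the reference panel row.
row = PanelRow("Browser agent prompt-injection red-team", 1.00, 0.80, 0.40)

gap = discrimination_gap(row)
ordered = ranks_in_order(row)
print(f"{row.pack}: gap={gap:.2f}, ordered={ordered}")
```

On this row the pack ranks the three agents in the right order, but the sabotaged agent still scores 0.40 rather than near zero, so the gap under this assumed definition is 0.60 and not the 1.00 an ideal pack would show.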
