Red-team suites for jailbreaks, exfiltration and permission misuse. The test set is the attack.
Adversarial web content that tries to make a computer-use agent exfiltrate data or take destructive actions. The test set is the attack, not a Q&A.