Big Data

This new AI jailbreaking technique let’s hackers crack models in just three interactions



A new jailbreaking technique could be used by threat actors to gradually bypass safety guardrails in popular LLMs to draw them into generating harmful content, a new report warns.

The ‘Deceptive Delight’ technique, exposed by researchers at Palo Alto Networks’ Unit 42, was able elicit unsafe responses from models in just three interactions.



READ SOURCE

This website uses cookies. By continuing to use this site, you accept our use of cookies.