Predictive and generative AI systems remain vulnerable to a variety of attacks, and anyone who says otherwise isn’t being entirely honest, according to Apostol Vassilev, a computer scientist with the US National Institute of Standards and Technology (NIST).
“Despite the significant progress AI and machine learning have made, these technologies are vulnerable to attacks that can cause spectacular failures with dire consequences,” he said.
“There are theoretical problems with securing AI algorithms that simply haven’t been solved yet. If anyone says differently, they are selling snake oil.”
Together with Alina Oprea of Northeastern University and Alie Fordyce and Hyrum Anderson from security shop Robust Intelligence, Vassilev co-authored a paper on the topic that attempts to categorize the security risks posed by AI systems. Overall, the results don’t look good.
The paper [PDF], titled “Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations,” follows from the NIST Trustworthy AI initiative, which reflects broader US government goals to ensure AI safety. It explores various adversarial machine learning techniques based on industry research conducted over the past few decades.
The researchers have focused on four specific security concerns: evasion, poisoning, privacy and abuse attacks, which can apply to predictive (e.g. object recognition) or generative (e.g. ChatGPT) models.
“In an evasion attack, the adversary’s goal is to generate adversarial examples, which are defined as testing samples whose classification can be changed at deployment time to an arbitrary class of the attacker’s choice with only minimal perturbation,” the paper explains, tracing the technique back to research from 1988.
As an example, NIST points to techniques through which stop signs can be marked in ways that make computer vision systems in autonomous vehicles misidentify them.
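Below is a minimal sketch of how such an evasion attack is typically mounted, using the well-known fast gradient sign method against a generic PyTorch image classifier. The model, data, and epsilon value are illustrative placeholders, not anything specified in the NIST paper:

```python
import torch
import torch.nn.functional as F

def fgsm_evasion(model, images, labels, epsilon=0.03):
    """Craft adversarial examples with the fast gradient sign method.

    A small perturbation, bounded by epsilon, is added to each input so
    the classifier's prediction flips while the image still looks
    essentially unchanged to a human observer.
    """
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()

    # Step in the direction that increases the classification loss most.
    adversarial = images + epsilon * images.grad.sign()
    return adversarial.clamp(0, 1).detach()

# Hypothetical usage: `classifier` is any trained vision model, and
# `signs` a batch of stop-sign images with ground-truth `labels`.
# perturbed = fgsm_evasion(classifier, signs, labels)
# print((classifier(perturbed).argmax(dim=1) != labels).float().mean())
```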
Then there are poisoning attacks, in which corrupted data is slipped into a machine learning model’s training set so that the model responds in an undesirable way, generally after receiving a specific input. The paper points to a 2020 Microsoft research paper that found poisoning attacks are the adversarial machine learning threat that most concerns surveyed organizations.
“Poisoning attacks, for example, can be mounted by controlling a few dozen training samples, which would be a very small percentage of the entire training set,” Oprea opined.
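To make Oprea’s point concrete, the sketch below corrupts just a few dozen records in a hypothetical image dataset by stamping a small backdoor trigger on them and flipping their labels; every name and number here is illustrative rather than drawn from the paper:

```python
import numpy as np

def poison_training_set(images, labels, target_label, n_poison=30, seed=0):
    """Backdoor-style data poisoning sketch.

    A small white patch (the "trigger") is stamped onto a few dozen
    training images and their labels are switched to the attacker's
    target class. A model trained on the result tends to behave normally
    until it encounters the trigger at inference time.
    """
    rng = np.random.default_rng(seed)
    x, y = images.copy(), labels.copy()

    idx = rng.choice(len(x), size=n_poison, replace=False)
    x[idx, -4:, -4:] = 1.0        # stamp a 4x4 trigger in the corner
    y[idx] = target_label         # relabel to the attacker's chosen class
    return x, y

# Hypothetical usage on 50,000 28x28 grayscale images: 30 poisoned
# samples amount to well under 0.1 percent of the training set.
# x_p, y_p = poison_training_set(train_images, train_labels, target_label=7)
```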
Privacy attacks are also relatively simple to carry out. These involve reconstructing training data that should otherwise be inaccessible, extracting memorized data, making inferences about protected data, and related intrusions.
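One of the simplest examples is a loss-based membership inference test, sketched below against a generic PyTorch classifier: the attacker guesses that a record was part of the training set when the model handles it suspiciously well. The model, data, and threshold here are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def membership_inference(model, samples, labels, threshold=0.5):
    """Loss-threshold membership inference sketch.

    Models usually fit their training data more tightly than unseen data,
    so an unusually low loss on a record is evidence that the record was
    in the training set. A real attack would calibrate the threshold,
    for example with shadow models, rather than hard-coding it.
    """
    losses = F.cross_entropy(model(samples), labels, reduction="none")
    return losses < threshold   # True means "probably a training member"

# Hypothetical usage: `model` is any trained classifier, while `records`
# and `record_labels` hold the data whose membership is being probed.
# guesses = membership_inference(model, records, record_labels)
```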
Finally, there are abuse attacks, which involve repurposing generative AI systems to serve the attacker’s ends. “Attackers can use the capabilities of GenAI models to promote hate speech or discrimination, generate media that incites violence against specific groups, or scale offensive cybersecurity operations by creating images, text, or malicious code that enable a cyber attack,” the paper explains.
The authors’ goal in listing these various attack categories and variations is to suggest mitigation methods, to help AI practitioners understand the concerns that need to be addressed when models are trained and deployed, and to promote the development of better defenses.
The paper concludes by observing that trustworthy AI currently entails a tradeoff between security on the one hand and fairness and accuracy on the other.
“AI systems optimized for accuracy alone tend to underperform in terms of adversarial robustness and fairness,” it concludes. “Conversely, an AI system optimized for adversarial robustness may exhibit lower accuracy and deteriorated fairness outcomes.” ®
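That tension shows up directly in adversarial training, the standard defense against evasion. The sketch below shows one training step, reusing the hypothetical `fgsm_evasion` helper from the evasion example above; the `robust_weight` term is exactly the dial that trades clean accuracy against robustness:

```python
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, images, labels,
                              epsilon=0.03, robust_weight=0.5):
    """One training step that mixes clean and adversarial losses.

    Setting robust_weight to 0 recovers ordinary accuracy-only training;
    pushing it toward 1 buys adversarial robustness at the cost of clean
    accuracy, which is the tradeoff the paper describes.
    """
    # Craft adversarial versions of this batch on the fly
    # (fgsm_evasion is the illustrative helper defined earlier).
    adv_images = fgsm_evasion(model, images, labels, epsilon)

    clean_loss = F.cross_entropy(model(images), labels)
    adv_loss = F.cross_entropy(model(adv_images), labels)
    loss = (1 - robust_weight) * clean_loss + robust_weight * adv_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```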