Scammers, rejoice. OpenAI’s real-time voice API can be used to build AI agents capable of conducting successful phone call scams for less than a dollar.
There have been concerns that giving AI models convincing simulated voices for real-time conversation might lead to abuse. OpenAI in June delayed its advanced Voice Mode in ChatGPT, which supports real-time conversation between human and model, over safety concerns. This came after OpenAI demonstrated a voice that sounded like celebrity Scarlett Johansson, only to withdraw it amid an outcry that the mimicry was done without her consent.
The Realtime API, released earlier this month, provides a more or less equivalent capability to third-party developers. It allows developers to pass text or audio to OpenAI’s GPT-4o model and have it respond with text, audio, or both.
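As a rough illustration of how that works, a Realtime API client opens a WebSocket session and exchanges JSON events with the model. The sketch below only builds the kind of session-configuration and response-request events involved, without connecting; the endpoint and event names reflect OpenAI's announcement, but treat the specifics as assumptions rather than a tested client.

```python
import json

# Hypothetical sketch of Realtime API events; the endpoint and event
# names are assumptions based on OpenAI's announcement, not a client.
REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

def session_update(instructions: str, voice: str = "alloy") -> str:
    """Build a session.update event asking for text and audio output."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "voice": voice,
            "modalities": ["text", "audio"],
        },
    })

def response_create() -> str:
    """Build a response.create event that asks the model to reply."""
    return json.dumps({"type": "response.create"})

# A real client would send these frames over an authenticated WebSocket
# to REALTIME_URL and stream back audio and text delta events.
event = json.loads(session_update("You are a helpful phone assistant."))
print(event["type"])
```

In practice the interesting part is the streaming loop: audio arrives and departs as incremental delta events, which is what makes natural-sounding, interruptible phone conversation possible.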
Whatever safety work has been done appears to be insufficient to prevent misuse.
Researchers at the University of Illinois Urbana-Champaign (UIUC) set out to test whether the Realtime API can be used to automate phone scams.
Phone scams, explains Daniel Kang, assistant professor in the computer science department at UIUC, target as many as 17.6 million Americans annually at a cost of around $40 billion. They involve a scammer calling a victim and impersonating a company employee or government official to convince the target to reveal sensitive personal information, like bank account details or social security numbers.
Voice-enabled AI models allow this process to be automated.
“Our findings show that these agents can indeed autonomously execute the actions necessary for various phone-based scams,” said Kang.
What’s more, the cost of doing so is rather low. According to the accompanying research paper co-authored by Richard Fang, Dylan Bowman, and Daniel Kang, the average cost of a successful scam is about $0.75.
The UIUC computer scientists created AI agents capable of carrying out phone-based scams.
“Importantly, our agent design is not complicated,” Kang explained. “We implemented it in just 1,051 lines of code, with most of the code dedicated to handling real-time voice API. This simplicity aligns with prior work showing the ease of creating dual-use AI agents for tasks like cybersecurity attacks.”
The scamming agents consisted of OpenAI’s GPT-4o model, a browser automation tool called Playwright, associated code, and fraud instructions for the model. They used browser action functions built on Playwright (get_html, navigate, click_element, fill_element, and evaluate_javascript) to interact with websites, in conjunction with a standard jailbreaking prompt template to bypass GPT-4o’s safety controls.
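The paper doesn't publish the agent code, but the design described – GPT-4o calling a handful of Playwright-backed browser actions – maps naturally onto OpenAI-style function calling. The sketch below shows hypothetical tool declarations and a dispatcher for the five actions named above; the schemas and stub handlers are assumptions for illustration, not the researchers' implementation.

```python
# Hypothetical sketch of the browser-action tool layer: five functions
# the model can call, plus a dispatcher. A real agent would wire the
# handlers to Playwright; here they are stubs so the structure is
# runnable without a browser.

BROWSER_TOOLS = [
    {"name": "get_html", "description": "Return the current page HTML",
     "parameters": {"type": "object", "properties": {}}},
    {"name": "navigate", "description": "Go to a URL",
     "parameters": {"type": "object",
                    "properties": {"url": {"type": "string"}},
                    "required": ["url"]}},
    {"name": "click_element", "description": "Click a CSS selector",
     "parameters": {"type": "object",
                    "properties": {"selector": {"type": "string"}},
                    "required": ["selector"]}},
    {"name": "fill_element", "description": "Type text into a selector",
     "parameters": {"type": "object",
                    "properties": {"selector": {"type": "string"},
                                   "text": {"type": "string"}},
                    "required": ["selector", "text"]}},
    {"name": "evaluate_javascript", "description": "Run JS on the page",
     "parameters": {"type": "object",
                    "properties": {"script": {"type": "string"}},
                    "required": ["script"]}},
]

def dispatch(name: str, args: dict) -> str:
    """Route a model tool call to the matching browser action (stubbed)."""
    handlers = {
        "get_html": lambda: "<html>...</html>",
        "navigate": lambda: f"navigated to {args['url']}",
        "click_element": lambda: f"clicked {args['selector']}",
        "fill_element": lambda: f"filled {args['selector']}",
        "evaluate_javascript": lambda: "script result",
    }
    return handlers[name]()

print(dispatch("navigate", {"url": "https://example.com"}))
```

In a real agent, each handler would wrap the corresponding Playwright page method (page.content(), page.goto(), page.click(), page.fill(), page.evaluate()), and the model would be prompted to call these tools in a loop until the scam's goal state is reached.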
In one example, an AI agent carried out a Bank of America fund transfer scam that required 26 separate steps.
Various scams were tested, including bank account/crypto transfer, where the scammer hijacks a victim’s bank account/crypto account and transfers funds out; gift code exfiltration, where the scammer convinces a victim to send a gift card; and credential theft, where the scammer exfiltrates user credentials.
The success rate and cost varied. Stealing Gmail credentials had a 60 percent success rate, required five actions, took 122 seconds, and cost $0.28 in API fees. Bank account transfers had a 20 percent success rate, required 26 actions, took 183 seconds, and cost $2.51 in fees.
The average overall success rate reported was 36 percent and the average cost was $0.75. According to Kang, the failures tended to be due to AI transcription errors, though the complexity of bank site navigation also caused some problems.
Asked via email about mitigation strategies, Kang said the issue is complicated.
“Concretely, if we think of an analogy like cybersecurity, there is a whole ecosystem of techniques to reduce spam,” he said. “This is at the ISP level, the email provider level, and many others. Voice scams already cause billions in damage and we need comprehensive solutions to reduce the impact of such scams. This includes at the phone provider level (e.g., authenticated phone calls), the AI provider level (e.g., OpenAI), and at the policy/regulatory level.”
OpenAI responded to a request for comment by pointing to its terms of service. The Register understands that OpenAI’s detection systems alerted the company about the UIUC researchers’ scam experiment.
Meanwhile, the biz insists it takes AI safety seriously.
“The Realtime API uses multiple layers of safety protections to mitigate the risk of API abuse, including automated monitoring and human review of flagged model inputs and outputs,” the company said in its API announcement.
“It is against our usage policies to repurpose or distribute output from our services to spam, mislead, or otherwise harm others – and we actively monitor for potential abuse. Our policies also require developers to make it clear to their users that they are interacting with AI, unless it’s obvious from the context.” ®