Artificial Intelligence (AI) chatbots, such as OpenAI’s ChatGPT, have been hailed as a major breakthrough in technology. These chatbots, known for their ability to engage in fluid conversations, have garnered attention and sparked a global race to develop more advanced AI models. However, recent findings have revealed an unexpected development: ChatGPT’s diminishing proficiency in basic math.
The phenomenon of AI ‘drift’ has caught the attention of the academic community. The term refers to unintended changes in a model’s behavior as it is optimized: when developers tune a model to enhance certain capabilities, others may suffer. In the case of ChatGPT, researchers from Stanford University and the University of California, Berkeley, found that its proficiency in basic math regressed as the model was updated to improve in other areas.
An analysis of ChatGPT’s capabilities by Stanford Ph.D. student Lingjiao Chen and Berkeley researcher Matei Zaharia produced surprising results. In a test conducted in March, the advanced GPT-4 version of ChatGPT correctly identified whether a number was prime for 84% of the numbers presented. By June, its accuracy had dropped to just 51%. Performance declined on six of the eight diverse tasks tested.
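To make the accuracy figures concrete, here is a minimal sketch of how a primality score of this kind could be computed. It is not the researchers’ actual test harness: the function names, the yes/no answer format, and the stand-in `always_yes` model are all assumptions made for illustration, and no real chatbot API is called.

```python
from math import isqrt
from typing import Callable, Iterable


def is_prime(n: int) -> bool:
    """Ground truth by trial division; adequate for small test numbers."""
    if n < 2:
        return False
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def primality_accuracy(numbers: Iterable[int],
                       ask_model: Callable[[int], str]) -> float:
    """Fraction of numbers whose yes/no answer matches the ground truth.

    `ask_model` is a hypothetical callable standing in for a chatbot query;
    it is assumed to return text beginning with "yes" or "no".
    """
    numbers = list(numbers)
    if not numbers:
        raise ValueError("need at least one number to score")
    correct = 0
    for n in numbers:
        said_prime = ask_model(n).strip().lower().startswith("yes")
        if said_prime == is_prime(n):
            correct += 1
    return correct / len(numbers)


def always_yes(n: int) -> str:
    """Stand-in 'model' (an assumption) that claims every number is prime."""
    return "Yes"


if __name__ == "__main__":
    sample = [2, 3, 4, 5, 6, 7, 8, 9]
    print(f"accuracy: {primality_accuracy(sample, always_yes):.0%}")
```

Running the same scoring function on the same set of numbers at different dates is what makes a drop such as 84% to 51% comparable. The stand-in model also shows why the mix of test numbers matters: a model that always gives the same answer scores exactly the share of numbers for which that answer happens to be right.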
The decline in ChatGPT’s abilities is not limited to mathematics. Its responsiveness to opinion-centric queries also fell sharply, from a 98% response rate in March to 23% in June. This regression may be linked to the practice of ‘prompt engineering,’ in which users craft specific prompts to elicit desired AI responses: the measures taken to counteract manipulative prompts may have inadvertently affected ChatGPT’s mathematical prowess as well.
Despite these challenges, the consensus among the research community is not to abandon the technology but to approach it with vigilance. Researchers are advocating for more rigorous monitoring and testing of AI models over time to better understand their evolution. The journey of understanding and refining AI systems is still ongoing, and it is clear that there is much more to explore in the field of AI.