ChatGPT's Mathematical Accuracy Falls To Shocking 2%; Response Quality Deteriorates


Radhika Kajarekar


Jul 25, 2023


Reports of a decline in the quality of ChatGPT’s responses have been growing. To investigate, a team of researchers from Stanford and UC Berkeley conducted a study to quantify the extent of this degradation. The study confirmed that the drop in ChatGPT’s quality was real.


The research paper, titled “How Is ChatGPT’s Behavior Changing Over Time?”, was authored by three academics: Matei Zaharia, Lingjiao Chen, and James Zou. Zaharia, a Computer Science Professor at UC Berkeley, shared the findings on Twitter, revealing that GPT-4’s success rate in solving certain math problems fell from 97.6% to 2.4% between March and June.

GPT-4, which was recently released and acclaimed as OpenAI’s most advanced model, had been eagerly anticipated by developers for its potential to power innovative AI products. However, the study’s results showed disappointing performance, especially in handling straightforward queries.

The research team designed tasks to evaluate the quality of responses from the large language models (LLMs) GPT-4 and GPT-3.5. These tasks covered four areas: solving math problems, answering sensitive questions, code generation, and visual reasoning. The study’s chart provided an overview of how both models performed in their March and June 2023 releases.

The data showed that the same LLM service gave different answers over time, with significant differences in performance within this short period. It remains unclear how these LLMs are updated and whether changes meant to improve one aspect of their performance might degrade others. Notably, the latest version of GPT-4 performed worse than the March version in three of the four testing categories, with only a slight improvement in visual reasoning.

While some may not be concerned about the variable quality in the “same versions” of these LLMs, it is crucial to acknowledge that both GPT-4 and GPT-3.5 have been widely adopted by individual users and businesses due to the popularity of ChatGPT. As such, information generated by these models can significantly impact people’s lives.

The researchers intend to continue assessing GPT versions in a longer-term study. They suggest that OpenAI should consider running regular quality checks and publishing the results for its paying customers. If it does not, business or governmental organizations may need to track basic quality metrics for these LLMs themselves to avoid potential commercial and research impacts.
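For readers wondering what such a basic quality check might look like, here is a minimal sketch in Python. It is not the researchers’ code: the function names, the 5-point threshold, and the toy answers are all assumptions made up for illustration. It scores two snapshots of a model on the same fixed set of “is this number prime?” questions and flags a drop in accuracy.

```python
def is_prime(n: int) -> bool:
    """Ground truth for the benchmark: trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def accuracy(answers: dict[int, bool]) -> float:
    """Fraction of a model's yes/no answers that match the ground truth."""
    correct = sum(1 for n, ans in answers.items() if ans == is_prime(n))
    return correct / len(answers)


def regression_report(baseline: dict[int, bool],
                      candidate: dict[int, bool],
                      max_drop: float = 0.05) -> dict:
    """Compare two snapshots; max_drop is a hypothetical tolerance."""
    base_acc = accuracy(baseline)
    cand_acc = accuracy(candidate)
    return {
        "baseline": base_acc,
        "candidate": cand_acc,
        "regressed": (base_acc - cand_acc) > max_drop,
    }


# Toy, made-up answers (NOT data from the study): number -> "is prime?"
march_answers = {7: True, 9: False, 11: True, 15: False}
june_answers = {7: False, 9: False, 11: False, 15: False}

report = regression_report(march_answers, june_answers)
print(report)
```

In practice the answers would come from querying each model version with the same prompts; the key design choice is keeping the question set fixed so that any change in the score reflects the model and not the test.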

The AI and LLM technology domain has had its share of surprising issues, and with data privacy concerns and other public relations challenges, it currently seems like the “wild west” frontier of connected life and commerce.

