ChatGPT Is Getting Slower, Dumber & Less Accurate As Per This Stanford, UC Berkeley Research


Shreya Bose


Jul 24, 2023


Users are reporting a deterioration in OpenAI’s ChatGPT’s performance, with complaints that it is getting extremely slow and far less capable of providing accurate answers.

Researchers at Stanford and UC Berkeley have found that its performance has indeed worsened over time, making its answers less accurate.


Task-wise performance

The researchers compared the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on four primary tasks: solving math problems, answering sensitive questions, generating code and visual reasoning.

In March, GPT-4 could identify prime numbers with 97.6 per cent accuracy.

By June, it answered only 12 of the same 500 questions correctly, plunging to 2.4 per cent accuracy.
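Accuracy figures like these come from scoring a model's yes/no answers against ground truth. A minimal sketch of such scoring, with stubbed responses standing in for a real model (the numbers below are illustrative, not the study's actual question set):

```python
# Score yes/no answers to "Is N prime?" questions against ground truth.

def is_prime(n: int) -> bool:
    """Ground-truth primality check by trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def score(model_answers: dict) -> float:
    """Fraction of questions where the model's yes/no matches ground truth."""
    correct = sum(
        (ans.strip().lower() == "yes") == is_prime(n)
        for n, ans in model_answers.items()
    )
    return correct / len(model_answers)

# Stub replies: a model that answers "no" to everything, mimicking the
# drift the study observed (17077 is prime, so that answer is wrong)
answers = {17077: "no", 17078: "no", 17081: "no", 17083: "no"}
print(f"accuracy: {score(answers):.1%}")  # → accuracy: 75.0%
```

Because the study's question set consisted of primes, a model that defaults to answering "no" scores close to zero on it, which is consistent with the reported collapse in accuracy.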

It also performed poorly in generating code.

Cost-cutting

One theory holds that, to offset the high cost of operating these systems, companies like OpenAI are not putting the best versions of their chatbots in front of the public.

“Rumors suggest they are using several smaller and specialized GPT-4 models that act similarly to a large model but are less expensive to run,” said AI expert Santiago Valderrama.

There has also been speculation that changes made to speed up the service, and thereby reduce costs, lead to quicker responses but degraded competency.

Side effect of continuous tweaks

GPT-3.5 and GPT-4 are language models that are continuously updated, but OpenAI does not announce many of the changes made to them.

In the paper, the researchers conclude that the behavioral changes are a side effect of unannounced updates to how the models function.

This leads to a fluctuation in the quality of these models.

“A LLM like GPT-4 can be updated over time based on data and feedback from users as well as design changes.

However, it is currently opaque when and how GPT-3.5 and GPT-4 are updated, and it is unclear how each update affects the behavior of these LLMs,” the researchers write.

Safety compromises quality

Another possibility is that changes introduced to prevent ChatGPT from answering dangerous questions impair its usefulness for other tasks.

The researchers found that the newer version of ChatGPT refused to answer certain sensitive questions.

Jim Fan, senior scientist at Nvidia, wrote on Twitter, “Unfortunately, more safety typically comes at the cost of less usefulness.”

Regular quality checks

The researchers said that companies depending on OpenAI's models should consider conducting regular quality assessments to monitor for unexpected changes.

In the same vein, some have called for open-source models like Meta’s LLaMA that enable community debugging.

The study stresses the importance of regularly monitoring performance so that any issues that arise can be identified and addressed promptly.
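A minimal sketch of the kind of regular quality check the researchers recommend: run a fixed "golden set" of questions against the model on a schedule and flag any drop from a recorded baseline. `ask_model` here is a hypothetical stand-in for whatever API client a team actually uses.

```python
# Golden set: questions with known answers, kept fixed across runs
GOLDEN_SET = [
    ("Is 17077 a prime number? Answer yes or no.", "yes"),
    ("Is 17078 a prime number? Answer yes or no.", "no"),
]

def evaluate(ask_model, golden_set) -> float:
    """Fraction of golden-set questions the model answers correctly."""
    correct = sum(
        ask_model(question).strip().lower() == expected
        for question, expected in golden_set
    )
    return correct / len(golden_set)

def check_for_regression(ask_model, baseline: float, tolerance: float = 0.05) -> float:
    """Compare today's accuracy to a stored baseline; warn on a drop."""
    accuracy = evaluate(ask_model, GOLDEN_SET)
    if accuracy < baseline - tolerance:
        print(f"REGRESSION: accuracy {accuracy:.1%} vs baseline {baseline:.1%}")
    else:
        print(f"OK: accuracy {accuracy:.1%}")
    return accuracy

# Stubbed model that answers "no" to everything, standing in for a
# drifted endpoint; a real check would call the live API instead
acc = check_for_regression(lambda q: "no", baseline=0.976)  # → prints REGRESSION
```

Re-running the same golden set weekly (and versioning its results) is what makes a silent model update visible as a sudden accuracy drop rather than a slow accumulation of user complaints.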

