- The July 18, 2023 study shows that GPT-3.5 and GPT-4 gave less accurate answers to the same questions than they had a few months earlier.
- “We evaluated ChatGPT’s behaviour over time and found substantial differences in its responses to the same questions between the June versions of GPT-4 and GPT-3.5 and the March versions.”
- As per the research, GPT-4 could identify prime numbers with 97.6 per cent accuracy in March, compared to 2.4 per cent in June.
ChatGPT came onto the scene in November 2022, amassing over 1 million users within five days of its launch. By the end of January, the AI chatbot had hit over 100 million users, and the future looked bright for OpenAI and AI’s potential. However, ChatGPT’s, and later GPT-4’s, human-like conversational capabilities have raised alarm. Things were moving so fast that the EU Parliament, together with other governments across the globe, called for a pause and review in AI development.
Cancel culture on ChatGPT as accuracy dips
The AI chatbot’s fame is washing away. In a study conducted by Stanford and UC Berkeley researchers, the newest models are losing their grip on accuracy. The July 18, 2023 study shows that GPT-3.5 and GPT-4 gave less accurate answers to the same questions than they had a few months earlier.
The researchers, Lingjiao Chen, Matei Zaharia, and James Zou, tested the models on solving math problems, answering sensitive/dangerous questions, generating code, and visual reasoning from prompts.
“We evaluated ChatGPT’s behavior over time and found substantial differences in its responses to the *same questions* between the June versions of GPT-4 and GPT-3.5 and the March versions. The newer versions got worse on some tasks,” James Zou said on Twitter, providing samples as shown.
“Lots of people are wondering whether GPT-4 and ChatGPT’s performance has been changing over time, so Lingjiao Chen, James Zou and I measured it. We found big changes including some large decreases in some problem-solving tasks,” Matei Zaharia added.
As per the research, GPT-4 could identify prime numbers with 97.6 per cent accuracy in March, compared to just 2.4 per cent in June. Additionally, GPT-4 was more hesitant to answer sensitive questions in June than in March. Both GPT-4 and GPT-3.5 made more formatting mistakes while generating code in June than in March.
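The prime-number task boils down to scoring a model’s yes/no answers against mathematical ground truth. As a rough illustration (not the authors’ actual evaluation harness), accuracy on such a task could be computed as below; the `march_answers` and `june_answers` dictionaries are made-up stand-ins for real model output, though 17077 is an example number used in the study:

```python
def is_prime(n: int) -> bool:
    # Trial division: sufficient ground truth for small integers.
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def accuracy(answers: dict[int, bool]) -> float:
    # Fraction of numbers where the model's yes/no matches ground truth.
    correct = sum(1 for n, ans in answers.items() if ans == is_prime(n))
    return correct / len(answers)

# Hypothetical model answers (number -> "is it prime?" response).
march_answers = {17077: True, 17078: False, 9973: True, 10000: False}
june_answers = {17077: False, 17078: False, 9973: False, 10000: False}

print(f"March accuracy: {accuracy(march_answers):.1%}")  # 100.0%
print(f"June accuracy: {accuracy(june_answers):.1%}")    # 50.0%
```

Running the same scoring over snapshots of a model taken months apart is essentially how such drift gets quantified.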
It remains unclear whether the inaccuracies are a result of the EU crackdown or of how the large language models respond to continual updates.
OpenAI selects a team to manage superintelligent AI developments
OpenAI is open to the fact that artificial intelligence developments need control. In a blog post, the company declared it will appoint a team to manage these developments.
“We need scientific and technical breakthroughs to steer and control AI systems much smarter than us. To solve this problem within four years, we’re starting a new team, co-led by Ilya Sutskever and Jan Leike, and dedicating 20 per cent of the compute we have secured to date to this effort. We’re looking for excellent machine learning researchers and engineers to join us,” the blog post reads.
As much as superintelligence could be an impactful technology and help solve many of the world’s direst problems, a regulatory framework is needed to limit its influence.