Nick Frosst, a cofounder of Cohere who previously worked on AI at Google, says Altman’s feeling that going bigger will not work indefinitely rings true. He, too, believes that progress on transformers, the type of machine learning model at the heart of GPT-4 and its rivals, lies beyond scaling. “There are lots of ways of making transformers way, way better and more useful, and lots of them don’t involve adding parameters to the model,” he says. Frosst says that new AI model designs, or architectures, and further tuning based on human feedback are promising directions that many researchers are already exploring.
Each version of OpenAI’s influential family of language algorithms consists of an artificial neural network, software loosely inspired by the way neurons work together, which is trained to predict the words that should follow a given string of text.
An early model in the family, GPT-2, was announced in 2019. In its largest form, it had 1.5 billion parameters, a measure of the number of adjustable connections between its crude artificial neurons.
At the time, that was extremely large compared to previous systems, thanks in part to OpenAI researchers finding that scaling up made the model more coherent. And the company made GPT-2’s successor, GPT-3, announced in 2020, still bigger, with a whopping 175 billion parameters. That system’s broad abilities to generate poems, emails, and other text helped convince other companies and research institutions to push their own AI models to similar and even greater size.
After ChatGPT debuted in November 2022, meme makers and tech pundits speculated that GPT-4, when it arrived, would be a model of vertigo-inducing size and complexity. Yet when OpenAI finally announced the new artificial intelligence model, the company didn’t disclose how big it is, perhaps because size is no longer all that matters. At the MIT event, Altman was asked if training GPT-4 cost $100 million; he replied, “It’s more than that.”
Although OpenAI is keeping GPT-4’s size and inner workings secret, it is likely that some of its intelligence already comes from looking beyond just scale. One possibility is that it used a method called reinforcement learning from human feedback, which was used to enhance ChatGPT. It involves having humans judge the quality of the model’s answers to steer it towards providing responses more likely to be judged as high quality.
The remarkable capabilities of GPT-4 have stunned some experts and sparked debate over the potential for AI to transform the economy but also spread disinformation and eliminate jobs. Some AI experts, tech entrepreneurs including Elon Musk, and scientists recently wrote an open letter calling for a six-month pause on the development of anything more powerful than GPT-4.
At MIT last week, Altman confirmed that his company is not currently developing GPT-5. “An earlier version of the letter claimed OpenAI is training GPT-5 right now,” he said. “We are not, and won’t for some time.”