What Is GPT-3 and Why Is It Talking to Us?
Silicon Valley is buzzing about GPT-3, the newest piece of research from the Elon Musk-backed research lab OpenAI.
Image: A rendering of NVIDIA Graphics Processing Units (GPUs). Chips like these are used to quickly train large neural networks. Photo credit: Nana Dua.
GPT-3 is the largest and latest in a long line of machine learning models that generate language. Internally, it works much like your phone keyboard's predictive text--predicting the next word given the sequence of words that came before it.
But unlike your phone's predictive text, GPT-3 can spill out fully coherent passages of text that seemingly have never been written before. The technology behind GPT-3 powers many advances in natural language processing--everything from language translation to document summarization.
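To make the "predict the next word" idea concrete, here is a toy sketch in Python of a Markov-chain (bigram) predictor. This is only an illustration of the general idea--GPT-3 replaces these simple word-pair counts with a neural network containing billions of learned parameters--and the tiny corpus and function names below are made up for the example.

```python
import random
from collections import defaultdict, Counter

# Toy corpus; GPT-3 learned from hundreds of billions of words scraped from the web.
corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog . the dog chased the ball ."
).split()

# Count how often each word follows each preceding word (a bigram model).
follow_counts = defaultdict(Counter)
for prev_word, next_word in zip(corpus, corpus[1:]):
    follow_counts[prev_word][next_word] += 1

def predict_next(word):
    """Sample a plausible next word given the previous word."""
    counts = follow_counts[word]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

def generate(start, length=8):
    """Generate text one predicted word at a time."""
    words = [start]
    for _ in range(length):
        words.append(predict_next(words[-1]))
    return " ".join(words)

print(generate("the"))  # e.g. "the dog chased the ball . the cat sat"
```

Because GPT-3 conditions on far more than the single previous word, it can keep a passage coherent over whole paragraphs rather than just a phrase or two.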
Can GPT-3 Really Write Anything?
Well, people have been trying a lot of different things.
Gwern Branwen used GPT-3 to generate creative fiction. He also wrote a technical overview of GPT-3.
Kevin Lacker put GPT-3 through a variant of the Turing Test--asking questions designed to reveal whether the counterpart is a human or a machine.
Andrew Mayne performed various experiments with GPT-3, including tests of its ability to summarize text.
The entrepreneur Arthur Breitman also ran a Turing Test of his own. Eliezer Yudkowsky--a researcher known for his theories about artificial intelligence--saw the results and remarked that GPT-3 might be "hiding" the full extent of its intelligence.
McKay Wrigley built an app that asks GPT-3 to deliver responses in the style of a specific person--like, say, Albert Einstein. Here is a screenshot of me asking McKay's app about how to write a hit song.
Jordan Singer used GPT-3 to create a plugin for Figma--an online visual design tool--that can design seemingly anything you describe. Jonathan Lee discusses Jordan's tool in greater depth.
Sharif Shameem developed a GPT-3-powered layout generator for web applications.
OpenAI employee Amanda Askell demonstrated GPT-3's ability to write music.
People have also been trying to get GPT-3 to write software, as Florent Crivello highlighted.
Faraaz Nishtar was among those who found that GPT-3 can write small amounts of SQL--a programming language used to query databases.
The startup OthersideAI demoed a feature that uses GPT-3 to generate responses to emails. Gmail's Smart Reply feature relies on similar language-modeling techniques.
Has This Been Done Before?
Researchers have spent decades trying to use statistics to model human language, but only recently have we been able to generate realistic human text.
Claude Shannon, the father of information theory, applied his theory to model the English language in a 1950 paper.
In 1954, IBM researchers demonstrated a machine that translated more than sixty (simple, carefully selected) Russian sentences into English. This primitive start sparked military interest in automatically translating enemy communications--turning machine translation and language modeling into another front of the Cold War. The research community later built more comprehensive linguistic models on the mathematics of Andrey Markov.
Later, researchers developed a family of statistical models known as neural networks to solve a variety of different problems. Starting in 2001, researchers began modeling language with neural networks. Innovations in neural network design and ever-faster computers led to these initially-small networks growing to gigantic proportions. Leo Gao wrote about GPT-3 and scaling trends in neural networks.
How Big is GPT-3?
The largest version of GPT-3's predecessor--GPT-2--had 1.5 billion parameters. GPT-2's ability to generate realistic English was so good that OpenAI hesitated for several months before publicly releasing the full model. GPT-3 has 175 billion parameters. Currently the model is only available through OpenAI's servers, and only to users who have been invited.
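For a rough sense of why GPT-3 lives on OpenAI's servers rather than on your laptop, the back-of-the-envelope estimate below (a sketch; it assumes 16-bit or 32-bit storage per parameter and counts only the raw weights, not the memory needed to actually run the model) compares the two models' sizes.

```python
# Back-of-the-envelope memory estimate for storing the model weights alone.
# Assumption: each parameter is stored as a 16-bit (2-byte) or 32-bit (4-byte) float.

GPT2_PARAMS = 1.5e9   # largest GPT-2
GPT3_PARAMS = 175e9   # GPT-3

def weight_size_gb(num_params, bytes_per_param):
    return num_params * bytes_per_param / 1e9

for name, params in [("GPT-2", GPT2_PARAMS), ("GPT-3", GPT3_PARAMS)]:
    print(f"{name}: {weight_size_gb(params, 2):,.0f} GB at 16-bit, "
          f"{weight_size_gb(params, 4):,.0f} GB at 32-bit")

# GPT-2:   3 GB at 16-bit,   6 GB at 32-bit
# GPT-3: 350 GB at 16-bit, 700 GB at 32-bit
```

Even under the smaller assumption, GPT-3's weights alone run to hundreds of gigabytes--far more than a single GPU can hold--which is part of why access is offered as a remote API.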
These networks' larger size allows them to memorize increasingly complex patterns in the English datasets that the models learn from. The models don't necessarily have intelligence or insight, but simply have more bandwidth for imitation.
Is GPT-3 Overhyped?
It depends. Building ever-larger neural networks sometimes feels like throwing money at a math problem, but it also takes significant technical innovation to make the creation of such networks even possible.
Sam Altman--currently the CEO of OpenAI--recently remarked that GPT-3 may be overhyped.
Jerome Pesenti, currently the Vice President of AI at Facebook, pointed out GPT-3's tendency to generate harmful and biased text when given certain handpicked inputs.
Further Reading
OpenAI API (OpenAI's invite-only interface for GPT-3)
Language Models are Few-Shot Learners (the original GPT-3 paper)
OpenAI’s new language generator GPT-3 is shockingly good—and completely mindless
How GPT3 works. A visual thread.
Too big to deploy: How GPT-2 is breaking servers
A history of machine translation from the Cold War to deep learning
A Review of the Neural History of Natural Language Processing