OpenAI’s GPT-2 text-generating algorithm was once considered too dangerous to release. Then it got released — and the world kept on turning.
In retrospect, the comparatively small GPT-2 language model (a puny 1.5 billion parameters) looks paltry next to its sequel, GPT-3, which boasts a massive 175 billion parameters, was trained on 45 TB of text data, and cost a reported $12 million (at least) to build.
“Our perspective, and our take back then, was to have a staged release, which was like, initially, you release the smaller model and you wait and see what happens,” Sandhini Agarwal, an A.I. policy researcher for OpenAI told Digital Trends. “If things look good, then you release the next size of model. The reason we took that approach is because this is, honestly, [not just uncharted waters for us, but it’s also] uncharted waters for the entire world.”
Jump forward to the present day, nine months after GPT-3’s release last summer, and it’s powering upward of 300 applications while generating a massive 4.5 billion words per day. Seeded with only the first few sentences of a document, it’s able to generate seemingly endless more text in the same style — even including fictitious quotes.
Is it going to destroy the world? Based on past history, almost certainly not. But it is making some game-changing applications of A.I. possible, all while posing some very profound questions along the way.
Recently, Francis Jervis, the founder of a startup called Augrented, used GPT-3 to help people struggling with their rent to write letters negotiating rent discounts. “I’d describe the use case here as ‘style transfer,'” Jervis told Digital Trends. “[It takes in] bullet points, which don’t even have to be in perfect English, and [outputs] two to three sentences in formal language.”
Powered by this ultra-powerful language model, Jervis’s tool allows renters to describe their situation and the reason they need a discounted settlement. “Just enter a couple of words about why you lost income, and in a few seconds you’ll get a suggested persuasive, formal paragraph to add to your letter,” the company claims.
This is just the tip of the iceberg. When Aditya Joshi, a machine learning scientist and former Amazon Web Services engineer, first came across GPT-3, he was so blown away by what he saw that he set up a website, www.gpt3examples.com, to keep track of the best ones.
“Shortly after OpenAI announced their API, developers started tweeting impressive demos of applications built using GPT-3,” he told Digital Trends. “They were astonishingly good. I built [my website] to make it easy for the community to find these examples and discover creative ways of using GPT-3 to solve problems in their own domain.”
Fully interactive synthetic personas with GPT-3 and https://t.co/ZPdnEqR0Hn 🎇
They know who they are, where they worked, who their boss is, and so much more. This is not your father's bot… pic.twitter.com/kt4AtgYHZL
— Tyler Lastovich (@tylerlastovich) August 18, 2020
“I did not expect a single language model to perform so well on such a diverse range of tasks, from language translation and generation to text summarization and entity extraction,” Joshi said. “In one of my own experiments, I used GPT-3 to predict chemical combustion reactions, and it did so surprisingly well.”
The transformative uses of GPT-3 don’t end there, either. Computer scientist Tyler Lastovich has used GPT-3 to create fake people, including backstory, who can then be interacted with via text. Meanwhile, Andrew Mayne has shown that GPT-3 can be used to turn movie titles into emojis. Nick Walton, chief technology officer of Latitude, the studio behind GPT-generated text adventure game AI Dungeon recently did the same to see if it could turn longer strings of text description into emoji. And Copy.ai, a startup that builds copywriting tools with GPT-3, is tapping the model for all it’s worth, with a monthly recurring revenue of $67,000 as of March — and a recent $2.9 million funding round.
Machine learning has been a game-changer in all sorts of ways over the past couple of decades.
“Definitely, there was surprise and a lot of awe in terms of the creativity people have used GPT-3 for,” Sandhini Agarwal, an A.I. policy researcher for OpenAI told Digital Trends. “So many use cases are just so creative, and in domains that even I had not foreseen, it would have much knowledge about. That’s interesting to see. But that being said, GPT-3 — and this whole direction of research that OpenAI pursued — was very much with the hope that this would give us an A.I. model that was more general-purpose. The whole point of a general-purpose A.I. model is [that it would be] one model that could like do all these different A.I. tasks.”
Many of the projects highlight one of the big value-adds of GPT-3: The lack of training it requires. Machine learning has been transformative in all sorts of ways over the past couple of decades. But machine learning requires a large number of training examples to be able to output correct answers. GPT-3, on the other hand, has a “few shot ability” that allows it to be taught to do something with only a small handful of examples.
GPT-3 is highly impressive. But it poses challenges too. Some of these relate to cost: For high-volume services like chatbots, which could benefit from GPT-3’s magic, the tool might be too pricey to use. (A single message could cost 6 cents which, while not exactly bank-breaking, certainly adds up.)
Others relate to its widespread availability, meaning that it’s likely going to be tough to build a startup exclusively around since fierce competition will likely drive down margins.
Another is the lack of memory; its context window runs a little under 2,000 words at a time before, like Guy Pierce’s character in the movie Memento, its memory is reset. “This significantly limits the length of text it can generate, roughly to a short paragraph per request,” Lastovich said. “Practically speaking, this means that it is unable to generate long documents while still remembering what happened at the beginning.”
Perhaps the most notable challenge, however, also relates to its biggest strength: Its confabulation abilities. Confabulation is a term frequently used by doctors to describe the way in which some people with memory issues are able to fabricate information that appears initially convincing, but which doesn’t necessarily stand up to scrutiny upon closer inspection. GPT-3’s ability to confabulate is, depending upon the context, a strength and a weakness. For creative projects, it can be great, allowing it to riff on themes without concern for anything as mundane as truth. For other projects, it can be trickier.
Francis Jervis of Augrented refers to GPT-3’s ability to “generate plausible bullshit.” Nick Walton of AI Dungeon said: “GPT-3 is very good at writing creative text that seems like it could have been written by a human … One of its weaknesses, though, is that it can often write like it’s very confident — even if it has no idea what the answer to a question is.”
In this regard, GPT-3 returns us to the familiar ground of John Searle’s Chinese Room. In 1980, Searle, a philosopher, published one of the best-known A.I. thought experiments, focused on the topic of “understanding.” The Chinese Room asks us to imagine a person locked in a room with a mass of writing in a language that they do not understand. All they recognize are abstract symbols. The room also contains a set of rules that show how one set of symbols corresponds with another. Given a series of questions to answer, the room’s occupant must match question symbols with answer symbols. After repeating this task many times, they become adept at performing it — even though they have no clue what either set of symbols means, merely that one corresponds to the other.
GPT-3 is a world away from the kinds of linguistic A.I. that existed at the time Searle was writing. However, the question of understanding is as thorny as ever.
“This is a very controversial domain of questioning, as I’m sure you’re aware, because there’s so many differing opinions on whether, in general, language models … would ever have [true] understanding,” said OpenAI’s Sandhini Agarwal. “If you ask me about GPT-3 right now, it performs very well sometimes, but not very well at other times. There is this randomness in a way about how meaningful the output might seem to you. Sometimes you might be wowed by the output, and sometimes the output will just be nonsensical. Given that, right now in my opinion … GPT-3 doesn’t appear to have understanding.”
An added twist on the Chinese Room experiment today is that GPT-3 is not programmed at every step by a small team of researchers. It’s a massive model that’s been trained on an enormous dataset consisting of, well, the internet. This means that it can pick up inferences and biases that might be encoded into text found online. You’ve heard the expression that you’re an average of the five people you surround yourself with? Well, GPT-3 was trained on almost unfathomable amounts of text data from multiple sources, including books, Wikipedia, and other articles. From this, it learns to predict the next word in any sequence by scouring its training data to see word combinations used before. This can have unintended consequences.
This challenge with large language models was first highlighted in a groundbreaking paper on the subject of so-called stochastic parrots. A stochastic parrot — a term coined by the authors, who included among their ranks the former co-lead of Google’s ethical A.I. team, Timnit Gebru — refers to a large language model that “haphazardly [stitches] together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning.”
“Having been trained on a big portion of the internet, it’s important to acknowledge that it will carry some of its biases,” Albert Gozzi, another GPT-3 user, told Digital Trends. “I know the OpenAI team is working hard on mitigating this in a few different ways, but I’d expect this to be an issue for [some] time to come.”
OpenAI’s countermeasures to defend against bias include a toxicity filter, which filters out certain language or topics. OpenAI is also working on ways to integrate human feedback in order to be able to specify which areas not to stray into. In addition, the team controls access to the tool so that certain negative uses of the tool will not be granted access.
“Bias and the potential for explicit returns absolutely exist and require effort from developers to avoid.”
“One of the reasons perhaps you haven’t seen like too many of these malicious users is because we do have an intensive review process internally,” Agarwal said. “The way we work is that every time you want to use GPT-3 in a product that would actually be deployed, you have to go through a process where a team — like, a team of humans — actually reviews how you want to use it. … Then, based on making sure that it is not something malicious, you will be granted access.”
Some of this is challenging, however — not least because bias isn’t always a clear-cut case of using certain words. Jervis notes that, at times, his GPT-3 rent messages can “tend towards stereotypical gender [or] class assumptions.” Left unattended, it might assume the subject’s gender identity on a rent letter, based on their family role or job. This may not be the most grievous example of A.I. bias, but it highlights what happens when large amounts of data are ingested and then probabilistically reassembled in a language model.
“Bias and the potential for explicit returns absolutely exist and require effort from developers to avoid,” Tyler Lastovich said. “OpenAI does flag potentially toxic results, but ultimately it does add a liability customers have to think hard about before putting the model into production. A specifically difficult edge case to develop around is the model’s propensity to lie — as it has no concept of true or false information.”
Nine months after its debut, GPT-3 is certainly living up to its billing as a game changer. What once was purely potential has shown itself to be potential realized. The number of intriguing use cases for GPT-3 highlights how a text-generating A.I. is a whole lot more versatile than that description might suggest.
Not that it’s the new kid on the block these days. Earlier this year, GPT-3 was overtaken as the biggest language model. Google Brain debuted a new language model with some 1.6 trillion parameters, making it nine times the size of OpenAI’s offering. Nor is this likely to be the end of the road for language models. These are extremely powerful tools — with the potential to be transformative to society, potentially for better and for worse.
Challenges certainly exist with these technologies, and they’re ones that companies like OpenAI, independent researchers, and others, must continue to address. But taken as a whole, it’s hard to argue that language models are not turning to be one of the most interesting and important frontiers of artificial intelligence research.
Who would’ve thought text generators could be so profoundly important? Welcome to the future of artificial intelligence.
- Finishing touch: How scientists are giving robots humanlike tactile senses
- Analog A.I.? It sounds crazy, but it might be the future
- The funny formula: Why machine-generated humor is the holy grail of A.I.
- Here’s what a trend-analyzing A.I. thinks will be the next big thing in tech
- Read the eerily beautiful ‘synthetic scripture’ of an A.I. that thinks it’s God