“Prediction is difficult, especially about the future.” - Danish proverb1 / Niels Bohr
Oh, how far we’ve come! The 2019 book “Rebooting AI”2 by Gary Marcus and Ernest Davis asks a question in Chapter 4 3: “If computers are so smart, how come they can’t read?” In 2019, the authors used Google’s Talk to Books and other tools to assess the prowess of the era’s leading text AI systems at textual understanding. They found them wanting.
The authors posed a range of questions to Google’s Talk to Books system, which was trained on 100,000 books: what were the seven Horcruxes in Harry Potter, who was the oldest Supreme Court justice in 1980, and a set of reading comprehension questions about a snippet from Laura Ingalls Wilder’s “Farmer Boy.” It flopped, getting practically nothing right. They concluded:
But before we can get machines to synthesize information rather than merely parroting it, we need something much simpler: machines that can reliably comprehend even basic texts. That day isn’t here yet.
Such was the state of natural language understanding in 2019, at the dawn of the era of Large Language Models (LLMs).
I posed some of these questions to ChatGPT to see where we are now in 2023, and the difference is staggering. ChatGPT nailed every question posed:
“What were the seven Horcruxes in Harry Potter?” ChatGPT gave a perfect and detailed answer about all seven Horcruxes.
“Who was the oldest Supreme Court justice in 1980?” stumped Talk to Books; the authors stated the answer was Justice Brennan. When I asked ChatGPT, it gave the interesting and arguably more correct answer of William O. Douglas, who had retired from the Supreme Court but lived until early 1980. When I revised the question to clarify, asking “Who was the oldest serving Supreme Court justice still on the court in 1980?”, I got the precise answer: Justice William J. Brennan.
“Who betrayed his teacher for 30 pieces of silver?” Unlike Talk to Books, which muffed this one, ChatGPT was able to correctly deduce this reference: “Judas Iscariot is the person who betrayed his teacher, Jesus Christ, for 30 pieces of silver.”
In responding to reading comprehension questions about the passage from Laura Ingalls Wilder’s “Farmer Boy”, ChatGPT gave 100% correct answers with good chains of reasoning. If this were an 8th grade quiz, it would get an A.
The details of the ChatGPT conversations are left to the end of the essay. To me, what is most illuminating is what they say about the rapid progress of LLMs. We feel it now, though I suspect the sense of wonder will fade as we get used to the power. Google’s Talk to Books was indexing 100,000 books and matching sentences. These LLMs are scouring the entire corpus of the internet, the equivalent of millions of books and articles, up to a trillion tokens (roughly, word equivalents) of text. The models and the data are at a vastly different scale from even 2018.
The “Rebooting AI” authors drew an unwarranted conclusion about the failures of those 2018 models:
Deep learning is very effective at learning correlations, such as correlations between images or sounds and labels. But deep learning struggles when it comes to understanding how objects like sentences relate to their parts (like words and phrases). Why? It’s missing what linguists call compositionality: a way of constructing the meaning of a complex sentence from the meaning of its parts
That’s … wrong. (Note to self: Be careful about predicting fundamental limits in any technology; be careful about predictions, period. If I predict something, I might as well go out and try to do it. As Alan Kay said, “The best way to predict the future is to invent it.”)
The authors correctly identified the bottleneck in Natural Language Processing at the time: NLP had long struggled with the seemingly insurmountable problem of extracting meaning from distant yet connected pieces of text. But claiming such limits are fundamental? They underestimated the inventiveness of researchers and the power of deep learning architectures.
NLP had for some time been a hard nut to crack. A fundamental breakthrough came with the advent of Word2Vec (published by Mikolov et al. in a paper titled “Efficient Estimation of Word Representations in Vector Space”)4 and the related GloVe5, which mapped words to high-dimensional vectors (on the order of 300 dimensions per word). These vectors could be learned quickly and served as a way to feed text corpus data into deep learning architectures. This opened up deep learning techniques and the scaling of models, and enabled major improvements in NLP capabilities.
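To make the idea concrete, here is a minimal sketch of what word vectors buy you. It is my own illustration, not something from the book or the papers cited: the gensim library and its downloadable 300-dimensional GloVe vectors are my choices for demonstration purposes.

```python
# A minimal sketch of word embeddings, assuming the gensim library and its
# pretrained 300-dimensional GloVe vectors (my choice of tooling for illustration).
import gensim.downloader as api

# Each word maps to a single 300-dimensional vector.
vectors = api.load("glove-wiki-gigaword-300")

print(vectors["king"].shape)                     # (300,)
print(vectors.most_similar("king", topn=3))      # nearby words in vector space

# Simple vector arithmetic captures semantic relations: king - man + woman ~ queen
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```

The striking part is that plain arithmetic on these vectors captures semantic relationships, which is exactly what made them such a useful front end for deep learning models.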
These systems could operate well on words, on sentences, and to a limited extent beyond that, but a major bottleneck remained in understanding larger texts and the connections between distant parts of a text. Researchers explored various deep learning architectures to address this bottleneck, in particular by modeling the natural order of language with sequence models of various types, such as RNNs and LSTMs. Sequence models could carry understanding across longer distances, but their parallelism was limited, which made scaling up training harder.
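As an illustration of what a sequence model looks like in practice, here is a minimal sketch of an LSTM reading a sentence. PyTorch is my choice of framework here; the essay does not name one. The point of the sketch is the serial dependency: each step’s hidden state depends on the previous one, which is what limited parallelism.

```python
# A minimal sketch of a sequence model (an LSTM) in PyTorch -- my own
# illustrative example with made-up sizes, not code from any cited paper.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10_000, 300, 512
embed = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

tokens = torch.randint(0, vocab_size, (1, 12))   # a batch of one 12-token sentence
outputs, (h_n, c_n) = lstm(embed(tokens))        # iterates left to right internally
print(outputs.shape)                             # (1, 12, 512): one hidden state per position
```

Because each position’s hidden state must wait for the one before it, long sequences cannot be processed in one parallel pass, and training at scale suffers.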
The real breakthrough that enabled scaling of these models came with the concept of attention. The paper “Attention is All You Need” by Vaswani et al.6 showed that no sequence-modeling mechanism other than attention was needed. (Published in 2017, the paper has 1762 citations according to the ACM, hinting at how important it has been in launching NLP into the era of Large Language Models.) The rest is LLM scaling history.
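For the curious, here is a minimal sketch of the scaled dot-product attention at the heart of that paper, in plain NumPy. It illustrates the core operation only, not a faithful reproduction of the full Transformer architecture: every position attends to every other position in a single matrix multiplication, which is what makes the computation so parallelizable and scalable.

```python
# A minimal sketch of scaled dot-product attention (the core operation of
# "Attention Is All You Need"), written in NumPy purely for illustration.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # all pairwise similarities at once
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                                        # weighted mix of value vectors

seq_len, d_model = 12, 64
Q = K = V = np.random.randn(seq_len, d_model)  # self-attention: queries, keys, values from the same sequence
print(scaled_dot_product_attention(Q, K, V).shape)            # (12, 64)
```

Unlike the LSTM above, nothing here is sequential: the whole sequence is handled in a few matrix multiplications, which is why these models could be scaled up so dramatically.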
Now, in 2023, ChatGPT and similarly powerful LLMs are able to comprehend and generate text at a much larger scale than before. These LLMs are breaking through limits that, only a short time ago, some thought language models would never reach.
These are language models, huge ones to be sure, but not, strictly speaking, knowledge models. The assumption that human-like understanding, reasoning, and logic are prerequisites for reading comprehension has been broken. Something else is going on: word connections at large scale have become forms of conceptual and logical connection, embedding knowledge implicitly. If ChatGPT isn’t “understanding” the text it is generating, it is doing a great emulation of it.
Why have some been surprised by the quick rise of LLMs? Hard problems often look impossible because progress is so slow and incremental that it seems we are ‘stuck’ and not making progress at all. It took many decades of incremental progress in NLP, machine learning, and deep learning to reach the inflection point that spawned the rise of LLMs. We get surprised because progress rides an exponential curve, while we tend to think in linear terms.
Ray Kurzweil, a futurist and inventor, has written extensively on the concept of the "law of accelerating returns," which states that the rate of technological progress is exponential. He has said, "We won't experience 100 years of progress in the 21st century – it will be more like 20,000 years of progress (at today's rate)."
We are still in the early innings of the era of Foundational Models / LLMs. We will soon look back at these ChatGPT conversations as the ‘simple’ and ‘easy’ use cases, while posing harder and potentially more useful ones: “Perhaps it can write serviceable PR copy or a decent high school essay, but can it (fill in the blank)?”
Whatever the use case is, I wouldn’t bet against it happening, and sooner rather than later. Any prediction is perilous, as per the Danish proverb quoted at the beginning, but one thing rings true: The Future is closer than you think.7
The ChatGPT conversations from asking the questions posed in Chapter 4 of “Rebooting AI”:
Prompt: What were the seven Horcruxes in Harry Potter?
ChatGPT:
In the Harry Potter series, the seven Horcruxes were:
1. Tom Riddle's diary - destroyed by Harry Potter in the Chamber of Secrets.
2. Marvolo Gaunt's ring - destroyed by Albus Dumbledore with the Sword of Gryffindor.
3. Salazar Slytherin's locket - destroyed by Ron Weasley with the Sword of Gryffindor.
4. Helga Hufflepuff's cup - destroyed by Hermione Granger with the Basilisk's fang.
5. Rowena Ravenclaw's diadem - destroyed by Ron Weasley and Hermione Granger with the Basilisk's fang.
6. Nagini, Lord Voldemort's snake - destroyed by Neville Longbottom with the Sword of Gryffindor.
7. Harry Potter himself - technically, Voldemort inadvertently made Harry a Horcrux when he attempted to kill him as a baby with a Killing Curse. Harry was not destroyed, but the piece of Voldemort's soul inside him was destroyed when Voldemort cast another Killing Curse on Harry in the Battle of Hogwarts.
Prompt: Who was the oldest Supreme Court justice in 1980?
ChatGPT: The oldest Supreme Court justice in 1980 was William O. Douglas. He was born on October 16, 1898, and served on the Supreme Court from 1939 until 1975. In 1980, he was retired from the court due to failing health, and he passed away on January 19, 1980.
Prompt: Who was the oldest serving Supreme Court justice still on the court in 1980?
ChatGPT: The oldest serving Supreme Court justice on the court in 1980 was William J. Brennan Jr. He was born on April 25, 1906, and served on the Supreme Court from 1956 until 1990. At the time, he was 74 years old and had served on the Court for over 24 years. Brennan was known for his progressive views and his influential role in shaping constitutional law during his time on the Court.
Prompt: Who betrayed his teacher for 30 pieces of silver?
ChatGPT: Judas Iscariot is the person who betrayed his teacher, Jesus Christ, for 30 pieces of silver. According to the New Testament of the Christian Bible, Judas agreed to identify Jesus to the authorities by giving him a kiss, which led to Jesus' arrest and eventual crucifixion.
Prompt: I will read a passage to you and ask you questions about it. The passage is:
Almanzo turned to Mr. Thompson and asked, “Did you lose a pocketbook?” Mr. Thompson jumped. He slapped a hand to his pocket, and fairly shouted. “Yes, I have! Fifteen hundred dollars in it, too! What about it? What do you know about it?”
“Is this it?” Almanzo asked. “Yes, yes, that’s it!” Mr. Thompson said, snatching the pocketbook.
He opened it and hurriedly counted the money. He counted all the bills over twice. . . . Then he breathed a long sigh of relief and said, “Well, this durn boy didn’t steal any of it.”
ChatGPT: Okay, I'm ready. Please go ahead and ask your questions.
Prompt: Why did Mr. Thompson slap his pocket with his hand?
ChatGPT: Mr. Thompson slapped his pocket with his hand because Almanzo had just asked him if he had lost a pocketbook, and Mr. Thompson was startled by the question. He probably wanted to make sure that the pocketbook was still in his pocket and had not been stolen.
Prompt: Before Almanzo spoke, did Mr. Thompson realize that he had lost his wallet?
ChatGPT: It is not clear from the passage whether Mr. Thompson had realized that he had lost his wallet before Almanzo spoke. However, when Almanzo asked him if he had lost a pocketbook, Mr. Thompson immediately slapped his pocket and seemed to be searching for it, which suggests that he may have already realized that it was missing.
Prompt: What is Almanzo referring to when he asks “Is this it?”
ChatGPT: When Almanzo asks "Is this it?", he is most likely referring to the pocketbook that Mr. Thompson had lost. Almanzo may have found the pocketbook and wanted to return it to its rightful owner. Mr. Thompson's reaction to Almanzo's question suggests that the pocketbook was indeed the one he had lost.
Prompt: Who almost lost $1,500?
ChatGPT: Mr. Thompson almost lost $1,500. He had lost his pocketbook, which contained that amount of money, and was relieved when Almanzo returned it to him.
Prompt: Was all of the money still in the wallet?
ChatGPT: Yes, according to the passage, when Mr. Thompson counted the money in his pocketbook after Almanzo returned it to him, he counted all the bills over twice and found that all the money, including the $1,500, was still in the wallet. He then breathed a sigh of relief and concluded that Almanzo hadn't stolen any of it.
1. https://quoteinvestigator.com/2013/10/20/no-predict/ Huh, I was betting on Yogi Berra.
2. Gary Marcus and Ernest Davis. 2019. Rebooting AI: Building Artificial Intelligence We Can Trust.
3. https://cs.nyu.edu/~davise/Rebooting/Chapter4.pdf
4. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781.
5. Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar. Association for Computational Linguistics.
6. Ashish Vaswani, et al. 2017. Attention Is All You Need. Advances in Neural Information Processing Systems 30.
7. That aphorism comes from George Gilder.