TL;DR
Have we passed the Turing Test? Not yet, but GPT-4 gets close
AGI is when an AI can pass the Heinlein Test
AGI is about embodied autonomy, not answering questions
I am a fan of Yann LeCun and his embodied Turing Test
I’m sticking with my prediction. AGI by 2029.
GPT-4 Takes the Turing Test
"I propose to consider the question, 'Can machines think?'"
— Alan Turing, 1950
Alan Turing proposed a famous test to determine whether something was human-level Artificial Intelligence: if a computer could fool a human into thinking it was human, then it was truly AI. Turing’s formulation of this test was a game, the Imitation Game, in which AIs and humans alike would pretend to be human while being queried through a text interface. The Turing Test has been a North Star for AI researchers pondering what makes ‘human-like’ AI.
Researchers from UC San Diego considered chat-based LLMs the perfect subjects for this kind of test. So they set up a real version of Turing’s Imitation Game, using various GPT-4 models, to ask the question: Does GPT-4 pass the Turing Test?
The findings were published in a recent paper. The researchers showed that the “best-performing GPT-4 prompt passed in 41% of games, outperforming baselines set by ELIZA (27%) and GPT-3.5 (14%), but falling short of chance and the baseline set by human participants (63%).” Since passing would require fooling interrogators well above chance, at a rate comparable to the 63% human baseline, we would have to say that, no, GPT-4 has not passed the Turing Test.
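To see why 41% really does fall short of chance, a quick binomial check helps. The sketch below uses only the 41% pass rate from the paper; the session count is an assumed round number for illustration, not the paper’s actual n.

```python
from math import comb

# Exact binomial tail probability: P(X <= k) for X ~ Binomial(n, p).
def binom_cdf(k: int, n: int, p: float) -> float:
    return sum(comb(n, i) * (p ** i) * ((1 - p) ** (n - i)) for i in range(k + 1))

# Hypothetical session count (the paper reports rates, not raw counts here).
n_games = 300
gpt4_passes = round(0.41 * n_games)  # 123 "passes" out of 300

# One-sided question: could a 41% pass rate plausibly come from
# interrogators guessing at chance (50%)?
p_value = binom_cdf(gpt4_passes, n_games, 0.5)
print(f"P(pass rate <= 41% | chance) = {p_value:.4f}")
```

Even with this modest assumed sample size, the probability of landing at or below 41% by pure coin-flipping is well under 1%, so “falling short of chance” is a statistically meaningful result, not noise.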
You too can play the Imitation Game, since the researchers’ game is online, and try to guess AI versus human. I just played it as the interrogator. The responses I got were all chat-like short answers with no punctuation, as if from a teenager. It read much like a teenager and not like a typical AI bot, but the replies leaned too heavily into that persona, and the excessive style consistency was a tipoff. So I correctly guessed it was an AI.
Since I was a tester, I was looking for differences. As with an AI-generated image, if you look closely you can detect subtle tells. If I hadn’t been looking for them, I could easily have been fooled.
The Turing Test may be less about AI intelligence and more about how competent AI is at deception, i.e., fooling people, as on social media. Deception is implicated in many AI risk scenarios, such as AI-powered versions of the Nigerian-prince email scam. The researchers’ point is well-taken:
Despite known limitations as a test of intelligence, we argue that the Turing Test continues to be relevant as an assessment of naturalistic communication and deception. AI models with the ability to masquerade as humans could have widespread societal consequences …
If GAN-like algorithms can help evolve LLM behaviors to continually adapt and shed “AI-like” tells, then perhaps LLMs can improve enough to beat the Turing Test.
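A toy sketch of that adversarial dynamic: a fixed “detector” of AI-like style, and a hill-climbing “generator” that adapts to evade it. This is a simplified stand-in for a real GAN (no learned discriminator, no gradients), and every feature and number below is invented for illustration.

```python
import random
import statistics

random.seed(0)

# Style of a chat reply, reduced to two toy features:
# (average message length, punctuation rate).
humans = [(random.gauss(40, 15), random.gauss(0.06, 0.03)) for _ in range(500)]
centroid = (statistics.mean(h[0] for h in humans),
            statistics.mean(h[1] for h in humans))

def detector(style):
    # Normalized L1 distance from the human centroid; higher = more AI-like.
    return abs(style[0] - centroid[0]) / 15 + abs(style[1] - centroid[1]) / 0.03

# The "generator" starts with an obviously non-human style: very long,
# unpunctuated replies. It mutates its style and keeps any change that
# looks less AI-like to the detector.
style = (120.0, 0.0)
for _ in range(2000):
    candidate = (style[0] + random.gauss(0, 2),
                 style[1] + random.gauss(0, 0.005))
    if detector(candidate) < detector(style):
        style = candidate
```

After the loop, the evolved style sits close to the human centroid. A real GAN-like setup would retrain the detector against the adapted generator and repeat, which is exactly the arms race that makes style-based AI detection fragile.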
The Heinlein Test
The Turing Test is of limited value in setting a baseline of what useful AI would be, or what AGI would be. We must look elsewhere for guidance. OpenAI’s Sam Altman speaks of performing tasks at a level that a “median human” would do. But what tasks?
I came across a beautiful quote from Robert Heinlein about what it means to be human, which in my view also captures what it might mean for AI to achieve human-level AGI:
“A human being should be able to change a diaper, plan an invasion, butcher a hog, conn a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyze a new problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die gallantly. Specialization is for insects.”
If an AI can do all that, it has reached human-level AGI and perhaps achieved something more profound.
That last line - “Specialization is for insects” - says it all. What has made humans the most successful species on the planet is not any one skill. There are animals that are stronger, faster, more agile, can see better, etc. We are unique in that we are superb generalists; we succeeded by being adaptable learners that use tools to extend ourselves. And AI is our latest mind-leveraging tool.
None of the skills and tasks Heinlein mentions are innate. Let’s break it down, because I believe it captures the essence of what AGI really means:
Change a diaper, cook a tasty meal: Do complex manual household tasks.
Solve equations, balance accounts, program a computer: Perform complex math and reasoning.
Plan an invasion, analyze a new problem, give orders: Conduct planning and analysis tasks and set plans in motion.
Butcher a hog, build a wall, pitch manure, set a bone: Perform physical labor, skilled and unskilled, to meet real needs.
Take orders, cooperate: Work within a bigger system (or group) to solve problems and get things done. This is alignment in action: taking direction and collaborating toward shared goals.
Act alone: This is acting with autonomy. You are not the co-pilot, you are the captain.
Write a sonnet: Generate art and literature. Nothing distinguishes humans more from the animals than our ability to create and communicate ideas.
Comfort the dying: Exhibit empathy. Without emotions, we are not human. Whether AI needs emotions itself is an open question, but AI will need emotional intelligence to be useful in tasks of comforting, mentoring, therapy, and counseling.
Few humans can do everything on that list. We call such learned generalists “Renaissance men,” an admission that they are an anachronism in our complex modern world of specialists. Still, it’s a bar that reflects our potential, as humans have an amazing capacity to learn all of these skills.
It’s a fine bar to set for AGI as well. It makes the concept of AGI concrete. AGI - Artificial General Intelligence - is not about any one skill; it’s about EVERY skill. It’s broad capability across mental, physical, and emotional skills to accomplish meaningful tasks. It’s also about the level of skill - the highest being full autonomy.
This, then, is the Heinlein Test: Human-level AGI is when an embodied AI can do most of these above tasks, or tasks of a similar nature, at a competent human level.
Autonomous AGI is performing those same tasks at a competent level with full autonomy.
What do you think? Is this a good test? Leave a comment.
AGI Will Be Embodied
I’ve made the point before, but the list makes it obvious: AGI will have to be embodied, i.e., physical. The AGI-level AI will need to be an embodied AI robot capable of complex physical manipulations, such as cooking an omelette. It will need real-world experience and capability to be AGI.
Scaling language calculators is not all you need. Language models are very adaptable and more general than any AI that came before. But language itself is just one dimension of human skill and experience, so LLMs can deliver only a subset of human-level capabilities and understanding.
All the same, we should not under-estimate the importance of LLMs, which alone are sufficient to disrupt, displace and automate a huge amount of intellectual effort and business workflows. LLMs, if they can drive code generation that translates into directing robotic movements, could also end up “in the driver’s seat” of robotic control.
Autonomy is the Apex
AGI is about autonomy, because it encompasses the skill of solving problems from a high level, planning the solution, and executing on the mental and physical tasks involved.
For self-driving cars, autonomy is the goal because the main value-add is in full automation, not partial. However, full automation is an unforgiving goal: An 80% good ‘co-pilot’ AI is useful and productive, but full automation requires a 99.999% reliable solution or you risk catastrophic failures.
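The gap between “80% good” and “99.999% reliable” is easy to underestimate, because per-decision reliability compounds. A back-of-the-envelope sketch, with an assumed number of safety-critical decisions per trip:

```python
# Why per-decision reliability compounds: the probability of completing
# n independent actions without a single failure is r ** n.

def mission_success(per_action_reliability: float, n_actions: int) -> float:
    """Probability of a failure-free run of n_actions independent actions."""
    return per_action_reliability ** n_actions

n = 1000  # assumed number of safety-critical decisions in one trip

copilot = mission_success(0.999, n)      # "three nines" per decision
autonomous = mission_success(0.99999, n)  # "five nines" per decision

print(f"99.9%   per decision -> {copilot:.1%} failure-free trips")
print(f"99.999% per decision -> {autonomous:.1%} failure-free trips")
```

Under these assumptions, three nines per decision yields failure-free trips only about a third of the time, while five nines yields about 99%. That is why a co-pilot that defers to a human can tolerate error rates that would be catastrophic for a fully autonomous system.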
Ilya Sutskever has mentioned reliability as the biggest stumbling block to AI progress. In the end, it may not be AI model size or capabilities that dictate when we get to AGI, but reliability as the gating factor for fail-safe autonomous AGI.
Yann’s Embodied Turing Test
While none of this above answers the question “When AGI?” directly, it sheds light on the question of what AGI needs to be.
On this point, Yann LeCun has had interesting takes. In an interview with Craig Smith of Eye on AI, the Turing Award winner and head of Meta AI Research expressed his view that not only are we not at human-level AI, but we aren’t even at cat-level AI, noting that AI cannot even do some tricks that cats can do.
Yann LeCun takes a fairly dim view of LLMs as the path to AGI. He is putting his research where his mouth is, as developer of the I-JEPA model, a different approach to building a world model than the one LLMs take.
In this interview, Yann speaks of an ‘embodied Turing Test’ - can you make a robot that can acquire behaviors that are indistinguishable from animals or humans? The tasks that he speaks of are similar to what I am describing with the Heinlein Test.
“The world is much more complicated than language.” - Yann LeCun
When AGI?
At least we now have a definition of What AGI, but what of the headline question, when AGI? Some recent predictions made in interviews on the question:
Shane Legg on the Dwarkesh Patel podcast: He predicted over a decade ago that we’d have AGI around 2025 to 2028, and he now feels comfortable predicting a 50% chance that we reach AGI by 2028.
OpenAI’s Sam Altman on Joe Rogan, when asked about the pace of progress to AGI, spoke of a conversation at OpenAI in the early days, where they surmised “This is about a 15 year project.” He said that prediction still feels right, “2030 - 2031, a reasonable estimate with huge error bars.”
Six months ago, I predicted AGI by 2029. I estimated that we will need three generations of models to get to AGI, and that we will reach it with a GPT-7-level embodied AI capability. With all the news since then, AI timelines have accelerated in many areas. GPT-5 will be a very powerful multi-modal AI, and agents built on top of it will displace many human tasks. They will feel ‘close’ to AGI.
But while capabilities keep expanding, AI reliability is still a challenge, and one that is moving more slowly. I’m sticking with my prediction: AGI by 2029.