It has been a mere two weeks since GPT-4 was announced and Microsoft announced that Bing chat had been running off of GPT-4 as well. The Technical Report shared with OpenAI’s GPT-4 announcement gave us a glimpse of how improved and powerful GPT-4 was, with vastly improved test scores (such as LSAT, AP, SAT), benchmarks, and capabilities over prior GPT-3.5 and other LLMs. The ground-breaking chatGPT that took the world by storm a mere 4 months ago is already the previous generation AI.
Then last week, Microsoft researchers dropped a bombshell paper “Sparks of Artificial General Intelligence: Early experiments with GPT-4” that shared their experience running and testing GPT-4 for the past several months. We mentioned the paper in our March 25 AI Week in Review, but the astounding results and the other GPT-4 capabilities that creators are making and use cases deserve a deeper dive on specifics.
GPT-4 Can Write Code
GPT-4 is human-level capable at software programming. GPT-4 did as well as humans on LeetCode coding challenges and is also able to generate end-to-end programming results.
Just as GPT-4 is human multi-lingual, it is computer code multi-lingual, able to output a range of programming languages and in a range of domains.
It can reverse-engineer code, reason about code, execute pseudo-code and python code and more.
Standard GPT-4 response is just a baseline though. Putting GPT-4 into the loop to help improve the developer’s code and its own can yield even a higher level of performance. This ‘copilot’ human+AI mode super-charges the productivity of software development and is better than each on its own.
Users can create websites, video games, iPhone apps from scratch with zero programming skills. Using GPT-4 and other AI tools such as Replit AI code assistant, a non-programmer user was able to create a Javascript-based video game. Another user went one better and made GPT-4 create a website to code any arcade game you can think of and play instantly. Joust anyone?
GPT-4 Can Be a Tutor
GPT-4 has aced the SAT, LSAT, GRE, AP tests, and a whole lot more. Oh, and it can converse in many languages. Being College-level capable at a range of subjects just made GPT-4 as informative and helpful as any internet tutor. It disrupts the entire tutoring ecosystem.
“GPT-4 Can Tutor You Like Socrates OpenAI says users can now prescribe their AI’s style and task by giving it directions via a system message. Instead of the classic ChatGPT’s fixed personality, API users can customize user experiences to make it talk like a pirate or tutor in the Socratic method.”
Already Khan Academy and DuoLingo are leveraging GPT-4, and being so multi-lingual, GPT-4 replaces any
GPT-4 Can Do Math
Caution from the authors that it’s far from perfect: “GPT-4 can answer difficult (indeed, competitive) high-school level math questions, and can sometimes engage in meaningful conversation around advanced math topics. Yet, it can also make very basic mistakes and occasionally produce incoherent output which may be interpreted as a lack of true understanding.”
I myself have seen problems in bing chat (incorrect log calculations and some long multiplication errors), but on the kinds of reasoning questions that chatGPT fell flat on, a big improvement.
The Microsoft testers probed GPT-4 for critical reasoning and creative reasoning and found its can reason abstractly across a number of math and related fields.
Where it is imperfect, GPT-4 can make up for it by connecting itself to sources like Wolfram Alpha.
GPT-4 Can Use Tools And Be a Helpful Practical Assistant
“GPT-4 is capable of both identifying and using external tools on its own in order to improve its performance. It is able to reason about which tools it needs, effectively parse the output of these tools and respond appropriately (i.e., interact with them appropriately), all without any specialized training or fine-tuning.”
GPT-4 can direct and leverage other tools to do tasks, e.g., browsers, apps, email, calendar, programming languages etc. The testers showed that GPT-4 was able to perform tasks like coordinating and booking a dinner: “GPT-4 uses the available APIs to retrieve information about the user’s calendar, coordinate with other people over email, book the dinner, and message the user with the details”
GPT-4 can reason through to solve real-world problems like helping figure out the cause of a plumbing problem interacting with a human.
This is why the chatGPT ‘plug-in’ announcement is so important. The power of GPT-4 is restricted to advice and written interaction if it’s just a standalone AI tool. But once it’s connected with other APIs and tools, it can do things that make it a powerful helper and ‘gopher’.
GPT-4 Can Understand and Work Visually
GPT-4 has visual input, making it a multi-modal model. It can classify and caption images, analyzing image text and understand meaning, even humor, of images.
GPT-4’s visual capability enables it to convert visual cues into all manner of other artifacts, such as going from napkin sketch to website in a single shot - demo here.
GPT-4’s visual understanding gives it a remarkable ability to output meaningful visual representations. “When prompting the model to generate images of objects such as a cat, a truck or a letter in the alphabet using Scalable Vector Graphics (SVG), the model produces code which usually compiles to rather detailed and identifiable images (Figure 2.4)”
While the outputs aren’t pretty, connecting GPT-4 to a generative AI to create real visual representations will amplify its power. The researchers showed a basic version of this, but others are taking it much further, hooking it up to 3D renderers like Blender.
GPT-4 Unlocks Writing Productivity
The capability closest to home for an LLM is assisting with writing, and GPT-4 sets a new standard in it beyond any prior AI. I used AI writing assistant crear.ai and Bing chat to summarize what GPT-4 can do in this space, and they told us:
Professional Writing: GPT-4 can generate well-structured and coherent content for emails, documents, and presentations, saving you time and effort.
Content Marketing: GPT-4 can revolutionize your content marketing by generating engaging and informative content for your target audience.
Language Translation: GPT-4’s multilingual capabilities make it a powerful tool for translating content across languages without compromising on quality.
Chatbot Development: GPT-4 can be used to develop chatbots that can communicate effectively with users in natural language.
Existing writing assistant tools already to do these things to some extent, but GPT-4 produces better, more accurate and more coherent writing versus predecessors, obsoleting prior generations of AI and tools. Instead of merely helping and requiring extensive rewrite, GPT-4 can quickly produce on-target writing on demand. As with software, the best results come from combining human and AI; writers using AI as a co-pilot will achieve great results while boosting productivity many-fold, perhaps by 5 or 10 times.
GPT-4 + Data = Amazing Usefulness
We will see a common design pattern that leverages GPT-4 and making it even more useful. This is to hook up GPT-4 to a database or knowledge base to help target the GPT-4 output result. I will explain more about this Foundation Model design pattern in future articles, but for now here are two examples.
James Briggs shows how to get GPT-4 to produce much better results simply by bringing in data.
You can also power GPT-4 with a database on the side to have conversations about a document.
This and other creative uses of GPT-4 are opening our eyes to what GPT-4 can do when embedded with other capabilities. More like these two examples are cropping up daily, because it is a powerful way to magnify the usefulness of GPT-4.
GPT-4 Gets Closer to AGI - Awesome Genius Innovations
GPT-4 is more Aligned - One way OpenAI improved GPT-4 over GPT-3 was with human rating feedback, using the technique of RLHF (Reinforcement Learning with Human feedback). The LLM’s level of “alignment”, which is the AI giving a result that responds accurately to the request, gets greatly improved with human feedback. The result is fewer hallucinations, more precise and reasonable answers, and a generally more useful AI.
GPT-4 is more Robust - GPT-4 is able to handle many tasks much better than GPT-3 and chatGPT with less finicky prompting and better zero-shot performance. In the AI safety realm, OpenAI was able to reduce GPT-4’s “refusals” while also reducing toxic or incorrect responses.
Time and again across a number of capabilities, the early GPT-4 testers noticed that “GPT-4's performance is strikingly close to human-level performance.” Their key takeaway is that while GPT-4 is not AGI, it shows how it’s not far away:
Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.
AI is a tool and AI models should be judged on their utility. Its not about getting to AGI (Artificial General Intelligence) but the level of utility this Foundation Model can enable. The next stage will be the explosion of tools, plug-ins and apps built on top of GPT-4 and similar AI. Judging by the creative things happening already, we are experiencing a different kind of AGI - Awesome Genius Innovations.