OpenAI Goes Turbo & Takes Steps to Agents
OpenAI introduces GPT-4 Turbo with 128K context, custom GPTs, the Assistants API, Whisper V3, better TTS, plus lower pricing.
OpenAI DevDay Keynote
The OpenAI DevDay keynote was given by OpenAI CEO Sam Altman. After noting OpenAI’s progress in AI in the past year - ChatGPT, GPT-4, voice integration, GPT-4 Vision, and enterprise models - he dropped a number of big updates. Along the way, we got a walk-on from Microsoft CEO Satya Nadella, a demo from Zapier, and an API credits giveaway for attendees.
Here’s the TL;DR on the new features and capabilities OpenAI released:
Their new version of GPT-4, GPT-4 Turbo, now powers ChatGPT: 128K context, more developer control, better function calling, updated knowledge (to April 2023), support for all modalities (image in and out), higher rate limits, and lower cost.
They lowered prices on API tokens across the board, with a 3x reduction on input tokens and a 2x reduction on output tokens for GPT-4 Turbo versus GPT-4.
They announced GPTs, customized versions of ChatGPT that users can create and share. You define a GPT with natural language, give it external resources (documents) to teach it, and share it through a new GPT Store. Plugins are now "custom actions" for GPTs.
The Assistants API is OpenAI’s first step towards agents. It is a stateful API that handles memory, with a Python code interpreter, document retrieval, and function calling, all integrated behind a single API call.
Whisper, OpenAI’s speech recognition model, is upgraded to Whisper V3, and a new, better text-to-speech model powers ChatGPT’s voice and can power other audio interfaces via an API.
We dive in further below on the key releases and what they mean.
OpenAI’s new leading AI Model - GPT-4 Turbo
GPT-4 launched in March as the most advanced LLM ever, and Sam Altman boasted “GPT-4 is still the most capable model out... in the world.” With Google, Anthropic and others angling to catch up, OpenAI is not resting on its laurels.
The feature upgrades to GPT-4 are significant enough to warrant a new name: GPT-4 Turbo. There are improvements across the board:
Context length is increased to 128K (a 4x increase from 32K). Not just longer, it is also “much more accurate over a long context,” an issue plaguing long-context models.
Better controls for developers: Models have a new JSON mode (via the API parameter response_format), which ensures the model will respond with valid JSON, and reproducible outputs (via a seed parameter), for more reliable testing and validation. GPT-4 Turbo is also better at function calling, where a model intelligently outputs function call arguments to call external functions within an AI application. This release lets the model call multiple functions in a single message and improves function calling accuracy; a sketch of these controls follows.
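The sketch below uses the OpenAI Python SDK (v1); the model id is the launch-day preview name, and the get_weather function is a hypothetical example rather than anything in the API.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# JSON mode plus a fixed seed for (mostly) reproducible outputs
response = client.chat.completions.create(
    model="gpt-4-1106-preview",              # GPT-4 Turbo preview id at launch
    seed=42,                                  # reproducible-outputs control
    response_format={"type": "json_object"},  # model must return valid JSON
    messages=[
        {"role": "system", "content": "Reply in JSON with keys 'city' and 'summary'."},
        {"role": "user", "content": "Summarize today's weather news for Paris."},
    ],
)
print(response.choices[0].message.content)

# Parallel function calling: the model may return several tool calls in one message
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function, for illustration only
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "Compare the weather in Paris and Tokyo."}],
    tools=tools,
)
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```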
Better world knowledge: The GPT-4 Turbo knowledge cutoff is updated to April 2023, and OpenAI promised to keep its AI models more current than it had been doing.
New modalities: GPT-4 Turbo with Vision, DALL·E 3, and text-to-speech (TTS) are live in the API today. GPT-4 Turbo can accept images as inputs, the TTS model has six preset natural voices, and the speech recognition model Whisper V3 is out, with improved performance across more languages.
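For a feel of the audio side, here is a minimal sketch of the text-to-speech and transcription endpoints in the OpenAI Python SDK; the voice choice, text, and file names are arbitrary, and the hosted transcription model is addressed as whisper-1 regardless of the underlying Whisper version.

```python
from openai import OpenAI

client = OpenAI()

# Text-to-speech: six preset voices (alloy, echo, fable, onyx, nova, shimmer)
speech = client.audio.speech.create(
    model="tts-1",   # "tts-1-hd" trades latency for higher quality
    voice="nova",
    input="GPT-4 Turbo now supports a 128K context window.",
)
speech.stream_to_file("announcement.mp3")

# Speech-to-text with Whisper via the transcription endpoint
with open("announcement.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcript.text)
```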
Customization: Fine-tuning is now available for gpt-3.5-turbo-16k, and an experimental GPT-4 fine-tuning access program is open for applications. Also, fully custom models are now a managed service product OpenAI offers to enterprise customers.
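As a reference point, starting a fine-tune is a short API call; the training file below is a placeholder, and the exact model id to pass for the 16K variant is an assumption worth checking against OpenAI’s docs.

```python
from openai import OpenAI

client = OpenAI()

# Upload a JSONL file of chat-formatted training examples (placeholder file name)
training_file = client.files.create(
    file=open("train_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Kick off a fine-tuning job on the 16K-context GPT-3.5 Turbo model
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo-1106",  # assumed id for the 16K-capable model; verify in the docs
)
print(job.id, job.status)
```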
Pricing is lowered: GPT-4 Turbo is 3x cheaper on inputs ($0.01/1K tokens) and 2x cheaper on outputs ($0.03/1K tokens) than the prior GPT-4. A full 128K-token prompt (reading an entire book) costs about $1.28 in input tokens, comfortably under $3 even with output. There are also higher usage limits.
Other AI model APIs saw similar price reductions; for example, GPT-3.5 Turbo 16K inputs and outputs are also 3x and 2x cheaper, respectively.
Finally, the ChatGPT model picker is gone. GPT-4 Turbo and all its power are available in the main ChatGPT interface.
GPTs - Tailoring ChatGPT
OpenAI is introducing GPTs, best described as customized versions of ChatGPT for a specific purpose, built on an agentic framework and publishable for shared use by others.
Altman emphasized that "Gradual iterative release is the best way to get to AGI safely," so GPTs will evolve over time. For now, GPTs (terribly named) have these capabilities:
You program a GPT with natural language through the GPT Builder interface.
You pick a topic; the builder suggests a name, logo, and topics, and you can add content, such as documents, for it to draw from.
You can publish a GPT to yourself only, or publish for others to use.
There will be a GPT Store, like an app store, for sharing GPTs with others, and OpenAI will offer revenue sharing, paying GPT creators a share of revenue.
Plug-ins are subsumed in GPTs. “We evolved our plug-ins to become custom actions for GPTs.”
Examples of GPTs were given: a code.org GPT for tutoring middle-schoolers, and a Canva-based GPT that uses natural language to build Canva images. They also gave a live demo of the Zapier GPT, connecting a GPT to email, messaging, and calendar APIs and taking actions.
Assistants API
OpenAI also launched the Assistants API to enable developers to build agent-like applications. The Assistants API uses the same capabilities that power GPTs: Code Interpreter, Retrieval, function calling, and threads.
Code Interpreter is a sandbox to create and execute Python code. It is now available in the API, so it can be embedded in other applications.
Retrieval gives the Assistants API added knowledge and context. It takes care of the plumbing of retrieval-augmented generation, so chunking, embeddings, and the like are handled for you.
The Assistants API knows about function calling, so it can invoke external APIs with just a few lines of code.
The Assistants API is a stateful API, with threads and messages as primitives. This is a serious step towards a real agent, as it carries state across individual LLM calls:
A key change introduced by this API is persistent and infinitely long threads, which allow developers to hand off thread state management to OpenAI and work around context window constraints. With the Assistants API, you simply add each new message to an existing thread.
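To make the stateful flow concrete, here is a rough sketch using the beta Python SDK as it looked at launch; the file, instructions, and model id are placeholders, and the surface may well change given how fast these features are moving.

```python
import time
from openai import OpenAI

client = OpenAI()

# Upload a document for the Retrieval tool (placeholder file)
doc = client.files.create(file=open("report.pdf", "rb"), purpose="assistants")

# An assistant bundles a model, instructions, and tools
assistant = client.beta.assistants.create(
    name="Report Analyst",
    instructions="Answer questions about the uploaded report; run code for any math.",
    model="gpt-4-1106-preview",
    tools=[{"type": "code_interpreter"}, {"type": "retrieval"}],
    file_ids=[doc.id],
)

# A thread holds the conversation state on OpenAI's side
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What was the largest quarter-over-quarter change in the report?",
)

# A run executes the assistant against the thread; poll until it finishes
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# The newest message comes back first; the full history stays in the thread
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```

Subsequent turns just append more messages to the same thread and create new runs; OpenAI handles truncation against the context window.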
They showed off these Assistants API features in a demo application, and it lived up to OpenAI’s description of Assistants API as their “first versions of Agents.” The demo included reading in PDFs and interacting via voice, using Whisper and TTS in the API.
These aren’t necessarily new capabilities; developers are already building agents, but they have had to wire together various tools and APIs themselves. What is new is that OpenAI has lowered the barriers to adopting these capabilities.
OpenAI is tipping its hand that GPTs and the Assistants API are just the start. Sam Altman spoke of “gradual iterative deployment,” so they are feeling their way forward on this. OpenAI is moving towards an agentic end-to-end experience.
Practical Updates and AI/UX
Bottom-line takeaway: This was an impressive OpenAI DevDay that keeps OpenAI in the lead on AI.
They didn’t deliver a larger AI model, and yet OpenAI delivered an impressive set of vital capability improvements, making features more accessible and a whole lot of use cases more practical.
For example, by making the Assistants API stateful, OpenAI has lowered the bar tremendously for developing capable AI assistants and embedded AI features.
There is more to the AI ecosystem than just the foundation AI model. Specifically, many features released today are improvements to the user or developer experience. UI/UX has come to AI, i.e., good user experience is needed to deliver real value from AI. Call it AI/UX.
To that end, features like Whisper V3 and TTS could be very important as a low-cost, high-quality path to voice-enabling embedded AI applications.
Are OpenAI’s growing capabilities a startup-killing moving target?
OpenAI’s retrieval system might displace a lot of uses of RAG frameworks. Will this kill langchain? It is unlikely to close off open-source frameworks, but if the Assistants API is capable and low-cost, it will gain mindshare and subsume features of others.
What happened to plug-ins? They seem to be subsumed by GPTs and the Assistants API. Features are continuing to evolve as OpenAI and others figure out the right architectures to deliver AI applications.
In March, we had a plug-in store; now they announced a GPT Store. Will next year bring something very different that makes GPTs obsolete? This lack of platform stability leaves developers in a pickle: with such a big set of capabilities moving so fast, one wonders what to build on, and whether the interface or paradigm will be quickly obsoleted.
OpenAI has come for ElevenLabs. They have cheaper TTS and it’s really good.
The bigger picture is that with every move outside the foundational AI model, OpenAI may kill many AI startups. Will the “GPT store” become the custom AI version of the iPhone app store? Time will tell.
Final Thoughts
In his closing, Sam Altman tipped his hand that there is more, much more, to come. Believe him.
“We'll be able to do more, create more, and have more. … We will have superpowers on demand. … What we launched today will look very quaint” - Sam Altman