AI Week In Review 24.03.23
Nvidia announces Blackwell, Stable Video 3D, Open Interpreter 01 Lite, Character AI adds voice, X AI opens Grok, Microsoft hires Suleyman & Inflection team, Stability CEO resigns, Voice-swap Agency.
AI Tech and Product Releases
The biggest AI product announcements this week came from Nvidia’s GTC. As we described in our article “NVidia GTC - Nvidia revvs up the AI revolution,” the centerpiece of Nvidia GTC was Blackwell, Nvidia’s next-generation GPU. The Blackwell B200 can achieve 20 petaflops on a single board, with a 2.5 to 5 times speedup over the Hopper H200 on training and up to a 30 times speedup on inference. Nvidia’s Blackwell-based DGX AI supercomputer achieves an exaflop in a single rack.
Nvidia also made numerous other announcements, covering its NIM and NeMo microservices for AI, Project GR00T for robotics, and DRIVE Thor for automotive applications.
X.AI released the Grok-1 weights and code under the open-source Apache 2.0 license. We dove into the details of Grok-1, a 314B parameter Mixture of 8 Experts LLM, in our article “Grok revealed.”
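For readers new to the term, here is a toy sketch of what a “mixture of experts” layer does: a small router scores every expert for each token, only the best few experts run, and their outputs are blended by the router's weights. This is an illustration only, not Grok-1's code. The top-2 routing, the tiny vector size, the random weights, and every function name below are assumptions made for the example.

```python
import math
import random

NUM_EXPERTS = 8  # from the article: "Mixture of 8 Experts"
TOP_K = 2        # assumption: number of experts consulted per token
DIM = 4          # toy hidden size, purely illustrative


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def make_matrix(rng, rows, cols):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def moe_forward(token, router, experts, top_k=TOP_K):
    """Route one token vector through its top_k experts and mix the outputs."""
    logits = matvec(router, token)  # one score per expert
    ranked = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)
    chosen = ranked[:top_k]
    # Renormalise the scores over the chosen experts only.
    gates = softmax([logits[i] for i in chosen])
    output = [0.0] * len(token)
    for gate, idx in zip(gates, chosen):
        expert_out = matvec(experts[idx], token)  # unchosen experts never run
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, chosen, gates


rng = random.Random(0)
router = make_matrix(rng, NUM_EXPERTS, DIM)
experts = [make_matrix(rng, DIM, DIM) for _ in range(NUM_EXPERTS)]
token = [0.5, -0.2, 0.1, 0.9]

output, chosen, gates = moe_forward(token, router, experts)
print("experts used:", chosen)
print("gate weights:", [round(g, 3) for g in gates])
```

The point of the design is that only the selected experts do any work for a given token, so a model can hold far more parameters than it uses on any single step.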
Stability AI released Stable Video 3D, a generative model based on Stable Video Diffusion. This new model advances the field of 3D technology, delivering greatly improved quality and multi-view consistency. The model is available for download on Hugging Face and can be used commercially with a Stability AI Membership. Stability AI also released a research paper explaining the Stable Video 3D model:
“By adapting our Stable Video Diffusion image-to-video diffusion model with the addition of camera path conditioning, Stable Video 3D is able to generate multi-view videos of an object.”
Remember the hype around the Rabbit R-1, the AI device wrapping an AI agent that you can talk to and give directions? That inspired an open-source project, the Open Interpreter 01 Lite. The 01 Lite is a voice-enabled device that connects to Open Interpreter, an open-source AI that can run code and control your computer. Putting these together makes for a fantastic voice-enabled AI assistant.
You can buy an 01, or you can build one by getting the bill of materials from The 01 Project. They are “building an open-source ecosystem for AI devices” and see themselves as the GNU/Linux of AI devices. The underlying concept that unifies the Rabbit, Open Interpreter and related devices is Andrej Karpathy’s concept of LLM as OS, as shown in the figure below. We also discussed this in “The Rabbit and the LAM.”
Character AI says “Give your Characters a Voice!” They recently launched a new feature that adds voices to characters with just 10 seconds of audio. “Choose from thousands of Voices or create your own.” Now available to everyone for free.
Realistics AI offers AI avatars with human realism, thanks to realistic avatar video and audio. This is in beta and available on iOS as an iPhone app for now.
Sam Altman’s recent interviews are giving us hints about a possible GPT-5 release date, price, and capabilities. He has suggested better reasoning capabilities, extended multi-modality, and more personalization. “People want very different things out of GPT-4; different styles, different sets of assumptions – we’ll make all that possible.”
In a Lex Fridman interview podcast this week, Altman stated: “We will release an amazing model this year. I don’t know what we’ll call it.” He also said that the leap in capabilities from GPT-4 to GPT-5 will be similar to the jump from GPT-3 to GPT-4.
“We launched images and audio, and it had a much stronger response than we expected. We’ll be able to push that much further, but maybe the most important areas of progress [in GPT-5] will be around reasoning ability.” - Sam Altman, OpenAI
Top Tools & Hacks
You can access and use Google’s Gemini 1.5 Pro at aistudio.google.com. There’s no waitlist, it’s free, and the AI Studio interface is a good environment for trying out prompts and experimenting with Gemini models. For example, you can also create a tuned model based on Gemini 1.0 Pro. Come for the top-tier model, stay for the prompt tuning environment.
AI Research News
Our weekly AI research roundup, published on Friday, discussed in detail these AI research results:
Apple’s MM1: A paper on developing a SOTA 30B parameter multimodal LLM.
MineDreamer: Using Chain of Imagination to help AI agents achieve goals.
Common Corpus: The largest (total 500 billion words) public-domain (non-copyright) dataset for LLM training.
VLOGGER: Turning audio into speaking avatars that include body movement.
Model Merging with Evolutionary Algorithms.
MindEye2: Faster AI mind-reading.
TacticAI: AI football (soccer) coach.
Related to the MindEye2 results was this recent news: Neuralink revealed its first human-trial patient, a 29-year-old quadriplegic who says the brain chip is 'not perfect' but has changed his life. This amazing technology is already changing lives.
AI Business and Policy
In big hiring news, Microsoft announced the hiring of DeepMind co-founder (and Inflection AI CEO) Mustafa Suleyman. He will lead a new division called 'Microsoft AI' — “focused on advancing Copilot and our other consumer AI products and research.” Co-founder Karen Simonyan and several others from Inflection will be joining him.
One take on this is that Microsoft has swallowed Inflection in an acqui-hire. The move puts Inflection AI’s future into question and cements Microsoft’s AI leadership by augmenting its in-house AI capabilities. However, Microsoft is not abandoning OpenAI at all:
Our AI innovation continues to build on our most strategic and important partnership with OpenAI. We will continue to build AI infrastructure inclusive of custom systems and silicon work in support of OpenAI’s foundation model roadmap, and also innovate and build products on top of their foundation models.
Stability AI founder and CEO Emad Mostaque resigned, saying “it is now time to ensure AI remains open and decentralised.” The CEO’s exit follows on the heels of the departures of high-profile Stability AI executives and researchers. The Stability AI board appointed COO Shan Shan Wong and CTO Christian Laforte as interim co-CEOs.
Reports that Apple is in talks with Google to use Gemini on the iPhone suggest Apple is going big on AI in its next iOS update. Apple may use Gemini or OpenAI models to incorporate LLM capabilities into Siri, and according to the report “Apple Held Talks With China’s Baidu Over AI for Its Devices,” it has also approached Baidu to cover the China market:
Bloomberg's Mark Gurman says that one of the specific features Apple is developing is an improved interaction between Siri and the Messages app, which would let Siri auto-complete sentences more effectively and answer complex questions.
Voice-Swap Wants to Be an ‘Agency’ for Artists’ AI Voices. The company hopes to monetize the use of artists’ voices by others in the era of easy voice cloning:
Voice-Swap sees the voice as the “new real estate of IP,” as Pelczynski puts it — just another form of ownership that can allow a participating artist to make passive income.
A16Z recently published a list of the 100 top Generative AI apps, breaking it down to the top 50 Gen AI web apps and top 50 Gen AI mobile apps. For the top 50 Gen AI web apps, “over 40 percent of the companies on the list are new, compared to our September 2023 report.” They find that “new AI-native companies are emerging every month, spurring a dynamic, competitive market.”
AI Opinions and Articles
Ed Zitron asks: “Have We Reached Peak AI?” He berates (correctly) “OpenAI's flowery, hollow playbook” of hyping its technology without actually revealing anything useful about it. A recent example was the lack of candor (or knowledge) from OpenAI’s CTO on the training data behind the company's own Sora AI model.
The vague, cryptic way Sam Altman and OpenAI discuss their AI products and roadmaps adds to the hype and mystique while remaining shallow and uninformative:
Sam Altman is repeatedly given the ability to wax lyrical about the futuristic capabilities of artificial intelligence in a way that lets him paint a picture of a technology he is not actually building. …
Every time Sam Altman speaks he almost immediately veers into the world of fan fiction, talking about both the general things that "AI" could do and non-specifically where ChatGPT might or might not fit into that without ever describing a real-world use case.
This pattern of OpenAI hyping its mystique while being so closed about what it is really doing is becoming tiresome.
For example, I reported above on Sam Altman’s interviews in an attempt to decode OpenAI’s release roadmap. Reading the tea leaves: they will release a big AI model update in the latter part of this year. Great, but we don’t know exactly when, what the release contains, or even if it will be GPT-5. Altman leaves us guessing while assuring us it will be great.
A truly open OpenAI would simply announce their roadmap of upcoming releases and prove their worth by delivering on it. Pushback on the hype is appropriate.
A Look Back …
Microsoft’s hiring of Mustafa Suleyman now puts one DeepMind co-founder at the helm of Microsoft’s AI organization, while another DeepMind co-founder, Demis Hassabis, heads up Google’s AI efforts.
The AI pioneer DeepMind was founded by Demis Hassabis in London in 2010 with co-founders Shane Legg and Mustafa Suleyman. Although DeepMind sold no products in its early years and was focused on game-playing AI, Google saw the value of DeepMind and acquired it in 2014. Since then, its research has become pivotal to advancing AI: AlphaGo, AlphaFold, and many other AI breakthroughs.
Suleyman stayed at Google until 2022, then left to form the startup Inflection AI, which developed the Pi AI model. Now he leads Microsoft’s AI efforts. The impact of DeepMind on AI is already significant, and it will become even more so as these DeepMind co-founders lead AI efforts at our biggest tech companies.