AI Week In Review 24.05.11
Udio Audio Inpainting, OpenDevin, KreaAI video gen, Gradient's 4M context Llama 3 8B, Apple's M4, MAI1, OpenUI, AlphaFold 3, DeepSeek V2, OpenAI StackOverflow deal, OpenAI's Model Spec, UK's Inspect
Figure 1 is from Michael Laskey of Sheep Robotics: “Really cool task our AI agent is learning. Targeted weed spraying is a great application for robots. We can drastically reduce the amount of herbicide used in a parks and community spaces!”
AI Tech and Product Releases
Udio has launched Audio Inpainting, a feature for editing and refining audio tracks. Udio users can use Audio Inpainting to re-generate a section of an audio track based on the surrounding context, which can be used to edit single vocal lines, correct errors, or smooth over transitions. Udio is also introducing multiple subscription tiers.
OpenDevin CodeAct 1.0 has been released. It’s a new open coding agent that achieves a new SOTA score of 21% unassisted resolve rate on SWE-Bench Lite, improving on the previous SOTA set by SWE-Agent. Their blog post has more details, and all code to run the AI agent is available on GitHub.
Gradient AI announced that they achieved a 4 million token context length on a Llama 3 8B model, the largest context window of any open LLM. Gradient has also shared the article “Evaluating Models Beyond Vanilla Needle-in-a-Haystack” to explain how to evaluate long context windows on tasks more complex than keyword retrieval.
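For readers unfamiliar with the "vanilla" needle-in-a-haystack test that Gradient's article goes beyond, here is a minimal sketch of how such a test prompt can be built. It is an illustration only, not Gradient's evaluation code: the function names, the filler text, and the passcode needle are all hypothetical, and the call to an actual model is left out.

```python
def build_haystack(filler_sentences, needle, depth):
    """Insert `needle` at a relative depth (0.0 = start, 1.0 = end) of the filler text."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(depth * len(filler_sentences))
    sentences = filler_sentences[:position] + [needle] + filler_sentences[position:]
    return " ".join(sentences)


def make_prompt(haystack, question):
    """Append the retrieval question after the long context."""
    return f"{haystack}\n\nQuestion: {question}\nAnswer:"


def score_response(response, expected):
    """Vanilla scoring: did the model's answer contain the expected keyword?"""
    return expected.lower() in response.lower()


# Hypothetical example data: 1,000 filler sentences with one needle in the middle.
filler = [f"Filler sentence number {i}." for i in range(1000)]
needle = "The secret passcode is 7421."
haystack = build_haystack(filler, needle, depth=0.5)
prompt = make_prompt(haystack, "What is the secret passcode?")
# `prompt` would be sent to the long-context model; its reply is then
# checked with score_response(reply, "7421").
```

A full test sweeps both the context length and the needle depth. Because scoring is a simple substring match, this setup only measures keyword retrieval, which is the limitation the article addresses.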
Krea AI launched a video creation tool that allows users to blend images together and generate videos on their platform. Krea AI shared examples of how it works on X. Given a set of images, it interpolates between them in video form, producing interesting morphing videos.
ElevenLabs is previewing a new model that converts prompts into song lyrics. It’s getting some very positive reviews.
Apple announced their next-generation chip, the M4, with 28 billion transistors, 10-core CPU, 10-core GPU, and the “most powerful Neural Engine ever” achieving an astounding 38 trillion operations per second. Apple announced M4 is coming to the new iPad Pro, greatly improving iPad performance for AI-related tasks.
Rumor: Microsoft AI is building a new 500B-parameter LLM called MAI1; the project is headed by Mustafa Suleyman, the former CEO of Inflection AI, who now heads Microsoft AI. This is the first time Microsoft has worked on a large language model of this scale.
After weeks of rumors and speculation about OpenAI’s next move, OpenAI has pre-announced that it will present new ChatGPT and GPT-4 updates on Monday, May 13. What will be in it? Here is what we know so far: Altman says, “not gpt-5, not a search engine…”
The mysterious “im-a-good-gpt2-chatbot” came back after being taken off the platform, and was joined by a colleague, “im-also-a-good-gpt2-chatbot”, on Lmsys Arena. These are believed to be beta AI models from OpenAI. One guess from Siqi Chen on how they tie in with OpenAI’s upcoming announcement:
gpt4-auto is im-a-good-gpt2-chatbot: new model endpoint that automatically retrieves data from web, internal data integrations to augment responses.
gpt4-lite-auto is im-also-a-good-gpt2-chatbot: same features but with the lite model (drop in gpt-3.5 replacement) instead.
Last week, the rumor was that OpenAI would make its announcement on May 9th (which didn’t happen), so take any speculation with a grain of salt. We’ve also seen claims that OpenAI would launch a search engine; this has been denied and is likely a misinterpretation of OpenAI’s actual development.
Top Tools & Hacks
OpenUI is a free, open-source, browser-based application that helps you generate user interfaces, landing pages, web sites, etc. with the help of AI. Developed by Weights & Biases co-founder Chris Van Pelt, it’s available via GitHub, and is easy to download, set up and use.
AI Research News
Google DeepMind’s AlphaFold 3 release is the most significant AI research breakthrough this week, providing AI modelling of the structure and interactions of biological molecules. We covered this and other new AI research results in our AI Research Roundup for this week:
DrEureka: Language Model Guided Sim-To-Real Transfer
AlphaFold 3: Predicting structure and interactions of life’s molecules
xLSTM: Extended Long Short-Term Memory
KAN: Kolmogorov–Arnold Networks
VoT: Visualization of Thought
Agent Hospital: Simulating a Hospital with Medical Agents
DeepSeek V2 was released last week; it’s a 236B parameter MoE (mixture-of-experts) model with a 128k context window and 21B active parameters. Trained on 8 trillion tokens, it performs well, scoring 78.5 on MMLU. It’s “a novel sparse architecture that enables training strong models at an economical cost through sparse computation.”
DeepSeek V2 is an open AI model with technical details shared on GitHub and in their paper “DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model.” The DeepSeek V2 paper “holds some gems”, showing how they train this high-quality model very efficiently.
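To put the sparsity in perspective, a quick back-of-the-envelope calculation using only the two parameter counts quoted above shows how small a share of the model is active for each token. The function name is ours, and this is simple arithmetic, not a cost model from the DeepSeek paper.

```python
def active_fraction(total_params_b, active_params_b):
    """Share of a mixture-of-experts model's parameters used per token."""
    return active_params_b / total_params_b


# Figures quoted above for DeepSeek V2, in billions of parameters.
TOTAL_PARAMS_B = 236
ACTIVE_PARAMS_B = 21

fraction = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
print(f"Active per token: {fraction:.1%} of all parameters")
# prints "Active per token: 8.9% of all parameters"
```

In other words, each token is processed by roughly one-eleventh of the model's parameters, which is where the “economical cost through sparse computation” comes from.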
AI Business and Policy
OpenAI and Stack Overflow are partnering to improve the coding ability of tools like GPT-4. Stack Overflow is sharing their API service, dubbed OverflowAPI, that gives access to their site content.
However, Stack Overflow’s giveaway of user-generated content caused a backlash among developers, with some contributors to Stack Overflow protesting by deleting their answers. Stack Overflow maintains that it holds the rights to all answers posted there and that users are not allowed to remove them, and it is removing contributors who delete their answers.
SoundHound AI and Perplexity announced a partnership to integrate LLMs and voice assistants across devices, allowing SoundHound users access to Perplexity AI assistants through car and edge devices.
CoreWeave opens European HQ in London with plans for 2 UK data centers. AI cloud-computing startup CoreWeave recently raised over $1 billion, valuing CoreWeave at a reported $19 billion. It is using part of these funds to expand outside the US.
In another endorsement of London as an AI hub, Scale AI announced plans for a new European headquarters in London. Scale AI also reportedly closed its Austin office.
AI Startup funding announcements:
South Korean AI chip firm DEEPX raises an $80M round for on-device AI chips.
Lore Health raised an $80M Series A and launched an AI-powered social network for support and encouragement, called LoreBot.
XTEND Robotics, a firm focused on using AI to control drones and robots, announced they raised a $40M Series B round.
Triomics raises $15M to use LLMs to improve cancer treatment. Triomics has developed customized LLMs called OncoLLM to streamline complex oncology-related workflows.
UK agency releases tools to test AI model safety. The UK’s AI Safety Institute has released its Inspect testing platform as an open-source AI evaluation tool “to assess specific capabilities of individual models … including their core knowledge, ability to reason, and autonomous capabilities.” The institute is open-sourcing Inspect as part of a goal to advance AI safety through rigorous evaluations of AI capabilities.
“We hope to see the global AI community using Inspect to not only carry out their own model safety tests, but to help adapt and build upon the open source platform so we can produce high-quality evaluations across the board.” - AI Safety Institute Chair Ian Hogarth
AI Opinions and Articles
OpenAI’s Model Spec is an effort by OpenAI to define and specify desired behavior for their AI models. The Model Spec outlines objectives, rules, and default behaviors. Example default behaviors include:
Assume best intentions from the user or developer
Ask clarifying questions when necessary
Be as helpful as possible without overstepping
Support the different needs of interactive chat and programmatic use
Assume an objective point of view
They hope this will help them create helpful and beneficial models, saying “we intend to use the Model Spec as guidelines for researchers and AI trainers who work on reinforcement learning from human feedback.” OpenAI will use feedback from experts and the public to improve the Model Spec.
“To deepen the public conversation about how AI models should behave, we’re sharing the Model Spec, our approach to shaping desired model behavior.” - OpenAI
A Look Back …
RunwayML held a film festival this week and shared the winning video entries. The entries were all made pre-Sora, so the AI-generated images and video have their limitations, but these are still interesting and creative productions.
Kyle Wiggers, reviewing the RunwayML Film Festival for TechCrunch, summed it up as “humanity over tech” saying:
Indeed, it tended to be obvious — sometimes painfully so — which parts of films were the product of an AI model, not an actor, cameraman or animator. Even otherwise strong scripts were sometimes let down by underwhelming generative AI effects.
For now at least, “The human — not AI — contributions often make all the difference.”