AI Week In Review 24.05.04
gpt2-chatbot, StarCoder2 Instruct, GitHub Copilot Workspace, Amazon Q Developer, Claude 3 iOS app, StoryDiffusion, Yelp AI assistant, newspapers sue OpenAI, DARPA's AI tank.
AI Tech and Product Releases
The biggest release news was a non-release. A mysterious new AI model called “gpt2-chatbot” showed up on the LMSYS Chatbot Arena, impressed many with capabilities at or beyond GPT-4 level, and was then taken down after two days. Huh. As we noted in our “Getting to System 2: LLM Reasoning” article earlier this week, it’s likely a test run of a new OpenAI model, but nobody has claimed it, so we don’t really know.
The BigCode project launched StarCoder2 Instruct 15B, a self-aligned code LLM built from StarCoder2; it scores 72.6 on HumanEval. StarCoder2 Instruct was built through a self-instruct process that autonomously generated thousands of instruction-response pairs, using a permissive and transparent (open source and public) pipeline with no human annotations. This approach outperformed previous versions of StarCoder2 trained on GPT-4-distilled data. The code and data are on GitHub, and the model is on Hugging Face.
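HumanEval results such as the 72.6 above are conventionally reported as pass@1: the share of the benchmark’s 164 Python problems for which generated code passes the problem’s unit tests. As a minimal sketch, assuming the standard unbiased pass@k estimator introduced alongside HumanEval in OpenAI’s Codex paper, a score can be computed as below. The function names and sample counts are hypothetical, and this is not BigCode’s actual evaluation harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them pass the tests."""
    if n - c < k:
        # Every possible draw of k samples contains at least one passing sample.
        return 1.0
    # 1 - P(all k drawn samples fail) = 1 - C(n-c, k) / C(n, k)
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over all problems, scaled to 0-100 like a reported HumanEval score."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical results: (samples generated, samples passing) for four problems.
results = [(10, 10), (10, 5), (10, 0), (10, 1)]
print(round(benchmark_score(results, k=1), 1))  # prints 40.0
```

With k=1 the estimator reduces to c/n per problem, so a greedy-decoding run (n=1) is simply the fraction of problems solved.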
GitHub announced a major upgrade to Copilot: GitHub Copilot Workspace. Copilot Workspace is “redefining the developer environment” by assisting a developer through the specification, design, implementation and testing of software changes.
It’s an agentic AI workflow assistant, but one that works interactively with the user as a co-pilot rather than attempting full automation. It also works on mobile devices.
“By quantifiably reducing boilerplate work, we will empower professional developers to increasingly operate as systems thinkers. We believe the step change in productivity gains that professional developers will experience by virtue of Copilot and now Copilot Workspace will only continue to increase labor demand.” - GitHub Copilot Workspace announcement post
Amazon announced that Amazon Q Developer is generally available. This renamed upgrade to Amazon CodeWhisperer can help with AWS deployments as well as many code development tasks. It can be integrated as an extension into IDEs such as VS Code and JetBrains.
Anthropic launched a new iPhone app and a premium plan for businesses. You can now get Anthropic’s Claude 3 as an iOS app, which includes vision capabilities on the phone. Anthropic’s Claude Team plan, aimed at enterprise users, costs $30 per user per month. TechCrunch notes why many AI players are pivoting to enterprise; it’s where the money is:
Yet corporate spending on generative AI is forecasted to be enormous. IDC expects that it’ll reach $15.1 billion in 2027, growing nearly eightfold from its total in 2023.
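A quick back-of-the-envelope check on the IDC forecast: treating “nearly eightfold” as exactly 8x over the four years from 2023 to 2027 (an assumption; the real multiple is slightly smaller) gives the implied 2023 base and the implied annual growth rate.

```python
def implied_cagr(multiple: float, years: int) -> float:
    """Compound annual growth rate implied by growing `multiple`-fold over `years` years."""
    return multiple ** (1.0 / years) - 1.0


forecast_2027 = 15.1  # IDC forecast for 2027, billions of USD
multiple = 8.0        # "nearly eightfold" treated as exactly 8x (assumption)
years = 2027 - 2023

base_2023 = forecast_2027 / multiple
print(f"implied 2023 spend: ~${base_2023:.1f}B")                     # ~$1.9B
print(f"implied annual growth: ~{implied_cagr(multiple, years):.0%}")  # ~68%
```

Because the true multiple is a little under 8, the actual 2023 figure is a bit above ~$1.9B and the growth rate a bit below ~68% per year.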
Yelp is launching a new AI assistant to help you connect with businesses. This assistant is a feature under Projects within the Yelp App, and the pitch is: “Rather than using a traditional search box to look for different kinds of professionals for the problem, you can describe the issue directly into the chat interface. The bot asks follow-up questions to gather more information along with your zip code to create a project.”
Gemini is now available as a shortcut in Chrome’s address bar: type “@gemini” to access it.
Coming soon: Elon Musk shared his plans for AI-based news on X, powered by the Grok AI model, with Alex Kantrowitz, who writes and podcasts about tech at Big Technology. Musk says Grok will not look directly at article text and will instead rely solely on social posts: “It’s summarizing what people say on X.”
Rumor: OpenAI’s ChatGPT-Powered Search Engine May Have A 9 May Live Date. Subscribe to AI Changes Everything and find out in our next Weekly AI News if the rumor proves true.
Top Tools & Hacks
First official commissioned music video made with OpenAI Sora. In the fine print: this was “55 clips cut together in premiere,” which is a lot of clips for a short video.
On the topic of video generation, StoryDiffusion, an open-source image and video generation AI model, was released this past week. It can generate highly consistent outputs, whether as a storyboard, a comic book (see Figure 1, “LeCun goes to the moon”), or a video.
StoryDiffusion’s code is available on GitHub, and the model can be accessed on Hugging Face. It has a companion paper, “StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation.”
AI Research News
Our AI Research Roundup for this week:
LLM Performance on Grade School Arithmetic
Prometheus 2: A Language Model For Evaluating Language Models
PoLL: Evaluating LLM Generations with a Panel of Diverse Models
Iterative Reasoning Preference Optimization - fine-tuning LLMs for reasoning.
Med-Gemini: Capabilities of Gemini Models in Medicine - Med-Gemini shows we are getting to human-level performance on medical tasks.
Better & Faster Large Language Models via Multi-token Prediction - multi-token prediction can be used to achieve faster and more capable LLMs.
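To make the multi-token prediction idea concrete: the paper trains a shared model trunk with n independent output heads, where head i predicts the token i positions ahead, and sums the heads’ cross-entropy losses. The toy sketch below covers only how the per-head targets line up and how the losses combine; the model is stubbed out as given probability tables, the function names are hypothetical, and averaging over positions is a normalization choice of this sketch.

```python
import math


def multi_token_targets(tokens: list[int], n: int) -> list[list[int]]:
    """For each position t, the n future tokens that heads 1..n must predict.

    Positions too close to the end to have n future tokens are dropped.
    """
    return [tokens[t + 1 : t + 1 + n] for t in range(len(tokens) - n)]


def multi_token_loss(head_probs: list[list[dict[int, float]]],
                     targets: list[list[int]]) -> float:
    """Sum over heads of the mean cross-entropy over positions.

    head_probs[t][i] is head i's predicted distribution (token -> probability) at position t.
    """
    n = len(targets[0])
    total = 0.0
    for i in range(n):  # one cross-entropy term per output head
        nll = [-math.log(head_probs[t][i][targets[t][i]]) for t in range(len(targets))]
        total += sum(nll) / len(nll)
    return total


tokens = [5, 7, 9, 11, 13]
targets = multi_token_targets(tokens, n=2)
print(targets)  # [[7, 9], [9, 11], [11, 13]]

# Stub "model": every head assigns probability 0.5 to the correct token.
probs = [[{tok: 0.5} for tok in row] for row in targets]
print(multi_token_loss(probs, targets))  # 2 * ln 2, about 1.386
```

With n=1 this collapses to ordinary next-token prediction. According to the paper, the extra heads can also be used at inference to draft several tokens at once, which is where the speedups come from.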
AI Business and Policy
Eight prominent U.S. newspapers owned by investment giant Alden Global Capital are suing OpenAI and Microsoft for copyright infringement. Like prior lawsuits from The New York Times and others, they claim the companies unlawfully trained AI models on their articles.
“We’ve spent billions of dollars gathering information and reporting news at our publications, and we can’t allow OpenAI and Microsoft to expand the big tech playbook of stealing our work to build their own businesses at our expense,” - Frank Pine, executive editor of the Alden-owned newspapers
Microsoft invests in growth in Asia: Microsoft announced a $2.2 billion investment to accelerate Malaysia's digital growth, funding new cloud and AI infrastructure and AI training for 200,000 Malaysians. It also announced a similar $1.7 billion investment in Indonesia, as well as a $2.9 billion investment in Japan.
A new Kaggle competition challenges participants to predict human preferences between chatbot responses from LMSYS Chatbot Arena, with a total prize pool of $100,000. The goal is to improve chatbot performance and alignment with user expectations.
Ukraine unveils an AI-generated Foreign Ministry spokeswoman, Victoria Shi. Fear not, her statements are not AI-generated … yet:
The foreign ministry's press service told AFP that the statements given by Shi would not be generated by AI but "written and verified by real people."
AI Startup funding news:
Motional, a company developing autonomous driving technology for robotaxis, raised $475 million from Hyundai Motor Company. Hyundai will spend another $448 million to buy out Aptiv’s stake, giving Hyundai a majority stake in the company.
Custom AI model provider Lamini raised $25 million to help enterprises develop top LLMs in-house.
The US government shows off a massive AI-powered robot tank. This impressive autonomous vehicle weighs 12 tons, goes 25 mph, and can be seen in action in this video.
However, going from these DARPA demos to deployment is a long road, as AI hits trust hurdles with the U.S. military. In this article, Axios cites cases where LLMs suggested escalation and conflict, and notes that they cannot explain how they arrive at conclusions or handle unexpected events. The DoD will take a cautious approach to scaling and deploying AI:
In June, the Navy's chief information officer, Jane Overslaugh Rathbun, concluded that while "generative AI can be a force multiplier," commercial models have "inherent security vulnerabilities" and are "not recommended for operational use cases."
Bloomberg reports that AI has narrated over 40,000 audiobooks on Audible, thanks to a free Amazon tool that helps authors release audiobooks.
AI Opinions and Articles
The Verge has a story on “The teens making friends with AI chatbots.” You can read this as a heart-warming story of AI rescuing troubled young people, or as a disturbing trend where human relationships get replaced by virtual AI ones.