AI Week In Review 24.06.22
Claude 3.5 Sonnet is the new King of AI, RunwayML Gen 3, Meta AI's Chameleon, JASCO, & AudioSeal, BigCodeBench, DeepSeek Coder V2, Gemini context caching goes live, Ilya Sutskever's SSI goes for ASI.
Top Tools - Claude 3.5 Sonnet
This week’s release of Claude 3.5 Sonnet by Anthropic is the top news and the latest top AI model. It surpasses both Claude 3 Opus and GPT-4o in benchmarks and in user experience, with X comments like “It hugely surpasses the capabilities of GPT-4o, Gemini Pro, Llama and other LLMs.”
Best yet, it’s available right now for free on claude.ai.
Artifacts is an important new feature of Claude 3.5 Sonnet. Artifacts is an additional panel in the chat interface that displays coding, data and visual outputs. You can get Claude 3.5 Sonnet to write code, then run it to create results on the fly; this is like Code Interpreter and some AI agents, but directly on the foundation AI model.
Artifacts is a great feature. (Note: Artifacts in Claude 3.5 Sonnet needs to be turned on via clicking on your profile pic.) Min Choi got Claude 3.5 Sonnet to create a solar system simulation.
You can also use Artifacts to create simple games and run them; or generate simple graphics (using SVG), or create prototype websites or user interfaces. I got it to create a graphic picture with a Haiku. Simple, yes, but the bones of a full multi-media creation engine.
One of the things that makes Artifacts work so well is that the coding ability of Claude 3.5 Sonnet is state-of-the-art. Claude 3.5 Sonnet's benchmark results beat GPT-4o almost across the board: 88.3% on MMLU, 59% on GPQA, and 92% on HumanEval coding.
Claude 3.5 Sonnet is also Anthropic's strongest vision model yet, surpassing Claude 3 Opus on standard vision benchmarks, such as 68% on MMMU and 95% on document visual Q&A.
Anthropic shared a model card and technical report on Claude 3.5 Sonnet. They shared few architecture details but many benchmarks, including human evaluations where it outpaces Claude 3 Opus, their prior best model. Claude 3.5 Sonnet has near-perfect recall over its 200k token context window. It runs at 2x the speed of Opus and at one-fifth the cost: just $3 per million input tokens and $15 per million output tokens.
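To make that pricing concrete, here is a minimal sketch of the arithmetic for a single request at the quoted rates of $3 per million input tokens and $15 per million output tokens. The helper name estimate_cost and the example token counts are illustrative assumptions, not part of any Anthropic API.

```python
# Prices quoted above for Claude 3.5 Sonnet, in US dollars per million tokens.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request (hypothetical helper)."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # Illustrative request: a full 200k-token context and a 1,000-token reply.
    cost = estimate_cost(input_tokens=200_000, output_tokens=1_000)
    print(f"Estimated cost: ${cost:.3f}")
```

At these rates, filling the entire 200k-token context window and getting a short reply costs a little over 60 cents, almost all of it for the input.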
This all puts the challenge on OpenAI. GPT-4o is no longer the best AI model out there. Moreover, Anthropic promised more AI models soon, saying “we’ll be releasing Claude 3.5 Haiku and Claude 3.5 Opus later this year.”
AI Tech and Product Releases
Aside from Claude 3.5 Sonnet, another big release hit this week: RunwayML’s Gen-3 Alpha. Runway themselves say Gen-3 Alpha is a new frontier for high-fidelity, controllable video generation. The hyper-realistic Gen-3 Alpha can generate 10-second-long clips and will be available in “days” to paying Runway subscribers first, with a free tier on deck in the future.
Some of its best-in-class features include fine-grained temporal control; you can control the camera and pacing. It also can render very photorealistic humans. Runway says:
“Gen-3 Alpha excels at generating expressive human characters with a wide range of actions, gestures, and emotions, unlocking new storytelling opportunities.”
They show impressive examples that bear that out.
On X there are many examples of demo videos, as well as comparisons of Runway Gen 3 Alpha vs Kling. We now have five main contenders in AI video generation: OpenAI Sora, Google Veo, Luma Labs AI Dream Machine, Kling, and RunwayML Gen 3. Comparisons are inevitable.
One advantage RunwayML has is its 6-year history building products for movie makers; Gen 3 is its third iteration at AI video generation. RunwayML can bring useful features like motion-brush for editing control, making this not just a generator but a tool for the whole video creation process.
Google Gemini context caching is now live. Developers can save money on repetitive uses of the same datasets or knowledge bases, or for storing long system prompts. (H/T Sam Witteveen.)
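A rough sketch of why caching can save money on repeated use of the same context follows. It assumes, purely for illustration, that cached context tokens are billed at a discounted rate and that keeping the cache alive carries a flat storage fee. The function names and every number used with them are hypothetical placeholders, not Google's actual pricing or API.

```python
def cost_without_cache(context_tokens, query_tokens, num_queries, price_per_mtok):
    """Every request resends the full shared context plus its own query."""
    total_tokens = (context_tokens + query_tokens) * num_queries
    return total_tokens / 1_000_000 * price_per_mtok


def cost_with_cache(context_tokens, query_tokens, num_queries, price_per_mtok,
                    cached_discount, storage_fee):
    """Shared context is billed at a discounted rate, plus a flat storage fee.

    cached_discount is the fraction of the normal price charged for cached
    tokens (an assumed billing model, for illustration only).
    """
    cached = context_tokens * num_queries / 1_000_000 * price_per_mtok * cached_discount
    fresh = query_tokens * num_queries / 1_000_000 * price_per_mtok
    return cached + fresh + storage_fee


if __name__ == "__main__":
    # Placeholder numbers: a 100k-token knowledge base queried 50 times.
    shared = dict(context_tokens=100_000, query_tokens=500,
                  num_queries=50, price_per_mtok=1.00)
    plain = cost_without_cache(**shared)
    cached = cost_with_cache(**shared, cached_discount=0.25, storage_fee=0.50)
    print(f"without cache: ${plain:.3f}, with cache: ${cached:.3f}")
```

Under these made-up numbers the cached version is cheaper for 50 queries but more expensive for a single query, because the storage fee only pays off once the same context is reused enough times.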
Meta AI has announced the release of four new open AI models to accelerate future AI research. These models include:
Chameleon: 7B & 34B mixed-modal language models that can understand both images and text.
Multi-Token Prediction: Pretrained language models for code completion using multi-token prediction, which helps AI models predict words faster.
JASCO: An audio generation model with improved controllability, which differs from other text-to-music approaches by using conditioning inputs like chords or beat.
AudioSeal to detect AI-Generated Speech: The first audio watermarking technique designed specifically for the localized detection of AI-generated speech.
Google shared an update on Generating Audio for Video. Video-to-audio (V2A) technology, which makes synchronized audiovisual generation possible, combines video pixels with natural language text prompts to generate rich soundscapes for the on-screen action:
Our V2A technology is pairable with video generation models like Veo to create shots with a dramatic score, realistic sound effects or dialogue that matches the characters and tone of a video.
BigCode project announced BigCodeBench, a benchmark for evaluating LLMs on practical, realistic, and challenging programming tasks. It contains 1,140 function-level tasks to challenge LLMs to compose multiple function calls as tools from 139 Python libraries.
The BigCodeBench leaderboard is on HuggingFace. The top 3 coding models are GPT-4o (61%), DeepSeek Coder V2 (59%), and Claude 3.5 Sonnet (58%). A paper, dataset, code and more have been released and are available via the project page.
AI Research News
Mentioned in our AI Research Roundup this week:
DeepSeek Coder V2, an open-source AI coding model released this week.
DataComp-LM: Next generation of training sets for language models.
Instruction Pre-Training: Language Models are Supervised Multitask Learners.
Self-MoE: Towards Compositional LLMs with Self-Specialized Experts.
Microsoft has released Florence-2, a light-weight open-source Vision Foundation Model, used for tasks such as captioning, object detection, grounding, and segmentation. The paper “Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks” details their work.
AI Business and Policy
We now know what new thing Ilya Sutskever is up to. He announced his new company Safe Superintelligence, co-founded with investor Daniel Gross and Daniel Levy (ex-OpenAI), in a blog post and on X, declaring:
Building safe superintelligence (SSI) is the most important technical problem of our time. We've started the world’s first straight-shot SSI lab, with one goal and one product: a safe superintelligence. … We are assembling a lean, cracked team of the world’s best engineers and researchers dedicated to focusing on SSI and nothing else.
There are skeptics, believing that “AI Safety is a mirage” for Superintelligence.
Michael Dell on X confirms Musk’s massive AI factory plans: “We’re building a Dell AI factory with Nvidia to power Grok for X AI.”
Apple Won’t Roll Out Apple Intelligence and other AI Tech In EU Market Over Regulatory Concerns. Apple is holding back because of Digital Markets Act interoperability mandates.
OpenAI bought Rockset to bolster its enterprise AI. Rockset builds tools to drive real-time search and data analytics.
AI startup takes on CRM (customer relationship management). An ex-HubSpot exec is building an AI-powered CRM that learns for you, with a $4M seed round led by Sequoia.
AI Opinions and Articles
‘What’s in it for us?’ journalists ask as publications sign content deals with AI firms. A pink slip, maybe? The article mentions writers putting AI protections in contracts, as well as other media firms, including The Atlantic and Vox Media, signing content licensing deals with OpenAI.
Some deals give the media credit and possible monetization:
“The Atlantic’s articles will be discoverable within OpenAI’s products, including ChatGPT, and as a partner, The Atlantic will help to shape how news is surfaced and presented in future real-time discovery products,” Bross told TechCrunch. “The deal ensures guardrails and protections around how our content does appear within OpenAI’s products. … If an Atlantic article is surfaced in response to a query, there will be Atlantic branding and a link back to the article on our site.”
These deals are only giving media a fraction of what they could earn in the past. I don’t have a good answer for journalists concerned about losing jobs or leverage over their work output. Our intellectual output will soon be dwarfed by what AI can do.