AI Week In Review 24.03.16
Elon to open Grok, Cohere Command-R, Covariant's RFM-1, Figure 1 demo, Cognition Labs' Devin AI agent, Midjourney Character Refs, Leonardo AI Collections, Pika Audio, Quiet-STaR, StyleGaussian.
TL;DR - Another busy week. Many incremental updates in image, video and audio (Midjourney, Pika Labs, Leonardo AI), plus progress on AI Agents (Cognition Labs’ Devin AI Software Engineer) and robotics (Covariant’s RFM-1, Figure 1 demo, Hydrol AI’s elder care doll). In research: Caduceus extends Mamba for DNA modeling; Quiet-STaR improves LLM reasoning; style transfer comes to Gaussian Splatting.
AI Tech and Product Releases
Elon Musk announced that xAI will open-source Grok this week. Major win for open-source, and another twist in his escalating feud with Sam Altman/OpenAI and the race to dominate the AI space. Brian Roemmele offered on X to share a how-to on using it:
The day Grok is released in open source I will have a detailed how-to step-by-step easy instructions on how download it and use it as personal AI platform.
Cohere releases Command-R as an AI model for RAG (retrieval-augmented generation) applications in the enterprise. “Command-R is a scalable generative model targeting RAG and Tool Use to enable production-scale AI for enterprise.” It boasts 128k context, multi-lingual capabilities, great skill in RAG pipelines (see chart below), and claims “lower pricing.” It’s not a fully open-source AI model, but weights are on HuggingFace.
Midjourney launched character reference, a feature that generates consistent characters across output images from a character image referenced by URL. This long-awaited feature is useful for storyboards, character illustrations, and AI influencers, and could be very useful once enabled for video as well.
Pika Labs introduced a new ‘Sound Effects’ tool, where “you can now generate sound with your videos on Pika.” Users can pair their AI video generations with audio generated either from a text prompt or automatically from the video content.
Leonardo AI introduced a new feature called Collections, which helps users organize the images generated by AI. It’s “the simplest way to organize your generated content on Leonardo.Ai.”
Cognition Labs announces Devin, “the world’s first fully autonomous AI software engineer.” It’s not released, but they shared impressive demos of AI agent Devin completing software development tasks, surpassing prior AI agent capabilities. We covered Devin in depth in “AI Agents Come Alive.”
Robotics startup Covariant just unveiled RFM-1, an AI platform that brings ChatGPT-like language reasoning to physical robots. The platform allows robots to learn new skills, adapt to unexpected situations, and interact with humans more naturally.
Robot maker Figure partnered with OpenAI to put GPT-4 inside of its humanoid robot Figure 1, and showed off its new capabilities in a demo video. The Figure 1 robot was able to see, speak, hear and answer questions, while performing simple tasks like moving dishes onto a dishrack. More on this in “AI Agents Come Alive.”
South Korean startup Hydrol AI launched a $1,800 AI-powered companion doll. The doll is aimed at alleviating loneliness among the country's rapidly aging population. Endearing elder-care or creepy? You decide.
Apple's Siri is set for a major upgrade that could make it a challenger to ChatGPT in the AI assistant race. Co-creator of Siri, Dag Kittlaus, recently said the assistant will “accelerate and become a real force in the AI arena.”
Amazon is incrementally infusing AI into its e-commerce platform: Amazon brings more AI tools to sellers, including AI that will build a product page with information from another site.
Top Tools & Hacks
For Adobe Firefly users: “You can be in a gallery on Adobe Firefly community page too! When you generate your image share "Submit to Firefly Community". You'll be asked to enter your name for crediting you. Then millions of people will be able to use your prompt and settings as an inspiration!”
Here’s a way to use Microsoft Copilot Pro for free.
AI Research News
Yair Schiff presents Caduceus, work he did with other researchers on DNA language modeling. Caduceus is a bi-directional DNA language model built on Mamba that handles long-range modeling and respects the reverse complement symmetry of DNA. They achieve this with two extensions of the Mamba architecture, BiMamba and MambaDNA:
BiMamba is a parameter/memory-efficient, bi-directional version of Mamba. BiMamba is implemented by running a Mamba module on both a sequence and its reverse, with in and out projection weights tied. We also introduce MambaDNA, a module that extends Mamba / BiMamba to support reverse complement equivariance.
Caduceus is SoTA on several genomics-related benchmarks, including identifying causal SNPs for gene expression.
The paper “Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking” is a promising new approach to improve LLM reasoning by training it to be more reflective during inference. When we write or talk, we sometimes pause to think, then choose our words. How to get LLMs to do the same?
We present Quiet-STaR, a generalization of STaR in which LMs learn to generate rationales at each token to explain future text, improving their predictions.
To make generating a rationale at every token tractable, they train the LLM with a tokenwise parallel sampling algorithm. They report accuracy gains on several benchmarks, including raising zero-shot GSM8K accuracy from 5.9% to 10.9%:
“Encouragingly, generated rationales disproportionately help model difficult-to-predict tokens and improve the LM's ability to directly answer difficult questions.”
While Quiet-STaR has substantial overhead, it’s a promising approach to get LLMs to do more “System 2” type thinking. Quiet-STaR is not the same as other approaches such as chain-of-thought reasoning. In fact, the two are complementary and can be combined.
From researchers in Munich, Germany comes “Gaussian Splatting in Style,” which extends neural style transfer to 3D. The idea is to merge a style image with Gaussian splatting 3D representations and thereby render stylized 3D Gaussians and stylized novel views from it.
In a similar vein, the paper “StyleGaussian: Instant 3D Style Transfer with Gaussian Splatting” presents StyleGaussian, which does 3D style transfer of any image to a 3D scene with real-time rendering and multi-view consistency:
Utilizing a three-step process—embedding, transfer, and decoding—it revolutionizes the technique with an efficient feature rendering strategy and a K-nearest-neighbor-based 3D CNN decoder. These advancements significantly cut down memory usage and ensure strict multi-view consistency.
What good are 3D stylized Gaussians? They could be used for world-building in video production, ‘skins’ in video games, or CAD for product design. Code for StyleGaussian is available on GitHub.
AI Business and Policy
Will AI Help or Hurt Workers? At SXSW, The Answer Depends on Who You Ask. SXSW made headlines for resisting AI optimism when “a reel of speakers touting the merits of AI was loudly booed by audiences.” There’s a vibe disconnect between SXSW speakers like Peter Deng, head of ChatGPT at OpenAI, and audiences made up of creative artists and techies.
One explanation is that tech layoffs are on the rise, so some fear being replaced by AI. However, I think it is more about the loss of control. Some are concerned about who is in control of AI, rather than AI itself.
When they said “AI will take their jobs” did they include working at AI startups as well? AI fraud detection software maker Inscribe.ai lays off 40% of staff. Not actually due to AI, though; the startup has been missing revenue targets.
India drops plan to require approval for AI model launches. “India is walking back on a recent AI advisory after receiving criticism from many local and global entrepreneurs and investors.”
AI Opinions and Articles
It’s AI Doomer season again. U.S. Must Move ‘Decisively’ to Avert ‘Extinction-Level’ Threat From AI, Government-Commissioned Report Says. This report from Gladstone AI says:
The U.S. government must move “quickly and decisively” to avert substantial national security risks stemming from artificial intelligence (AI) which could, in the worst case, cause an “extinction-level threat to the human species,” says a report commissioned by the U.S. government published on Monday.
Diving in, there is less here than meets the eye. The risks cited are based on risks expressed by people these report writers interviewed. In other words, opinion polling. Except they spoke to a dozen people, not hundreds.
The Harrises recognize in conversation that their recommendations will strike many in the AI industry as overly zealous. The recommendation to outlaw the open-sourcing of advanced AI model weights, they expect, will not be popular.
Unpopular? No, “dangerously misguided” would be more accurate.
Yann LeCun, on a recent Lex Fridman podcast, argues against the AI doomer narrative: the real dangers of AI are not ‘extinction-level’ risks, and nuclear weapons are a bad analogy. AI risks come from abuse, misinformation, and lack of safety controls.
To Yann LeCun, free speech and diversity are one solution to the problem of misinformation. Open AI models reduce the risk of AI being under the control of just a few people who could abuse their monopoly on information to spread misinformation; openness yields diversity instead. (Consider how the Chinese Government censors AI models.)
Oddly, the Gladstone people agree with the value of open source while also proposing limiting it:
“Open source is generally a wonderful phenomenon and overall massively positive for the world,” says Edouard, the chief technology officer of Gladstone.
Open AI models have been a source for good. We should support open AI models until they provably let us down.
A Look Back …
I mentioned in my previous article that it was the one-year anniversary of GPT-4’s release. I thought I’d reshare - especially for newer subscribers - a look back at some of my earliest articles from one year ago this month, when the AI revolution was just getting started:
Boom! GPT-4 has arrived - LLMs go to college and pass the LSAT!
Fundamental Thoughts on the State of AI - Scale begat Foundational Models begat today's AI that's changing the world …