AI Week In Review 24.02.24
"Adjust your Timelines" - Google's Gemma, Chrome "help me write", Stable Diffusion 3, DoRA improves on LoRA, Nvidia zooms on Earnings, robotics startup FigureAI gets funding.
AI Tech and Product Releases
Google released their smallest Gemini models as “Gemma,” 7B and 2B open AI models. It’s impressive that Google released an open AI model at all, given that their prior Bard and Gemini AI models were proprietary. Google released the model weights to enable running the LLM locally, provided toolchains to support fine-tuning and experimentation, and added support for running Gemma in Google Colab notebooks.
Gemma has a very large vocabulary that leads to massive embedding layers, making Gemma 7B actually over 8B parameters (another marketing overreach by Google). It was trained on 6 trillion tokens, and the 7B model used multi-head attention.
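A back-of-the-envelope calculation makes the parameter gap concrete. The numbers below (a roughly 256K-token vocabulary and a hidden size of 3072 for the 7B model) are approximate figures from Gemma’s published technical report, used here only for illustration:

```python
# Back-of-the-envelope: why "Gemma 7B" ends up over 8B parameters.
# Approximate figures from Gemma's technical report (illustrative only).
vocab_size = 256_000    # Gemma's ~256K-token vocabulary
hidden_dim = 3_072      # Gemma 7B hidden size

embedding_params = vocab_size * hidden_dim
print(f"embedding table alone: {embedding_params / 1e9:.2f}B parameters")
# → roughly 0.79B parameters before counting any transformer layers
```

For comparison, a typical 32K-vocabulary model of the same hidden size would need an embedding table roughly eight times smaller.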
How does Gemma 7B stack up? Google touts benchmarks showing Gemma 7B besting Llama 2 7B across the board, but early user feedback has been underwhelming. Matt Berman’s video review found Gemma ‘shockingly bad’. Others observed that the instruction-tuned Gemma model is ‘too safe’, and most users found that leading AI model Mistral 7B is still better than Gemma 7B:
“After trying Gemma for a few hours I can say it won’t replace my mistral 7B models. It’s better than llama 2 but surprisingly not better than mistral.”
In other Google release announcements:
Google announced Expanded access and improvements to generative AI in Performance Max, their ad development platform. This includes Canva integrations and other support for AI-enabled asset generation.
Google added an AI-enabled “Help me write” feature in Chrome that can help users write content online.
Stability AI announced Stable Diffusion 3 in early preview, calling it “our most capable text-to-image model with greatly improved performance in multi-subject prompts, image quality, and spelling abilities.” Particularly impressive is its fidelity in generating text within images.
Stable Diffusion 3 uses a new architecture, the diffusion transformer, similar to the architecture behind Sora, and the results are similarly a big step up. It’s in “early preview,” so we’ll have to wait to get our hands on SD3.
Because its underlying architecture is shared with Sora, some are asking if Stability AI is working on a Sora-like next-gen Stable Video. Stability AI CEO Emad Mostaque responded in a tweet:
We want to make a large open Stable Video 2 similar to Sora but need more data & compute (hmu for collabs)
To that end happy to open up http://stablevideo.com to all
Sign up, make & rate videos to help us get the data we need
Away we go… will be a Sora-like.
Predibase introduced LoRA Land: Fine-Tuned Open-Source LLMs that Outperform GPT-4. LoRA Land consists of 25 fine-tuned Mistral-7B models that serve specific use cases, from sentiment detection to question answering. Predibase claims that all 25 fine-tuned models surpass GPT-4, GPT-3.5 Turbo, and Mistral-7B-Instruct on their specific tasks. As shared by Itamar Golan on X, there is online access to this toolset at the predibase.com/lora-land website.
The Brave browser announced its Leo AI Assistant has been updated to interact with PDFs and Google Drive documents:
Brave’s built-in AI assistant Leo has recently gained new capabilities for understanding how to help users while using web-based tools and workflows.
They are leveraging PDF metadata and OCR to understand semantic structure and text of what is visible in-browser, in order to provide chat-based assistance on the sidebar.
OpenAI's latest update allows users to rate GPTs, give builders feedback, and discover new AI experiences by category. OpenAI is serious about building the new app store, says BensBites.
AI Research News
AI researchers introduce DoRA: Weight-Decomposed Low-Rank Adaptation, an improvement over LoRA for parameter-efficient fine-tuning that “consistently outperforms LoRA on fine-tuning LLaMA, LLaVA, and VL-BART on various downstream tasks.”
Weight-Decomposed Low-Rank Adaptation (DoRA) decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning, specifically employing LoRA for directional updates to efficiently minimize the number of trainable parameters. By employing DoRA, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead.
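The decomposition is straightforward to sketch. Below is a minimal numpy illustration of the idea (not the authors’ implementation, and the dimensions are toy values): the pre-trained weight is split into a per-column magnitude vector and a unit-norm direction, and a LoRA-style low-rank update is applied only to the direction before renormalizing:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2                  # toy layer sizes; LoRA rank r << d
W0 = rng.normal(size=(d_out, d_in))       # frozen pre-trained weight

# Decompose W0: per-column magnitude (trainable scalar per column).
m = np.linalg.norm(W0, axis=0, keepdims=True)   # shape (1, d_in)

# LoRA-style low-rank update, applied only to the direction component.
B = np.zeros((d_out, r))                  # zero init => no change at start
A = rng.normal(size=(r, d_in)) * 0.01

V = W0 + B @ A                            # updated (unnormalized) direction
W_adapted = m * V / np.linalg.norm(V, axis=0, keepdims=True)

# With B = 0, the decomposition exactly reconstructs the original weight.
assert np.allclose(W_adapted, W0)
```

After training, `m`, `B`, and `A` can be merged back into a single weight matrix, which is why DoRA, like LoRA, adds no inference overhead.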
The paper “Towards Efficient and Exact Optimization of Language Model Alignment” introduces EXO, replacing RL in the fine-tuning alignment of AI models. They show that EXO optimizes the same objective as RL, but avoids the complexity of RL by performing probability matching instead, yielding a more efficient optimization algorithm. Results show it is superior to prior alignment methods, including DPO and PPO.
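The target that such alignment methods optimize toward has a standard closed form: the KL-regularized objective’s optimal policy is the reference policy reweighted by exponentiated reward. The toy numpy snippet below illustrates what “probability matching” against that target means on a single prompt with four candidate responses; it is a conceptual sketch, not EXO’s actual training procedure:

```python
import numpy as np

# Toy setting: one prompt, four candidate responses.
pi_ref = np.array([0.4, 0.3, 0.2, 0.1])   # reference policy probabilities
reward = np.array([1.0, 2.0, 0.5, 0.0])   # reward-model scores
beta = 1.0                                 # KL-regularization strength

# Optimal policy of the KL-regularized alignment objective:
#   pi_star(y) ∝ pi_ref(y) * exp(reward(y) / beta)
unnorm = pi_ref * np.exp(reward / beta)
pi_star = unnorm / unnorm.sum()

def kl(p, q):
    """Discrete KL divergence D(p || q)."""
    return float(np.sum(p * np.log(p / q)))

# "Probability matching": train the policy to drive its divergence from
# pi_star to zero directly, rather than running an RL loop (e.g., PPO)
# against the reward model.
print("target:", pi_star.round(3), "| gap from reference:", round(kl(pi_ref, pi_star), 3))
```

The divergence shrinks as the trained policy approaches the target distribution, which is the sense in which matching probabilities replaces the RL optimization loop.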
Today’s AI research can become the released product that wows the world 8 months from now, yet not all research leads to success. How do you figure out what’s important? The Latent Space duo, swyx and Alessio, in their Jan 2024 recap, “Worthwhile Research for building SOTA LLMs,” pose the ‘test of time’ question to assess what AI research will really pay off:
Given your total useful knowledge is, by definition, the cumulative sum of still-relevant knowledge, your greatest (possibly only) proxy for useful signal should be the “test of time” estimator: how likely are you to still be caring about {a given model, paper, technique, library, or news item} in 1/3/12/60/120 months from now?
With that in mind, they came up with 5 high-level promising directions for making state-of-the-art LLMs, and drilled down into the details on where these might lead us: Longer inference effort on hard problems; synthetic data to improve training process; alternative architectures; mixture of experts; and online LLMs that integrate web search.
It’s a very good list, but there is one more big trend for SOTA AI: multi-modality. LLMs are being superseded by LMMs, large multi-modal models that can take in audio, video, and images as well as text.
AI Business and Policy
“Accelerated computing and generative AI have hit the tipping point. Demand is surging worldwide across companies, industries and nations.” - Jensen Huang, CEO of Nvidia.
Nvidia earnings send stock rocketing as company cheers AI tipping point. Thanks to its leadership in AI chips, Nvidia had another blowout quarter that far exceeded expectations; in particular, its AI-driven data center revenue was $18.4 billion, up 409% from a year ago. This sent Nvidia’s stock soaring, making the company worth $2 trillion post-earnings.
Not to be left out, this week AMD announced their upcoming HBM3e Enhanced MI300 AI Accelerator and will release Next-Gen MI400 in 2025. Meanwhile, Intel Launches World’s First Systems Foundry Designed for the AI Era.
With Nvidia now worth more than the GDP of Canada, is this excitement a ‘bubble’? While Nvidia stock is up more than four-fold in a year, so are its revenues. AI is having a '1995 moment' says one analyst, and we at “AI Changes Everything” agree.
AMD is predicting a $200 billion AI chip market by 2027; the latest Nvidia report shows demand outstripping supply for several more quarters; and many governments want to invest in AI so as not to get left behind. The AI hype is real, and we are in the early stages of the AI revolution.
Nvidia, Intel, and Jeff Bezos invest millions in Figure AI, a humanoid robot company. These backers are investing $675 million in funding, valuing Figure AI at over $2 billion.
Amsterdam-based startup Monumental has secured $25 million in funding in its mission to transform the construction industry, led by robots laying bricks.
AI agent startup MagicDev claims to have made a technical breakthrough in “active reasoning” capabilities similar to OpenAI’s Q*, according to The Information, via Rowan Cheung. “The company mysteriously raised $117M from former GitHub CEO Nat Friedman last week. Now we know why.”
OpenAI’s ChatGPT Went Berserk, Giving Nonsensical Responses late Tuesday and Wednesday morning. ChatGPT was spouting gibberish responses to user queries, and one user asked “Is my GPT having a stroke?”
Perhaps so. After fixing the problem on Wednesday, OpenAI said in a post-mortem, “An optimization to the user experience introduced a bug with how the model processes language.” It sounds like an AI version of a stroke.
Mere chatbot errors won’t cost money, though, right? Think again. Air Canada must pay damages after chatbot lies to grieving passenger about discount.
North Korean hackers use ChatGPT to scam LinkedIn users.
ChatGPT parent company OpenAI and investor Microsoft revealed last week that they had “disrupted five state-affiliated actors that sought to use AI services in support of malicious cyber activities.”
The AI revolution will displace search. Gartner Predicts Search Engine Volume Will Drop 25% by 2026, Due to AI Chatbots and Other Virtual Agents.
AI Opinions and Articles
“If anything, what we're seeing is that internal red-teaming and alignment is NOT the miracle safety solution to everything in AI (very limited, biased & easy to jailbreaks) Truth is the safest way to build AI is openly, transparently and iteratively with the community.”
As we covered in our article “Google's Image Problem - Diversity Gone Wild,” Google came under fire this week over their image generation re-prompting for ‘diversity’ creating absurd and historically inaccurate results. Google paused Gemini’s ability to generate human images completely until they fix the problem.
In a blog post on Friday, they admitted that “Gemini image generation got it wrong. We'll do better.” Confirming this, Elon Musk tweeted:
“A senior exec at Google called and spoke to me for an hour last night. He assured me that they are taking immediate action to fix the racial and gender bias in Gemini.”
Many opinions have been expressed dissecting Google’s problem. Alberto Romero at The Algorithmic Bridge says to “look past the cultural divide” and consider specification gaming, a form of AI misalignment in which AI takes directions literally but lacks the context or nuance to reflect true intent.
As Gary Marcus explains in “Google gone too woke? Why even the biggest models still struggle with guardrails,” the AI misalignment problem won’t get fixed soon:
Getting AI to behave in a culturally-sensitive yet historically accurate way is hard for two reasons, one having to do with data and cultural change, the other with intelligence. …
Some things have scaled exponentially, and some haven’t. The capacity to endow machines with the commonsense understanding of history, culture, and human values that would be required to make sensible guardrails has not.
Finally, Nvidia CEO Jensen Huang says “Learn to code” is obsolete:
“It is our job to create computing technology such that nobody has to program, and
the programming language is human. Everyone in the world is now a programmer. This is the miracle of artificial intelligence.” - Nvidia CEO Jensen Huang
Lesson: Don’t learn to code, learn to use AI.