AI Week In Review 24.04.13
Udio makes AI music, Google Gemini 1.5 Pro for everyone, GPT-4 Turbo with Vision, Mixtral 8x22B open AI model, Grok 1.5 adds vision, Viking Model, Humane Pin launched, Gaudi3 & new AI chips!
TL;DR - A huge week for AI model releases. Top billing goes to Udio AI music generation app - it’s next-level. Plus, Gemini 1.5 Pro for everyone, GPT-4 Turbo with Vision, Mixtral 8x22B open AI model, Grok 1.5 adds vision, Viking, Humane Pin launched, new AI chips, and more.
AI Tech and Product Releases
As we shared in our “New AI Models Drop” article, Google made many major AI release announcements at Google Cloud Next this week:
Google released Gemini 1.5 Pro worldwide, available in public preview on its Vertex AI platform.
Gemini Code Assist, Gemini in Databases, and Gemini Cloud Assist.
Google Vids, an “AI-powered video creation app for work that sits alongside Docs, Sheets and Slides.”
The TPU v5p AI chip, the AI Hypercomputer, and other compute offerings for running AI workloads on Google Cloud.
Vertex AI Agent Builder, a no-code builder for AI Agents that’s “an expansion and rebranding of Vertex AI Search and Conversation.” This capability is more like custom GPTs than full AI Agent swarms, but it’s useful nonetheless for many workflow use cases, e.g., voice-recognizing custom chatbots.
Google made many other announcements tying its AI and cloud offerings together, embedding Gemini-based AI applications across its tool suite.
Gemini 1.5 Pro’s huge context window opens up new opportunities in large corpus comprehension. For example, can Gemini 1.5 ingest all the Harry Potter books (1M words) and analyze them at once? DeedyDas tried it and successfully got Gemini 1.5 to chart all the characters, as shown below.
OpenAI released GPT-4 Turbo with Vision, an updated GPT-4 Turbo with improved vision and reasoning, and made it available both through its API and on ChatGPT (paid tier). The new model improves incrementally on several benchmarks and reclaimed the top spot on the LMSYS Chatbot Arena leaderboard. See also “New AI Models Drop.”
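For developers, the vision input goes through the same Chat Completions call as text. Here’s a minimal sketch using the OpenAI Python SDK; the image URL is a placeholder, and we’re assuming the “gpt-4-turbo” alias resolves to the new vision-capable release.

```python
# Minimal sketch: GPT-4 Turbo with Vision via the OpenAI Chat Completions API.
# Assumptions: OPENAI_API_KEY is set, "gpt-4-turbo" points at the new
# vision-capable model, and the image URL is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What trend does this chart show?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```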
Mistral dropped their Mixtral 8x22B open AI model with a torrent link. This new release seems to perform on par with Claude 3 Sonnet, and close to GPT-4, making it the highest-performing open AI model. Again, more details in “New AI Models Drop.”
xAI previewed Grok-1.5 with vision (Grok-1.5V), claiming it is a remarkably good AI model for multimodal real-world tasks:
Grok-1.5V is competitive with existing frontier multimodal models in a number of domains, ranging from multi-disciplinary reasoning to understanding documents, science diagrams, charts, screenshots, and photographs. We are particularly excited about Grok’s capabilities in understanding our physical world.
xAI also created the RealWorldQA benchmark, similar to Meta’s OpenEQA (mentioned in our research roundup for this week), to measure real-world understanding, and showed that Grok-1.5V performed better than other frontier AI models on that benchmark.
SILO AI released the Viking model family, open-source LLMs supporting all Nordic languages (Danish, Finnish, Norwegian, Icelandic, and Swedish) alongside English and programming languages. Viking is available in 7B, 13B, and 33B parameter sizes.
Hugging Face released Parler-TTS, an inference and training library for high-quality, controllable text-to-speech (TTS) models.
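Based on the library’s published usage, generation takes two text inputs: the transcript to speak and a natural-language description of the desired voice. A rough sketch is below; the checkpoint name is an assumption, so check the Parler-TTS repo for the current model IDs.

```python
# Rough sketch of Parler-TTS inference: the description prompt controls the
# voice (gender, pace, acoustics), while the prompt is the text to be spoken.
# The checkpoint name is an assumption; see the Parler-TTS repo for current IDs.
import torch
import soundfile as sf
from transformers import AutoTokenizer
from parler_tts import ParlerTTSForConditionalGeneration

device = "cuda:0" if torch.cuda.is_available() else "cpu"
repo = "parler-tts/parler_tts_mini_v0.1"  # assumed checkpoint name

model = ParlerTTSForConditionalGeneration.from_pretrained(repo).to(device)
tokenizer = AutoTokenizer.from_pretrained(repo)

prompt = "Hey, how are you doing today?"
description = "A calm female voice, speaking slowly with clear audio quality."

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

audio = model.generate(input_ids=input_ids, prompt_input_ids=prompt_ids)
sf.write("parler_out.wav", audio.cpu().numpy().squeeze(), model.config.sampling_rate)
```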
Humane announced the availability of its Humane Pin this week, but it’s been met with some tough reviews. The Verge pans the Humane Pin as “not even close”:
For $699 and $24 a month, this wearable computer promises to free you from your smartphone. There’s only one problem: it just doesn’t work.
Reviewers’ main complaint is that the Humane Pin is too unreliable and that it doesn’t do practical things better than alternatives like your smartphone.
Top Tools & Hacks
Udio released an AI music generation app and shared many demos of their musical creations. Udio lets users create music tracks in many genres from text prompts, complete with lyrics; users can also discover and share created music. Many people have been blown away by how good it is: “The most realistic AI music tool I’ve ever tried.”
On the All-In podcast, David Friedberg said:
"I think that musicians, artists, and consumers are going to start using these tools in a really prolific way, given how good they are now."
What’s behind Udio’s next-level musicality and quality? Udio is a well-funded startup founded by former Google DeepMind researchers who worked on Google’s Lyria project, an AI music generation model announced late last year.
All this makes Udio the tool of the week. You can generate up to 1200 songs per month while it’s in free beta. Or you can ‘discover’ and listen to prior Udio-generated music like a new Spotify playlist: Jazz is great, so is country, more country, classical baroque, or this unstable fugue. It can even do stand-up comedy! Give it a try.
Want to try the great new open AI models we talk about, like DBRX, Qwen, or the Mistral models? You can try them out at the Together.AI playground. For developers needing API access to open AI models, Together also offers APIs for 100+ AI models on their serverless endpoints platform.
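Together’s serverless endpoints are OpenAI-compatible, so a quick way to poke at an open model is to point the OpenAI SDK at Together’s base URL. A minimal sketch, with the model identifier as an assumption (check Together’s catalog for the exact strings for DBRX, Qwen, or the Mixtral variants):

```python
# Minimal sketch: calling an open model on Together's OpenAI-compatible
# serverless endpoint. The model identifier is an assumption; consult
# Together's model catalog for the exact strings.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_TOGETHER_API_KEY",          # placeholder
    base_url="https://api.together.xyz/v1",
)

response = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # assumed catalog name
    messages=[{"role": "user", "content": "Summarize this week's open-model releases."}],
)
print(response.choices[0].message.content)
```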
AI Research News
We shared many interesting AI research advances in our packed AI research roundup this week:
AutoCodeRover is a New SOTA Coding Agent, getting 22% on SWE-Bench-lite.
Zero-shot Requires Exponential Data. Scaling is log-linear.
Defining intelligence across human and artificial
MiniCPM: Small Language Models with Scalable Training Strategies
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
Better Reasoning Using LLMs as Pseudocode Compilers
ChatGLM-Math Improves LLM Problem-Solving with Self-Critique
OpenEQA: Embodied Question Answering
But the most important result of the week came as a late entry, the paper “Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention.” In it, Google researchers share an approach that extends the transformer architecture with a compressive memory so attention can cover an unbounded context window; existing LLMs can be continually pre-trained with this method to obtain much greater context window length.
Since this is a Google paper and Google’s Gemini leads in very long context windows, it’s plausible this technique is behind Google’s ultra-long context accomplishment.
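To make the idea concrete, here is a toy sketch of the mechanism as the paper describes it: each segment still runs ordinary dot-product attention locally, while a fixed-size compressive memory (a d_k × d_v matrix plus a normalization vector) carries information across segments and is blended in with a learned gate. The shapes and the ELU+1 feature map follow the paper; the function and variable names are illustrative, not Google’s code.

```python
# Toy numpy sketch of Infini-attention: local attention per segment plus a
# fixed-size compressive memory that accumulates information across segments.
import numpy as np

def elu_plus_one(x):
    # ELU(x) + 1, the positive feature map used for the linear-attention memory
    return np.where(x > 0, x + 1.0, np.exp(x))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d_k, d_v, seg_len = 16, 16, 8
M = np.zeros((d_k, d_v))   # compressive memory: size independent of context length
z = np.zeros((d_k, 1))     # normalization term for memory reads
beta = 0.0                 # learned gate (a scalar here for simplicity)

def infini_attention_segment(Q, K, V):
    """Process one segment; M and z persist across calls (the 'infinite' context)."""
    global M, z
    sQ, sK = elu_plus_one(Q), elu_plus_one(K)
    # 1) Read from the memory accumulated over all previous segments.
    A_mem = (sQ @ M) / (sQ @ z + 1e-6)
    # 2) Ordinary dot-product attention within the current segment.
    A_local = softmax(Q @ K.T / np.sqrt(d_k)) @ V
    # 3) Write the current segment's keys/values into the memory.
    M += sK.T @ V
    z += sK.sum(axis=0, keepdims=True).T
    # 4) Blend long-range (memory) and local attention with a learned gate.
    g = 1.0 / (1.0 + np.exp(-beta))
    return g * A_mem + (1.0 - g) * A_local

# Stream segments through the layer; the memory footprint stays constant.
for _ in range(4):
    Q, K, V = (np.random.randn(seg_len, d_k) for _ in range(3))
    out = infini_attention_segment(Q, K, V)
    print(out.shape)  # (seg_len, d_v)
```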
AI Business and Policy
There were several AI chip announcements this week, beyond the already-mentioned Google TPU v5p, with many taking aim at Nvidia’s prized AI chip leadership:
Intel announced Gaudi 3, claiming “50% on average better inference and 40% on average better power efficiency than Nvidia H100.” That’s Nvidia’s last generation, but Intel is also promoting an open ecosystem and lower cost of ownership, i.e., they’re cheaper.
Meta announced its next-generation AI accelerator chip, the custom-designed Meta Training and Inference Accelerator (MTIA), “our family of custom-made chips designed for Meta’s AI workloads.”
Bloomberg reports Apple will upgrade their lineup of Macs with an upcoming AI-focused M4 chip family, with initial releases in late 2024 and extending into early 2025.
The Information reported OpenAI has terminated researchers Leopold Aschenbrenner and Pavel Izmailov for allegedly leaking information outside of the company. Aschenbrenner was part of OpenAI's "superalignment" team, and Izmailov also spent time on the AI safety team.
Goldman Sachs Asset Management is exiting Big Tech investments, seeing the Mag 7 (Microsoft, Meta, Nvidia, Alphabet, Amazon, Apple, and Tesla) era as over. “The firm believes tech shares will come under pressure and prefers areas like energy and Japanese shares.” They are also concerned that the AI hype bubble may be growing. John Naughton in the Guardian is calling AI a bubble.
The latest AI-enabled scam is fake obituary websites. Building fake websites doesn’t require AI, but AI makes it cheaper and easier.
The UK’s competition watchdog, the CMA, has expressed “real concerns” over Big Tech’s grip on the advanced AI market. With many companies now competing on AI models and AI applications, there is a healthy amount of real competition. However, it’s mostly US-based Big Tech companies, and the CMA sees an “interconnected web” of AI partnerships between Google, Apple, Microsoft, Meta, Amazon, and chip-maker Nvidia.
The European Union and United States have put out a joint statement affirming a desire to boost AI safety and risk research.
The AI Revolution Is Crushing Thousands of Languages, says Matteo Wong. Despite efforts on low-resource languages, AI’s mother tongue is English:
English is the internet’s primary tongue—a fact that may have unexpected consequences as generative AI becomes central to daily life.
AI Opinions and Articles
Arvind Narayanan, commenting on X about the Humane AI Pin misfires, calls them an example of “the underappreciated capability-reliability distinction in gen AI.” He says AI research and product development should focus on reliability; product misses result from prioritizing capability over reliability.
I agree. More reliability and predictability are needed in AI applications. Unreliable stochastic behavior may be acceptable for ‘just playing’ with gen AI, but it doesn’t cut it for business workflows, daily productivity, or critical applications like healthcare.
If AI could reliably do all the things it's capable of, it would truly be a sweeping economic transformation. - Arvind Narayanan
How do we fix AI reliability issues? Mostly Harmless Ideas’ author explains Why reliable AI requires a paradigm shift: current AI systems hallucinate and are not reliable because they are based on statistical methods that can lead to biased and inaccurate results. “The statistical language modeling paradigm, at its core, is a hallucination machine.”
However, one critical challenge of language model hallucinations is “the difficulty in effectively communicating the limitations of these systems to end-users.”
We can mitigate unwanted hallucinations by grounding models in authoritative information via external knowledge bases. He proposes “robust model architectures and training paradigms less susceptible to hallucinations … incorporating explicit reasoning capabilities.”
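As a concrete illustration of that grounding idea (not the author’s implementation), the sketch below retrieves passages from an authoritative knowledge base and constrains the model to answer only from them; the retrieval function, model name, and prompt wording are all assumptions.

```python
# Illustrative sketch of grounding: retrieve passages from a trusted knowledge
# base, then constrain the model to answer only from them. The `retrieve`
# callable, model name, and prompt wording are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def grounded_answer(question: str, retrieve) -> str:
    # `retrieve` is any function returning relevant passages from an
    # authoritative source (e.g., a vector search over internal documents).
    passages = retrieve(question)
    context = "\n\n".join(passages)
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "system",
             "content": "Answer only from the provided context. "
                        "If the context does not contain the answer, say you don't know."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```

Grounding reduces hallucinations but doesn’t eliminate them; the model can still misread or over-generalize from the retrieved context, which is why the author argues for deeper architectural changes.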
Reliability is a fundamental challenge for applications built on generative AI models. It will take focus and effort to address it.
A Look Back …
The rise of generative AI - A timeline of triumphs, hiccups and hype charts the first year of the generative AI era, from ChatGPT onwards. AI’s early days are passing from ‘now’ into history, but we are still in the early days of the AI era. There’s a lot more progress, invention, and impact yet to come.