AI Week In Review 24.09.28
Meta Llama 3.2, Orion, OpenAI Advanced Voice Mode, Gemini 1.5 Pro & Flash updates, Magnific Mystic v2, HuggingChat macOS, SunoAI cropping, Molmo VLM, Aider architect/editor coding modes.
AI Tech and Product Releases
Meta held its Connect conference this week and made several big release announcements.
Among them, Meta released Llama 3.2, augmenting the Llama family of open AI models with 1B and 3B LLMs and 11B and 90B parameter vision LLMs. All models support a 128K token context length. The 1B and 3B models are designed for use on edge devices and perform impressively for their size. The 11B and 90B vision models are competitive with Claude 3 Haiku and GPT-4o mini on vision tasks while also serving as drop-in replacements for their text-only equivalents.
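Because the smaller Llama 3.2 models are open-weight, they are easy to try locally. Here is a minimal sketch, assuming the Hugging Face transformers pipeline API and the meta-llama/Llama-3.2-3B-Instruct checkpoint ID (an assumption; the weights are gated behind Meta's license):

```python
# Minimal sketch: chat with the 3B instruct model via the transformers pipeline API.
# The model ID below is an assumption; access requires accepting Meta's license terms.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-3B-Instruct",
    device_map="auto",  # place weights on GPU if available, otherwise CPU
)

messages = [
    {"role": "user", "content": "In two sentences, why do small on-device LLMs matter?"}
]
result = generator(messages, max_new_tokens=128)
# Recent transformers versions return the full chat transcript; take the last turn.
print(result[0]["generated_text"][-1]["content"])
```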
We covered the Meta Llama 3.2, OpenAI voice mode, and Google Gemini update release stories in our prior article Meta’s Vision and OpenAI’s Voice, so check it out for more details.
OpenAI released Advanced Voice Mode for ChatGPT Plus and Team users. This feature allows for more natural, humanlike conversations with the AI, including the ability to interrupt it mid-response and have it remember things. It also includes five new voices with improved accents in over 50 languages. However, it is not yet available in the EU or several other European countries.
As we noted before, the released Voice Mode feature doesn’t match the amazing demos in May, leading to mixed reviews. Alex Volkov at ThursdAI wonders:
… the model is way less emotional, refuses to sing (tho folks are making it anyway) and generally feels way less "wow" than what we saw. Less "HER" than we wanted for sure. Seriously, they nerfed the singing! Why OpenAI, why?
Some have succeeded in cajoling OpenAI voice mode to sing.
Google updated its Gemini 1.5 Pro and Flash models, with improved performance, lower costs, and increased rate limits. The new models have shown a 7% increase in MMLU-Pro scores, 20% improvement in MATH and HiddenMath benchmarks, and 2-7% enhancement in vision and Python code generation. The price for Gemini-1.5-Pro has been reduced by over 50%.
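For developers, the refreshed models are exposed under new versioned names. Here is a minimal sketch, assuming the google-generativeai Python SDK and the gemini-1.5-flash-002 model ID (the "-002" suffix is our assumption for the updated release):

```python
# Minimal sketch: call the updated Gemini 1.5 Flash model via the google-generativeai SDK.
# The versioned "-002" model name is an assumption for the refreshed release.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel("gemini-1.5-flash-002")
response = model.generate_content("Give a one-sentence summary of retrieval-augmented generation.")
print(response.text)
```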
Image upscaler maker Magnific has rolled out Mystic v2, an advanced AI image generator that can output images at up to 4K resolution. The resolution and realism of some of the generated images are incredible.
Hugging Face released the HuggingChat app for macOS, which can be used with Qwen, Llama, and other open AI models as an open-source alternative to ChatGPT. The HuggingChat macOS app is available on GitHub and supports markdown, web browsing, and code syntax highlighting.
Suno AI now supports cropping of songs made on Suno, so Pro and Premier users can adjust the start and end of their songs.
Speech recognition software maker Deepgram released its Voice Agent API, which enables natural, real-time human-machine conversations powered by Deepgram's high-performance speech recognition and synthesis models. The API gives AI developers another option for building voice capabilities.
Aider launched architect and editor modes for more efficient coding, separating the code reasoning and code editing tasks to achieve SOTA results on coding benchmarks overall:
Aider now has experimental support for using two models to complete each coding task:
An Architect model is asked to describe how to solve the coding problem.
An Editor model is given the Architect's solution and asked to produce specific code editing instructions to apply those changes to existing source files.
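To make the split concrete, here is a conceptual sketch of the two-model architect/editor pattern. This is not aider's actual implementation; the client library and model names below are illustrative assumptions:

```python
# Conceptual sketch of the architect/editor split, not aider's actual code.
# Assumes the openai Python client; the model choices below are hypothetical.
from openai import OpenAI

client = OpenAI()

def architect_then_edit(task: str, source: str) -> str:
    # 1) Architect: a strong reasoning model describes HOW to solve the task, in prose.
    plan = client.chat.completions.create(
        model="o1-preview",  # hypothetical architect model
        messages=[{
            "role": "user",
            "content": f"Describe how to solve this coding task:\n{task}\n\nSource file:\n{source}",
        }],
    ).choices[0].message.content

    # 2) Editor: a faster model turns the plan into concrete edits to the source file.
    edits = client.chat.completions.create(
        model="gpt-4o",  # hypothetical editor model
        messages=[{
            "role": "user",
            "content": (
                "Apply the following plan as specific code edits to the source file below.\n"
                f"Plan:\n{plan}\n\nSource file:\n{source}"
            ),
        }],
    ).choices[0].message.content
    return edits
```

The point of the split is that the model best at reasoning about a change need not be the model best at emitting well-formatted edits, so each half of the task goes to the model suited for it.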
Cloudflare's new AI Audit tool aims to give content creators better bot controls. The AI Audit tool provides website owners with features to analyze and control how AI bots interact with their content. It's a way for content owners to fight back and control how the content they create is used by AI.
AMD unveiled its first small language model, AMD-135M. The model itself is not ground-breaking, but the release demonstrates training an LLM on AMD chips:
The AMD-Llama-135M model was trained from scratch with 670 billion tokens of general data over six days using four MI250 nodes.
Microsoft has launched a tool called Correction that can correct AI hallucinations. Correction is available as part of Microsoft’s Azure AI Content Safety API.
“Correction is powered by a new process of utilizing small language models and large language models to align outputs with grounding documents,” a Microsoft spokesperson told TechCrunch. “We hope this new feature supports builders and users of generative AI in fields such as medicine, where application developers determine the accuracy of responses to be of significant importance.”
Top Tools & Hacks
A week after NotebookLM went viral for its podcast-style audio output, Google has added audio and video input to NotebookLM. You can now upload videos and audio as source material, including lecture recordings, meeting recordings, YouTube videos, podcasts, and more. This greatly expands the use cases for NotebookLM.
The magic of NotebookLM is how it combines AI, information grounding, and multi-modality:
“When you upload your sources, it instantly becomes an expert, grounding its responses in your material with citations and relevant quotes.”
Google is managing your input information in an AI-first way, building NotebookLM (using RAG under the hood) with AI's power to handle, distill, and repurpose information at its core. And it can now do so across text, audio, and video.
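For readers curious what “RAG under the hood” looks like in practice, here is a toy, purely illustrative sketch of the retrieve-then-ground pattern, with a naive keyword-overlap retriever standing in for a real embedding index; it is in no way Google's implementation:

```python
# Toy sketch of retrieval-augmented grounding: rank source chunks, then force citations.
# Purely illustrative; a real system would use embeddings and a proper vector index.
def retrieve(question: str, chunks: list[str], k: int = 2) -> list[tuple[int, str]]:
    """Rank source chunks by naive word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        enumerate(chunks),
        key=lambda item: len(q_words & set(item[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt that asks the model to answer only from cited sources."""
    top = retrieve(question, chunks)
    context = "\n".join(f"[{i}] {text}" for i, text in top)
    return (
        "Answer using ONLY the numbered sources below and cite them like [0].\n"
        f"{context}\n\nQuestion: {question}"
    )
```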
This makes Google's NotebookLM our top AI tool for the second week running and one of the most useful AI tools available.
AI Research News
This week’s AI Research Roundup covered recently released open Vision Language Models (VLMs) Molmo and Qwen2-VL, RL-based self-correction to improve LLM reasoning, and methods to accelerate LLM pre-training:
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
SCoRe: Training Language Models to Self-Correct via Reinforcement Learning
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization
Google DeepMind shared how AlphaChip transformed computer chip design. The RL-based AlphaChip treats chip floorplanning as a kind of game and has become very good at chip placement tasks. It helped design Google's Trillium chip, their sixth-generation TPU. DeepMind shared an update on this research in an addendum to their prior AlphaChip paper in Nature.
AI Business and Policy
The OpenAI drama continues: high-profile exits, a funding round, and a shift to a for-profit model, all while burning cash. First up, OpenAI CTO Mira Murati is leaving OpenAI, as are chief research officer Bob McGrew and research VP Barret Zoph. The exit notes are surreal, but the real story remains unclear.
OpenAI is planning to restructure into a for-profit model, raise funds, and grant Sam Altman equity. OpenAI's $6.5B funding round may close as soon as next week; it could value the company at up to $150 billion and grant CEO Sam Altman significant equity (worth about $10 billion) for the first time. The WSJ says “Turning OpenAI Into a Real Business Is Tearing It Apart.” TechCrunch notes:
The WSJ piece paints a picture of a startup that, while having recently hit $4 billion in revenue, is still losing billions — and is beset by internal clashes, a burnout-prone work culture, and technical delays.
In their search for investor funds, OpenAI shared financial information, and it shows OpenAI is growing fast while burning through piles of cash. According to the New York Times, the company took in a whopping $300 million in revenue in August and expects $3.7 billion in annual sales for 2024, yet it also expects about $5 billion in losses due to operational costs. It projects $11.6 billion in revenue for 2025.
Our last OpenAI item: Jony Ive confirmed he's working on a new device with OpenAI. The report shares very few specifics on what the device might be.
Academy Award-winning filmmaker James Cameron has joined the Stability AI Board of Directors. This could be an endorsement of AI for filmmaking.
Google has filed an antitrust complaint against Microsoft with the European Union, accusing Microsoft of abusing its market dominance as a software maker to give itself a leg-up in the cloud market.
Anthropic is in talks to raise new funding at a $40 billion valuation.
Salesforce is to acquire AI-powered data management startup Zoomin for $450 million.
The FTC is cracking down on companies that make deceptive claims about their AI capabilities or use AI to deceive consumers. This includes DoNotPay, for claiming to sell “AI Lawyer” services; the FTC says “the product failed to live up to its lofty claims that the service could substitute for the expertise of a human lawyer.” Other similarly hyped AI-related services were also targeted.
AI Opinions and Articles
JeffSynthesized gives us an AI-generated short film with a chilling message – Paperclip Maximizer:
What I thought would be a quick reimagining of my ‘Paperclip Maximizer’ film actually turned into an epic multi week deal. Burned over 200K credits trying to get the most realistic generations.
It’s worth watching just for the technical capabilities alone. I don’t know how the credits map into actual dollars, but it’s taking AI video to the next level here. (H/T to Wes Roth for highlighting this.)
Will AI upend Hollywood and remake the industry? This short film is a sign that it's a matter of when, not if.
As for the message of the short film itself: Does AI pose a grave risk? That is more Hollywood sci-fi than reality. However, the pace of AI development is so fast and unpredictable that we should be humble about what we consider possible or not.