AI Week In Review 24.05.25
Microsoft Build puts Copilot everywhere: Phi-3, Phi Silica, Copilot+ PC, Copilot Studio, GitHub Copilot extensions. Plus: Cohere Aya 23, Smaug Llama 3 70B, Google's AI Overviews goes rogue, BiomedParse.
AI Tech and Product Releases
Microsoft announced a slew of new AI products and features at Build 2024, as we covered in “Microsoft Builds the AI Platform Shift.” The theme is putting AI across their product lines under the Copilot brand. New releases and updates include:
The new Copilot+ PC category, debuting on new Surface devices, with Recall, a feature that uses AI to help you find things you’ve seen on your PC.
New Phi-3 models, including the multimodal Phi-3-vision and the 14B Phi-3-medium.
Phi Silica, a small LLM designed to run on Neural Processing Units (NPUs).
Copilot for Windows gets access to GPT-4o and will become an app.
GitHub Copilot gets extensions; Copilot Studio will support customized Copilots; AI developers get a better Azure AI Studio to build AI apps with.
Microsoft also partnered with Khan Academy to offer Khanmigo, an AI teaching assistant, for free to all US K-12 teachers.
Cohere released Aya 23, a family of multilingual language models with open weights, supporting 23 different languages. The models come in 8B and 35B sizes. Cohere touts Aya 23 for bringing “state-of-the-art language modeling capabilities to nearly half of the world's population.”
Abacus AI released Smaug-Llama-3-70B-Instruct, an open-source conversational AI model that rivals GPT-4 Turbo on MT-Bench (9.2) and is currently the top open source model on Arena-Hard.
Leonardo AI released Image Gen V2 and made Character Reference generally available; the company says Image Gen V2 ships with a simpler, cleaner interface.
Amazon is giving Alexa a big AI upgrade, and it could require a new subscription. Amazon will launch its “more conversational” version of Alexa later this year, but is already testing elements of the upgraded Alexa in preview.
Google’s new AI Overviews rolled out in the past week, but the feature has been producing hallucinations and bad results, causing a huge backlash online. Google keeps managing to shoot itself in the foot: suggesting people eat rocks as part of their diet, claiming no African nation starts with “K” (Kenya does), touting the health benefits of running with scissors, and more.
Google’s response is both to minimize the issue and to say they are ‘taking swift action’ to fix it:
“Many of the examples we’ve seen have been uncommon queries, and we’ve also seen examples that were doctored or that we couldn’t reproduce.”
Peter Yang on X identifies Google’s use of Reddit as one source of the confusion. It seems that Gemini lacks a sarcasm or humor detector for some of its sources. Maybe we shouldn’t define ‘intelligence’ based on Reddit comments?
A contrarian take from Greg Speicher is that Google will benefit, improving this feature and maintaining their moat in search:
“The optics of the hallucinations in AI overviews are negative and have made headlines in the tech and investor worlds but this is almost totally invisible to normal users. This capture of long tail data has always been a Google strength. Now they are generating massive AI overviews that can be continually refined and improved at a massive scale that others cannot match. This data advantage has kept Google at the forefront of search and may continue to reinforce the moat.” - Greg Speicher
Top Tools & Hacks
Google’s recently released Gemini 1.5 Flash is a solid-performing model that is also fast and cheap, as we shared in our “Gemini 1.5 Pro & Flash Revealed” article this week.
From Microsoft, another of the most interesting new AI models to try is Phi-3 medium, which is small enough to download and run locally (using Ollama). Head-to-head, here’s how these two smaller-but-still-powerful models stack up:
Gemini 1.5 Flash: MMLU 78.9%, HumanEval 74.3%, GSM8K 86% (11-shot)
Phi-3 medium: MMLU 76.6%, HumanEval 58.5%, GSM8K 87.5% (8-shot CoT)
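If you want to try Phi-3 medium locally, here is a minimal sketch of calling it through Ollama’s REST API. It assumes Ollama is installed and running on its default port (11434) and that you have already run `ollama pull phi3:medium`; the prompt and helper names are just illustrative. The payload builder is separated out so it can be inspected without a live server.

```python
import json
import urllib.request

# Ollama's default local generate endpoint
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "phi3:medium") -> dict:
    """Build the JSON payload Ollama's /api/generate endpoint expects."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one complete response, not a token stream
    }

def ask_phi3(prompt: str) -> str:
    """Send a prompt to a locally running Ollama server and return the reply text."""
    payload = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ask_phi3("In one sentence, what does the GSM8K benchmark measure?"))
```

Since Ollama exposes a plain HTTP interface, the same pattern works for any pulled model by swapping the `model` tag.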
AI Research News
Our AI research paper highlights for this week had a focus on advances in understanding the internals of how AI models work:
Mapping the Mind of a Large Language Model
Your Transformer is Secretly Linear
Not All Language Model Features Are Linear
Financial Statement Analysis with Large Language Models
UniMoE: Unified Multimodal Framework
In addition to the above, Microsoft introduced a biomedical foundation AI model called BiomedParse. BiomedParse simplifies biomedical image analysis by accurately performing segmentation, detection, and recognition across various imaging types.
AI Business and Policy
OpenAI formed a news partnership with News Corp, the global publisher of The Wall Street Journal and many other newspapers: “News Corp and OpenAI to Partner in Supporting the Highest Journalistic Standards.” This multi-year partnership grants OpenAI access to News Corp's extensive news content, similar to deals OpenAI has penned with other news companies.
Scarlett Johansson is considering legal action against OpenAI for using an AI voice similar to hers in the GPT-4o demo. We now know that OpenAI asked Scarlett Johansson to be a voice for their chatbot, but she declined; OpenAI then cast another voice actress as “Sky” and touted the connection to the movie “Her,” in which Johansson voiced the AI. Johansson released a statement noting:
Mr. Altman even insinuated that the similarity was intentional, tweeting a single word "her" - a reference to the film in which I voiced a chat system, Samantha, who forms an intimate relationship with a human.
Yes, it sounds a lot like “Her,” and OpenAI deliberately drew the “Her” connection to amp up GPT-4o hype. But OpenAI hired other voices, and the real voice of “Sky” wasn’t asked to sound like Scarlett Johansson. OpenAI has paused use of Sky, but this other voice actress (who remains anonymous) shouldn’t lose her ability to sell her own voice simply because she sounds like “Her.”
OpenAI has disbanded its internal AI safety research team, in the wake of the departure of key Superalignment team leaders, OpenAI co-founder Ilya Sutskever and Jan Leike.
Microsoft and G42, the UAE’s top AI company, announced a $1 billion data center project in Kenya, powered by geothermal energy.
Meta created an AI advisory council to guide the company on its AI and technology advancements. Meta picked four top tech CEOs and executives: Stripe CEO and co-founder Patrick Collison, former GitHub CEO Nat Friedman, Shopify CEO Tobi Lütke, and former Microsoft executive Charlie Songhurst.
There’s been some criticism that all four are men, but these are not just any executives; they are unicorn-building industry leaders already navigating the AI revolution, well qualified by their other roles to advise on it.
Humane AI is reportedly seeking a buyer after the launch of its AI Pin wearable device.
Scale AI raised $1 billion, valuing the company at $13.8 billion, in a deal backed by Nvidia, Amazon, and Meta.
Legal startup Harvey announced a partnership with Mistral.
The UK government launched an £8.5 million research funding program to enhance AI safety, “to break new grounds in AI safety testing.”
The California Senate passed SB 1047, a bill regulating AI in the name of AI safety that some say will undermine AI innovation.
AI Opinions and Articles
Fei-Fei Li and John Etchemendy tell us “No, Today’s AI Isn’t Sentient. Here’s How We Know.” They tell us, correctly, that we have not achieved sentient AI and that “larger language models won’t get us there.” As they explain, AI lacks the actual subjective experience required for sentience:
All sensations—hunger, feeling pain, seeing red, falling in love—are the result of physiological states that an LLM simply doesn’t have. Consequently we know that an LLM cannot have subjective experiences of those states. In other words, it cannot be sentient.
At the same time, they argue that we will need sentience to get to AGI. This is incorrect in my view. Sentience is a long way off, but it’s not required for high levels of useful functional intelligence and more advanced AI.
Stanford HAI released the latest version of its Foundation Model Transparency Index, which evaluates the transparency of 14 major AI developers. Main takeaway: generative AI model transparency is improving, but there’s a long way to go.
Who are the good players and not-so-good players when it comes to AI transparency? Here are the latest marks for the main AI model developers.