AI Week In Review 24.07.27

Meta Llama 3.1, Meta AI assistant comes to Quest headsets, Mistral Large 2, SearchGPT, GPT-4o mini fine-tuning, Bing Generative Search, Luma Labs Loops, Udio 1.5 Stable Video 4D, AI at the Olympics.

Jul 28, 2024

Figure 1. Still from AI video - SD3 on Remix to Leonardo and Luma Labs video, with Udio 1.5 music, from Everett World.

AI Tech and Product Releases

The top news of the week was Meta’s release of Llama 3.1, releasing the biggest and best open AI model yet – the 405B model – as well as updates to the 7B and 70B models. We covered full details in a prior article, here are summary points of the Llama 3.1 release:

Llama 3.1 405B performance is SOTA, on par with GPT-4o and Claude 3.5 Sonnet.
Llama 3.1 70B and 7B sets a new SOTA for their sizes, with 70B scoring close to GPT-4.
Llama 3.1 has 128k context length, was trained for tool use, specifically to use a browser, execute Python code, and run Wolfram Alpha for complex math calculations.
Llama 3.1 is good at multi-lingual tasks, code generation, and complex reasoning.
Llama 3.1 is an open AI model with freely available weights, e.g. download from Meta or from HuggingFace, and their license allows fine-tuning and distillation into other models.
Meta is building and sharing more components of the AI ecosystem with their Llama releases, so that users can run Llama 3.1 models in an agentic AI system.

Meta is making a play not just in open AI models, but to be the open AI model most people choose to use. To that end, they have made Llama 3.1 widely available in chat interfaces and on cloud platforms.

Meta's AI assistant, is coming to Quest headsets in the US and Canada. This is the AI already on the RayBan smart glasses, and it will include the “Meta AI with vision” feature.

Mistral has released Mistral Large 2, a 123B model offering many great features in reasoning, code generation, multi-lingual understanding, function-calling and tool use. It achieves great performance for its size on code generation, math, and reasoning, on par with some of the best frontier AI models like GPT-4o and Claude 3.5 Sonnet. For example, Large 2 scores 90% on Human Eval, above every other model save for GPT-4o, and performs near or at SOTA on GSM8K, Math Instruct, MT Bench, Arena Hard.

Figure 2. Mistral Large 2 does very well on HumanEval and other coding and reasoning benchmarks.

Mistral took great effort to enhance the reasoning capabilities of Large 2, including using an approach that Llama 3.1 also took to improve factuality and reduce hallucinations: “fine-tuning the model to be more cautious and discerning in its responses, ensuring that it provides reliable and accurate outputs.”

Large 2, like most Mistral AI models, is also strong in multiple languages, supporting many European languages as well as major non-European languages, such as Arabic, Hindi, and Japanese.

Large 2 is released under a permissive research license where researchers and scientific nonprofits can access Large 2 at no cost, while for-profit companies will have to pay for it. Users can also access Large 2 via Mistral’s Le Chat chat interface, and developers can access it via multiple cloud providers.

OpenAI offers free GPT-4o Mini fine-tuning to counter Meta’s Llama 3.1 release. OpenAI announced that customizing GPT-4o mini with fine-tuning is available today for some user tiers, and they plan gradually expand access to all users. The first 2M training tokens a day are free, through Sept 23.

On the flip-side, Kyle Corbitt reports on X that “fine-tuned Llama 3.1 8B is completely cracked. Just ran it through our fine-tuning test suite and blows GPT-4o mini out of the water.”

OpenAI announced a prototype AI-powered search feature called SearchGPT and noted, “While this prototype is temporary, we plan to integrate the best of these features directly into ChatGPT in the future.” They are launching with a small group of users for feedback.

Microsoft is rolling out Bing generative search. In this update, Bing now has AI-powered summaries at the top of search results, with citations and related information, while traditional search results are on the right sidebar panel.

Bing is an “Answer Engine” similar to Google’s Generative Search and Perplexity’s interface, setting up a three-way race in AI-powered generative search. On the other hand, SearchGPT seems different; OpenAI’s SearchGPT is not competing directly with the answer engines, but instead it is directed towards eventually giving ChatGPT agentic search capabilities.

Figure 4. Bing Generative Search example, with answer, citations, related info, and traditional search results on the right side-bar.

Bing and DuckDuckGo search engines will no longer be able to surface new Reddit posts due to the site’s exclusive AI partnership with Google. Reddit is looking to monetize their user content.

Google gives free Gemini users access to its faster, lighter 1.5 Flash AI model.

Luma Labs released Loops, a video looping feature that lets you create a loop from a text or image input or extend a previous generation into loop.

Figure 5. Still from Luma Labs Loops demo - starship at warp speed.

Stability AI has released Stable Video 4D, an AI model for multi-angle video generation. Stable Video 4D transforms a single object video into multiple novel-view videos of eight different angles/views, a step towards full 3D video synthesis:

Currently, Stable Video 4D can generate 5-frame videos across the 8 views in about 40 seconds, with the entire 4D optimization taking approximately 20 to 25 minutes. Our team envisions future applications in game development, video editing, and virtual reality.

Udio this week released 1.5 and it’s a massive upgrade for the AI music generation app. Udio claims 1.5 has improved audio quality with more “clarity, instrument separation, transients, coherence, and musicality” on generated music. They added Stem controls for seperate tracks; audio-to-audio remixing; and a dedicated creation page.

AI Research News

Google Deepmind’s AlphaProof and AlphaGeometry 2 AI models made headlines by being the first AI to solve International Mathematical Olympiad problems at a silver medalist level.

Our AI research highlights article for this week covered that story as well as these other recent research results:

TextGrad: Automatic “Differentiation” via Text
Open Artificial Knowledge
Truth is Universal: Robust Detection of Lies in LLMs
Internal Consistency and Self-Feedback in Large Language Models: A Survey
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis

AI Business and Policy

Elon Musk says the Optimus robot will go on sale in 2026, and this report goes into where other humanoid robots stand.

Harvey Raises $100 million in a Series C led by Google Ventures . The maker of legal AI assistants says that the funds will “enable Harvey to continue building the most trusted AI platform for professional services in the world.”

AI is coming to the Olympics: NBC is introducing 10-minute AI-generated recaps for the 2024 Olympics, and they cloned Al Michaels' voice to narrate the Olympic highlights. Another AI innovation is AI-powered camera systems for frame-freeze slow-motion replays of athletes.

Google is official search AI partner of Team USA. NBCUniversal’s anchors will use Google Search’s AI overviews to answer questions about some sports during their Olympics coverage of the Olympics.

Senate passes the Defiance Act, a bill that gives deepfake victims right to sue over nonconsensual explicit images made of them.

Senate Democrats sent a letter to Sam Altman demanding OpenAI turn over safety data. They asked twelve questions; this was question nine:

“Will OpenAI commit to making its next foundation model available to U.S. Government agencies for pre-deployment testing, review, analysis, and assessment?”

AI Opinions and Articles

Andrej Karpathy on the Llama 3.1 release on X:

I like to say that it is still very early days, that we are back in the ~1980s of computing all over again, that LLMs are a next major computing paradigm, and Meta is clearly positioning itself to be the open ecosystem leader of it.
People will prompt and RAG the models.
People will finetune the models.
People will distill them into smaller expert models for narrow tasks and applications.
People will study, benchmark, optimize.

Mark Zuckerberg’s essay that “Open Source AI Is the Path Forward” is the most impressive and important statement on AI from a major Tech CEO: Impressive because he has a deep understanding of AI technology and business dynamics around it; important because he is aligning his beliefs with action to deliver the best open source AI models.

AI developers and users their AI models to include these features and capabilities:

Train, fine-tune, and distill our own models
Control our own destiny and not get locked into a closed vendor.
Protect our data.
Efficient and affordable models to run.
An ecosystem that’s going to be the standard long-term.

Meta wants these features as well - to control their destiny with AI, avoid vendor lock-in, and have affordable AI models to build AI services. Only open AI models can fully and easily meet all these requirements. Thus, Zuckerberg sees open source Llama as the path to Meta’s success with AI, and sees only advantage from being open.

I expect AI development will continue to be very competitive, which means that open sourcing any given model isn’t giving away a massive advantage over the next best models at that point in time. The path for Llama to become the industry standard is by being consistently competitive, efficient, and open generation after generation.

Zuckerberg’s stance and Meta’s strategy is shaping AI technology in a good way. As Karpathy said, they are looking to be the open ecosystem leaders.

I believe the Llama 3.1 release will be an inflection point in the industry where most developers begin to primarily use open source, and I expect that approach to only grow from here. I hope you’ll join us on this journey to bring the benefits of AI to everyone in the world. - Mark Zuckerberg

AI Changes Everything

Discussion about this post