AI Week In Review 24.09.21
Qwen2.5, Veo for YouTube Shorts, Project Amelia, iOS 18.1 beta, Copilot Wave 2, Mistral Small, Runway Gen3 vid-to-vid & API, Dream Machine API, Kling 1.5, Moshi open-sourced, Notebook LM podcasts!
AI Tech and Product Releases
Alibaba released the Qwen2.5 family of LLMs, a suite of 13 open-weight AI models that range from 0.5B to 72B parameters and include specialized coding and math variants:
Qwen2.5: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B
Qwen2.5-Coder: 1.5B and 7B, with 32B on the way
Qwen2.5-Math: 1.5B, 7B, and 72B
The Qwen2.5 family has strong multilingual capability, understanding 29 languages; it supports a 128K-token context window and can generate up to 8K tokens per prompt.
Trained on 18 trillion tokens, Qwen2.5 boasts significantly improved knowledge, coding, and math capabilities, along with better instruction following and structured-output generation. The flagship Qwen2.5-72B surpasses all other open-weight AI models of its size and outperforms Llama 3.1-405B on many benchmarks. The smaller Qwen2.5 models are all SOTA for their size, and Qwen2.5-Coder 7B is possibly the best AI coding model you can run locally.
This is an exciting and important release from Alibaba because the Qwen2.5 models are extremely capable open-weight AI models you can run locally. The models are available on HuggingFace or via Ollama.
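If you want to try one locally, here is a minimal sketch of running a Qwen2.5 instruct model with Hugging Face transformers. The "Qwen/Qwen2.5-7B-Instruct" repo id is an assumption on my part; check the Qwen2.5 collection on HuggingFace for the exact model names and sizes.

```python
# Minimal sketch (not from the announcement): running a Qwen2.5 instruct model
# locally with Hugging Face transformers. The "Qwen/Qwen2.5-7B-Instruct" repo id
# is assumed; verify names/sizes on the Qwen2.5 HuggingFace collection.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"  # device_map="auto" needs accelerate installed
)

messages = [{"role": "user", "content": "Write a Python function that checks whether a number is prime."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Ollama users can pull a quantized build with something like "ollama run qwen2.5:7b" (the exact tag is an assumption; check the Ollama model library).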
YouTube is integrating Google's AI video generation model, Veo, into YouTube Shorts. This integration enhances YouTube's existing “Dream Screen” feature and lets creators generate high-quality backgrounds and six-second video clips using simple text prompts. At the Made on YouTube event, they also announced the Inspiration Tab, a gen AI ‘brainstorming buddy’ for content creators in YouTube Studio, and an automatic dubbing tool to support more languages.
Amazon launched Project Amelia, an AI assistant designed to help sellers grow their businesses on its platform. The assistant can answer questions about sales figures and customer traffic, providing summaries and comparisons to previous periods, and later will help resolve issues and automate tasks.
Also at its Accelerate conference, Amazon launched an AI-powered video generator for advertisers, which creates short product showcase videos from a single image. The tool is in beta for select U.S. advertisers and offers a limited set of customizable features.
Apple dropped the AI-powered iOS 18.1 beta alongside the new iPhone 16, bringing Apple Intelligence AI updates to iPhones. You can get the AI-enabled iOS if you have an iPhone 15 Pro or Pro Max or if you buy the latest iPhone 16. iPad and Mac versions arrive in October. The AI features in iOS 18.1 include an enhanced conversational Siri, Genmoji image creation, and new AI tools for writing assistance, email summarization, photo editing, and more.
Apple announced that its Apple Intelligence AI features will be available in more languages throughout 2024 and 2025. However, due to regulatory issues, the feature will not launch in the European Union or China, though Apple is currently in discussions with authorities in both regions.
Microsoft rolled out Wave 2 of Copilot AI, with new features for both enterprise and personal users:
Copilot Pages, a dynamic, persistent canvas designed for multiplayer AI collaboration.
Improved Copilot in the Microsoft 365 apps, including advanced data analysis in Excel with Python, a “Narrative builder” for dynamic storytelling in PowerPoint, Outlook inbox management, and more.
Copilot agents to automate and execute business workflows.
Mistral AI announced a free API tier and dramatic price reductions of up to 80% on its La Plateforme API, aiming to increase accessibility for developers and researchers. They also released an upgraded Mistral Small 22B and integrated vision capabilities into their chat platform.
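As a rough sketch of what the free tier opens up, here is a plain HTTP call to Mistral's chat completions endpoint. The endpoint path and the "mistral-small-latest" model alias are assumptions, so verify both against Mistral's current API docs before relying on them.

```python
# Hedged sketch: calling Mistral's chat completions API over HTTP with an API
# key from La Plateforme. Endpoint path and model alias are assumptions;
# check Mistral's API documentation for current values.
import os
import requests

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",  # assumed endpoint
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-small-latest",  # assumed alias for the upgraded Mistral Small
        "messages": [{"role": "user", "content": "Summarize this week's AI news in one sentence."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```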
Logan Kilpatrick shared a Gemini update on X:
We just shipped a series of changes which have significantly improved the Gemini 1.5 Flash latency (>3x reduction) and output tokens per second (>2x more).
Runway, LumaLabs, and Kling all announced APIs for their AI text-to-video generation models; LumaLabs AI calls theirs the Dream Machine API.
Runway announced the Runway API, allowing developers to easily integrate the text-to-video Gen-3 Alpha Turbo into their apps and products. Runway also announced a video-to-video generation feature, where Gen-3 Alpha can now use an existing video as a guide to generate new outputs with text prompts.
Runway also inked a deal with Lionsgate, the Hollywood studio that made “John Wick” and “Twilight” movies, to train a custom video model on Lionsgate’s movie catalog.
Runway says it’s also considering ways to license models as templates for individual creators to build and train their own custom models.
Kling AI released their 1.5 model, which comes with longer (10-second) and higher-resolution (1080p HD) output, plus a Motion Brush feature that directs motion within a video. The Motion Brush capability is impressive.
On the open-source side of video generation, the CogVideoX team at Qingying has released CogVideoX-5B-I2V, their latest and best video generation model. It’s available via Hugging Face.
Snap presented fifth-generation AR Spectacles with new AI features. Snap's latest Spectacles are standalone AR glasses powered by the new Snap OS, with integrated AI capabilities. The glasses feature cameras, sensors, and multi-modal AI. Snap is pitching them to developers to build up their software ecosystem, which, according to the article, “still feels pretty basic for a standalone device.”
Top Tools & Hacks
Google's NotebookLM can make AI podcasts from your writing. Its “Audio Overview” feature transforms your uploaded documents, PDFs, or pasted content into a podcast-style discussion between two AI hosts. Some examples of NotebookLM podcast generation have gone viral:
Here I gave it the entire text of my book, and it turned it into a podcast, a study guide, an FAQ, a timeline, and a quite accurate chat.
NotebookLM can generate realistic podcasts from any technical written content, which is incredibly useful for content creators and on-the-go learners. You can try it free here.
Kyutai Labs just open-sourced Moshi, an on-device speech-to-speech AI assistant. Moshi consists of a 7B LLM called Helium and Mimi, a state-of-the-art streaming speech codec. Andi Marafioti notes, “This release enables so many open-source audio AI systems.”
AI Research News
Our AI Research Roundup for this week covered new research in LLM reasoning:
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Enhancing LLM Problem Solving with REAP: Reflection, Explicit Problem Deconstruction, and Advanced Prompting
Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models
On the Diagram of Thought
I said that this research “gives me confidence multiple LLMs with advanced reasoning powers are coming soon.” As if on cue, the team at Nous Research announced Forge, their MCTS-enabled inference-scaling AI model built on open-source AI foundation models (the Hermes 3 series). Sign up here.
Microsoft released GRIN-MoE, a tiny MoE model with only 6.6B active parameters that achieves superior performance, especially in coding and mathematics, scoring 74.4 on HumanEval and 79.4 on MMLU. GRIN-MoE uses a 16 x 3.8B MoE architecture and routes each token to the top 2 of 16 experts.
As explained in their “GRIN: GRadient-INformed MoE” Technical Report, the GRIN-MoE model uses sparse gradient estimation for expert routing, achieving efficient training and surpassing traditional MoE models. Trained on 4 trillion tokens, it outperforms 14B dense AI models trained on the same dataset, as well as its predecessor Phi-3.5 MoE. It’s available on HuggingFace, and their project page is on GitHub.
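For readers unfamiliar with MoE routing, here is a toy, hypothetical sketch of plain top-2-of-16 token routing in PyTorch. It illustrates the standard routing pattern GRIN-MoE builds on, not GRIN's gradient-informed sparse estimator itself, and all dimensions are made up for illustration.

```python
# Illustrative sketch of standard top-2-of-16 expert routing, the MoE pattern
# GRIN-MoE builds on. This is NOT the GRIN sparse-gradient estimator, just the
# routing step whose training GRIN improves. All shapes are hypothetical.
import torch
import torch.nn.functional as F

def top2_routing(x, router_weights, experts):
    """x: [tokens, d_model]; router_weights: [d_model, n_experts]; experts: list of modules."""
    logits = x @ router_weights                       # [tokens, n_experts]
    probs = F.softmax(logits, dim=-1)
    top2_probs, top2_idx = probs.topk(2, dim=-1)      # each token picks its 2 best experts
    top2_probs = top2_probs / top2_probs.sum(-1, keepdim=True)  # renormalize gate weights

    out = torch.zeros_like(x)
    for slot in range(2):
        for e, expert in enumerate(experts):
            mask = top2_idx[:, slot] == e             # tokens assigned to expert e in this slot
            if mask.any():
                out[mask] += top2_probs[mask, slot].unsqueeze(-1) * expert(x[mask])
    return out

# Toy usage: 16 experts, echoing GRIN-MoE's 16-expert layout (dimensions here are tiny).
d = 32
experts = [torch.nn.Linear(d, d) for _ in range(16)]
router = torch.randn(d, 16)
tokens = torch.randn(8, d)
print(top2_routing(tokens, router, experts).shape)    # torch.Size([8, 32])
```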
Nvidia introduced NVLM, a family of open frontier-class multimodal LLMs, and released a paper with it. The NVLM 1.0 models are 72B multimodal LLMs in three configurations. Evaluations show these models match leading multimodal LLMs on tasks that involve vision and language, without compromising text-only performance.
AI Business and Policy
LinkedIn scraped user data for AI, opting users into generative AI model training without asking for consent. Users can now opt out by toggling the setting in the Data Privacy tab, but LinkedIn used the data before updating its terms of service – not user-friendly.
BlackRock and Microsoft unveiled a $30 billion AI infrastructure fund aimed at investing in AI infrastructure such as data centers and energy projects essential for powering advanced AI models.
OpenAI is revamping its safety team and security practices, turning its safety team into a new independent Board committee that oversees OpenAI's safety processes. The committee can delay or even halt new model releases until safety issues are addressed.
Fathom, a meeting transcription and note-taking AI service, has raised $17 million in Series A funding, fueled by strong user growth and the company's focus on building a robust platform with its own AI models.
AI weather forecasting startup Brightband announced its launch and a $10 million funding round. Brightband started as a Public Benefit Corporation, and they plan to open-source models and data to foster collaboration and accelerate weather prediction technology. They see a bright and open-source future for AI-enabled weather forecasting.
Fal.ai, which hosts image- and video-generating AI models, raised $23M from a16z and others. Fal.ai hosts Flux among its wide roster of supported generative AI models.
Indian filmmaker Ram Gopal Varma is using AI-generated music exclusively for his future projects, citing its efficiency and cost-effectiveness compared to human musicians. Despite concerns from other filmmakers like Christopher Nolan about AI's limitations, Varma sees AI as a powerful tool for the future of the Indian film industry.
California passed 8 new AI-related laws, including those requiring disclosure of AI-generated content and prohibiting the creation of AI-generated replicas of deceased performers without consent. Governor Newsom is considering signing 38 more AI-related bills, including the major AI regulation bill SB 1047.
AI Opinions and Articles
In defiance of Gov. Newsom's recently signed bill to combat deepfake election content, Elon Musk has been sharing parody videos on X that may run afoul of the new law. While “truth in labelling” for AI content is good, free speech is a fundamental principle that shouldn't be abandoned just because there are new ways to express beliefs.
A Look Back …
OpenAI touts the great things people can do with o1; it can even be a good calculator, but it can be 30x more expensive than GPT-4o. However, the bigger picture is that in 18 months we have gone from one GPT-4-level AI model to many GPT-4-level AI models, and now with o1 we are moving to the next level.
This graphic of the history of Elo Scores is a reminder of how far we’ve come.