AI Week in Review 25.03.08

Google Search AI Overviews, AI Mode, Colab AI Agent, QwQ-32B, AgentExchange, Agentforce 2dx, Aya Vision, Jamba 1.6 Large & Mini, Mistral OCR, Sesame "Maya", HunyuanVideo-I2V, MiniMax Image-01, CogView4.

Mar 08, 2025

Figure 1. Highly photorealistic AI image generation from the newly released MiniMax Image-01.

Top Tools

Google is infusing Gemini AI features across its products, with Search Overviews, AI Mode, AI Data Science Agent in Colab, Gemini in Calendar, and a Gemini Embedding model.

Google announced new generative AI features in Search, including expanded AI-generated Overviews powered by its Gemini 2.0 model and an interactive AI Mode for in-depth queries. Gemini 2.0 for AI Overviews provides faster, more accurate answers to complex questions in search, including coding problems, advanced math, and image-based queries. Competing with Perplexity, Google is turning the search engine into an answer engine.

AI Mode for search is an experimental feature that gives Gemini control over search results. It can conduct multi-step research on a user’s query, automatically performing additional searches and reasoning through comparisons, providing users with an AI-generated answer backed by live web results. You can try it by joining the waiting list.

Google is introducing an AI Data Science Agent to its Colab notebook service. This Gemini-powered Colab assistant can set up a Python environment, import datasets, and write complete data analysis notebooks from a natural-language prompt. This significantly speeds up exploratory data analysis.

Google is adding an AI-powered Gemini side panel within Google Calendar. This feature, available through Google Workspace Labs, allows users to conversationally check schedules, create events, and look up details directly from the calendar interface.

Google added the Gemini Embedding model to its Gemini developer API. Gemini Embedding supports over 100 languages and converts text into numerical vectors that capture semantic meaning, supporting applications such as document retrieval.
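To make the retrieval idea concrete, here is a minimal sketch of how embedding vectors are typically compared. It does not call the Gemini API: the three-dimensional vectors, the document titles, and the helper names `cosine_similarity` and `rank_documents` are all made up for illustration, and a real embedding model would return much longer vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical hand-written "embeddings" (assumption: real ones come from
# an embedding model and have hundreds or thousands of dimensions).
DOCS = {
    "How to reset your password": [0.9, 0.1, 0.0],
    "Recovering a locked account": [0.7, 0.6, 0.1],
    "Quarterly revenue report": [0.0, 0.2, 0.9],
}

def rank_documents(query_vector, docs):
    """Return document titles sorted from most to least similar to the query."""
    return sorted(
        docs,
        key=lambda title: cosine_similarity(query_vector, docs[title]),
        reverse=True,
    )

# Stand-in for the embedding of a query such as "I forgot my login".
query = [0.85, 0.2, 0.05]
ranking = rank_documents(query, DOCS)
print(ranking)
```

Documents whose vectors point in nearly the same direction as the query rank first, which is why semantically related text can be retrieved even when it shares no keywords with the query.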

AI Tech and Product Releases

Alibaba’s Qwen team has unveiled QwQ-32B, a new 32B-parameter AI reasoning model that achieves performance comparable to the much larger DeepSeek-R1 on various reasoning benchmarks while requiring less compute. QwQ-32B is open source (released under an Apache 2.0 license), comes equipped with tool-use abilities, and supports a 131K context window. We reviewed the QwQ-32B release and noted that it is currently “the best AI reasoning model you can run locally.”

Salesforce unveiled Agentforce 2dx, enhancing its autonomous AI agent platform to anticipate needs and take action without human oversight, shifting from reactive to proactive functionality. Agentforce is a “digital labor platform” that aims to streamline business processes with specialized AI tools and applications.

Salesforce launched AgentExchange, a marketplace for AI agents. The platform features over 200 partners like Google Cloud and DocuSign, offering pre-packaged solutions to automate complex tasks within business systems without requiring extensive technical expertise.

Cohere introduced Aya Vision, an open-weights vision-language model (VLM) available in 8B and 32B parameter versions. Aya Vision is multilingual (23 languages) and designed for tasks like optical character recognition (OCR), image captioning, visual reasoning, and multimodal question answering.

Figure 2. Aya Vision use-case examples.

AI21 Labs launched Jamba Large 1.6 and Jamba Mini 1.6, two open models for enterprise use, offering high performance with a novel Mixture-of-Experts architecture that combines transformer and Mamba layers. Jamba Mini 1.6 has 12B active parameters (52B total) and Jamba Large 1.6 has 94B active parameters (398B total), with long 256K context handling and efficient inference.
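The gap between active and total parameters is what makes Mixture-of-Experts inference efficient: only a subset of experts runs for each token. As a rough illustration, the arithmetic below computes the active share from the figures quoted above. The helper name `active_fraction` is ours, and the ratio is only a proxy for per-token compute, not a measured speedup.

```python
# Parameter counts in billions, as quoted in the paragraph above.
MODELS = {
    "Jamba Mini 1.6": {"active": 12, "total": 52},
    "Jamba Large 1.6": {"active": 94, "total": 398},
}

def active_fraction(active, total):
    """Share of the model's parameters used for any single token."""
    return active / total

for name, params in MODELS.items():
    share = active_fraction(params["active"], params["total"])
    print(f"{name}: {share:.1%} of parameters active per token")
```

For both models, a little under a quarter of the parameters are active for any given token.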

OpenAI rolled out GPT-4.5 to ChatGPT Plus subscribers. While GPT-4.5 doesn’t lead on reasoning-related benchmarks, it ranks first on the LM Arena leaderboard, showing strength in knowledge, creative writing, and conversation.

ChatGPT for macOS can now directly edit code in developer tools like Xcode, VS Code, and JetBrains IDEs. The feature is available to paid subscribers through an update to the macOS app; it will roll out to other users next week.

Mistral AI announced Mistral OCR, a new optical character recognition (OCR) system they call “the world’s best document understanding API,” aimed at extracting text and data from complex documents with unprecedented accuracy. Mistral OCR can ingest images or PDFs and intelligently parse everything from standard text to tables, equations, and embedded images, outputting an ordered, structured representation of the content.

Elysian Labs, a new consumer AI startup, has launched its first app “Auren” on the Apple App Store. Auren is an AI companion designed for daily life, offering a conversational assistant that can remember context, maintain user privacy, and function as a coach for learning and personal development.

Tencent’s AI division has open-sourced HunyuanVideo-I2V, an image-to-video generation framework that lets users create short videos from a single image prompt. It was released as part of Tencent’s Hunyuan AI platform.

AI startup Sesame has debuted AI voice assistants “Maya” and “Miles” that are drawing attention for their striking human likeness. The Sesame voice model engages in natural conversation with features like realistic tone shifts, laughter, and even micro-pauses that make it sound as if it’s genuinely thinking before responding. Users testing Maya report a “Her”-like experience.

Chinese AI company MiniMax introduced Image-01, a photorealistic text-to-image model that the company says can generate “cinematic-quality” images with high prompt fidelity at one-tenth the cost of other solutions.

Tsinghua University spin-off Zhipu AI released CogView4, a 6B-parameter open-source text-to-image AI model notable for its open license and its bilingual capability, handling both Chinese and English text input. CogView4 achieves state-of-the-art scores among open models on the DPG-Bench image generation benchmark. You can try CogView4 online.

Figure 3. Showcase of CogView4 images. CogView4 can generate both Chinese and English graphics.

Baidu unveiled a no-code development platform called MiaoDa, which lets users build entire software applications by describing them in natural language. Leveraging Baidu’s LLMs and AI agents, MiaoDa generates code and user interfaces automatically through a prompt-based interface.

Anthropic has upgraded its Anthropic Console developer platform with team collaboration features and extended reasoning capabilities. The upgraded Console facilitates cross-functional team collaboration on AI prompts and supports complex problem-solving with the latest Claude 3.7 Sonnet model.

Anthropic’s coding tool, Claude Code, contained bugs in its auto-update function that allowed unauthorized system modifications, potentially "bricking" workstations. Anthropic has since removed the problematic commands and provided users with a link to a troubleshooting guide.

Contextual AI claims its Grounded Language Model (GLM) has the highest factual accuracy in the industry, outperforming all other models on a key truthfulness benchmark with an 88% factuality score. High factuality is critical for many enterprise applications, particularly in regulated industries like finance and healthcare.

Coming soon: Meta’s upcoming Llama 4 AI model may have a voice. Meta is planning for Llama 4 to introduce improved voice features that allow users to interrupt the model mid-speech.

Not coming soon: Apple is delaying the rollout of a more personalized Siri experience and now expects to roll out the new AI features in the coming year.

AI Research News

Researchers have developed NotaGen, a large-scale symbolic music generation model that composes high-quality classical-style music scores. Presented in the paper “NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms,” NotaGen is able to write sheet music conditioned on prompts like musical era, composer, and instrumentation. It was trained like an LLM: pre-trained on 1.6 million musical pieces, fine-tuned on 9,000 classical compositions, and refined with a custom reinforcement learning step.

As noted in our prior AI article, researchers at Zoom have developed chain of draft (CoD), which drastically reduces the computational resources for AI reasoning by limiting tokens per reasoning step.

Researchers have introduced Light-R1-32B, an open-source AI reasoning model trained from Qwen2.5-32B that surpasses the equivalent DeepSeek-R1 distilled model with only $1,000 in training costs. Their method trained on distilled DeepSeek-R1 outputs using curriculum supervised fine-tuning (SFT) followed by direct preference optimization (DPO). The resulting Light-R1-32B scored 76.6 on AIME24 and 61.8 on GPQA Diamond, surpassing other distilled reasoning models and coming close to DeepSeek-R1 itself.
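For readers unfamiliar with DPO, the sketch below shows the standard DPO objective for a single preference pair, as a rough illustration of what that training stage optimizes. It is not Light-R1's training code: the function name `dpo_loss`, the example log-probabilities, and the `beta=0.1` default are assumptions chosen for illustration.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are summed log-probabilities of each response under the model
    being trained (policy) and a frozen reference model. beta (assumed
    value) controls how strongly the policy is pushed from the reference.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) written as a numerically stable softplus(-logits).
    x = -logits
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# Policy identical to the reference: no preference learned yet.
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)
# Policy raises the chosen response and lowers the rejected one.
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)
print(baseline, improved)
```

The loss starts at log 2 when the policy matches the reference and falls as the policy assigns relatively more probability to the preferred response.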

AI Business and Policy

Microsoft is accelerating its push to compete with OpenAI by developing its own powerful AI models. Microsoft has created AI reasoning models and developed a family of models called MAI. Microsoft is also exploring alternatives to OpenAI’s AI models for use in Copilot.

A coalition of tech firms including Cisco, LangChain, Glean, LlamaIndex, and Galileo has formed AGNTCY to develop an open-source standard for AI agent interoperability. This initiative aims to enable seamless communication among diverse AI agents across various organizations, likening it to TCP/IP standards for the internet.

Google co-founder Larry Page is building a new company called Dynatomics that’s focused on applying AI to product manufacturing. The company aims to use AI for creating highly optimized designs for objects and automating their production in factories.

The US Department of Labor is investigating the data-labeling startup Scale AI over its compliance with the Fair Labor Standards Act. The investigation, which began in August 2024, is ongoing amid recent lawsuits from former workers claiming underpayment and misclassification as contractors.

Anthropic secured a $3.5 billion funding round, valuing Anthropic at $61.5 billion. The company plans to use the funds to advance AI development, expand its compute capacity, and accelerate international growth.

Amazon's AWS formed a new group focused on agentic AI.

Cloud provider CoreWeave has acquired AI platform Weights & Biases, in a move to build an end-to-end AI development suite. CoreWeave, known for its specialized AI infrastructure, will integrate W&B’s experiment tracking and model management tools into its services.

Musk's bid to block OpenAI's transition to for-profit status has failed. A US court has denied Elon Musk's request for a preliminary injunction in his suit against OpenAI.

AI Opinions and Articles

Thomas Wolf, cofounder of Hugging Face, challenged optimistic AI visions by arguing that current systems are "fundamentally incapable" of delivering the scientific revolutions promised by leaders like Anthropic's Dario Amodei.

In a blog post entitled “The Einstein AI model,” Wolf contends that today’s AI is more about conforming to existing knowledge than generating paradigm-shifting insights needed for true innovation:

What we'll actually get, in my opinion, is “a country of yes-men on servers” (if we just continue on current trends) … To create an Einstein in a data center, we don't just need a system that knows all the answers, but rather one that can ask questions nobody else has thought of or dared to ask. One that writes 'What if everyone is wrong about this?' when all textbooks, experts, and common knowledge suggest otherwise.

He's not wrong. Scaling can yield many new capabilities, but scaling up the ability to solve math problems does not create a Newton, Euler, or Einstein.

Since generative AI has already surprised us in the realm of creativity, it’s best to keep an open mind. However, even if we never develop an AI Einstein, AI assistants that make us more productive will help accelerate human progress.

AI Changes Everything
