AI Week In Review 24.08.10

Qwen2-Math, Qwen2-Audio, Gemini 1.5 Flash price cuts, OpenAI Structured JSON output, Mistral AI Agents, Anthropic's Bug Bounty, AI robot plays ping-pong, NVidia scrapes video, Strawberries get hyped.

Aug 11, 2024

AI Tech and Product Releases

The Qwen team has released Qwen2-Math, a series of specialized math language models built upon the Qwen2 LLMs. The series consists of Qwen2-Math and Qwen2-Math-Instruct in 1.5B, 7B, and 72B parameter sizes. Qwen2-Math-72B outperforms all other AI models, including GPT-4o, Claude-3.5-Sonnet, and Gemini-1.5-Pro, on various math benchmarks, scoring 84% on MATH. It’s also SOTA on more complex mathematical competition evaluations such as AIME 2024 and AMC 2023.

Figure 2. Qwen2-Math-72B-Instruct is SOTA on MATH.

Qwen2-Math was trained on math web text, books, exams, and code, along with synthetic data. To boost results, the team used techniques such as rejection sampling and Group Relative Policy Optimization (GRPO), and incorporated Chain-of-Thought into the instruction fine-tuning.
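To make the two training techniques named above a bit more concrete, here is a minimal Python sketch. It is not the Qwen team's code: the function names, the 0/1 reward values, and the use of a population standard deviation are all assumptions for illustration. Rejection sampling is shown as "keep only the sampled answers that pass a check", and the group-relative step as "score each sampled answer relative to the other answers drawn for the same prompt".

```python
from statistics import mean, pstdev


def rejection_sample(candidates, is_correct):
    """Keep only the sampled answers that pass the correctness check."""
    return [c for c in candidates if is_correct(c)]


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    Assumption: population standard deviation, with a small eps added so a
    group of identical rewards does not cause a division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical example: four sampled answers to "2 + 2 = ?".
samples = ["4", "5", "22", "4"]
kept = rejection_sample(samples, lambda s: s == "4")

# Hypothetical rewards: 1.0 for a correct answer, 0.0 otherwise.
rewards = [1.0 if s == "4" else 0.0 for s in samples]
advantages = group_relative_advantages(rewards)
```

In this toy group, correct answers receive a positive advantage and incorrect ones a negative advantage, without needing a separate learned value model to provide the baseline.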

Qwen has released open-source Qwen2-Audio so users can chat with their AI. Qwen2-Audio is an open-source LLM that understands audio and text inputs and generates text outputs. It is multilingual, understanding audio in more than 8 languages and dialects, including Chinese, English, Japanese, and several European languages. The weights are on HuggingFace.

Figure 3. Qwen2-Audio pairs an audio encoder front-end with a QwenLM.

Google AI has cut Gemini 1.5 Flash prices by over 70%, to 7.5 cents per million input tokens and 30 cents per million output tokens, for prompts under 128K tokens. Context caching costs a fraction of that price. Gemini 1.5 Flash fine-tuning is also now available to all developers. Lastly, they have made their PDF understanding multi-modal:

The Gemini API and AI Studio now support PDF understanding through both text and vision. If your PDF includes graphs, images, or other non-text visual content, the model uses native multi-modal capabilities to process the PDF.

Putting it all together, Google offers an extremely cost-effective AI stack for use cases involving long context, repeated context use, rich document processing, customized AI models, and more.
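As a rough illustration of what the Gemini 1.5 Flash prices quoted above mean per request, here is a small Python sketch. The two rates come from the announcement as summarized here; the function name, the reading of "128K" as 128,000 tokens, and the example token counts are assumptions, and caching and fine-tuning costs are not modeled.

```python
# Rates quoted above for prompts under 128K tokens (USD per million tokens).
INPUT_USD_PER_MILLION = 0.075
OUTPUT_USD_PER_MILLION = 0.30

# Assumption: "128K" is treated as 128,000 tokens for this sketch.
PROMPT_LIMIT_TOKENS = 128_000


def flash_cost_usd(input_tokens, output_tokens):
    """Estimate the cost of one request at the quoted short-prompt rates."""
    if input_tokens > PROMPT_LIMIT_TOKENS:
        # The quoted prices only cover prompts under the limit.
        raise ValueError("prompt exceeds the range covered by these rates")
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


# Hypothetical request: a 100,000-token document and a 2,000-token summary.
cost = flash_cost_usd(100_000, 2_000)
print(f"${cost:.4f}")
```

At these rates a long-document request like the hypothetical one above costs under one cent, which is the basis for the cost-effectiveness point.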

OpenAI introduced Structured Outputs in the OpenAI API. While GPT-4 and GPT-4o already had JSON output, this new feature makes sure outputs exactly match JSON schemas defined by developers. This is important for agentic AI, function-calling, tool use, and other AI assistant use cases.
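To show what "exactly match a JSON schema" means in practice, here is a hand-rolled Python sketch that checks a model's text output against a small schema. It does not call the OpenAI API and is not how OpenAI implements the feature; the schema, the field names, and the `conforms` helper are hypothetical, and the checker covers only a tiny subset of JSON Schema (flat objects, required keys, basic types).

```python
import json

# Hypothetical schema a developer might define for a weather tool.
SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "temperature_c": {"type": "number"},
    },
    "required": ["city", "temperature_c"],
    "additionalProperties": False,
}

_PY_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def conforms(raw_output, schema):
    """Return True if raw_output is JSON that fits the flat object schema."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return False  # plain "JSON mode" failures: not even valid JSON
    if not isinstance(data, dict):
        return False
    props = schema["properties"]
    if any(key not in data for key in schema.get("required", [])):
        return False  # a required field is missing
    if schema.get("additionalProperties") is False:
        if any(key not in props for key in data):
            return False  # an unexpected extra field is present
    for key, value in data.items():
        if key not in props:
            continue
        wanted = props[key]["type"]
        # bool is a subclass of int in Python, so reject it for "number".
        if isinstance(value, bool) and wanted != "boolean":
            return False
        if not isinstance(value, _PY_TYPES[wanted]):
            return False
    return True


good = conforms('{"city": "Paris", "temperature_c": 21.5}', SCHEMA)
missing = conforms('{"city": "Paris"}', SCHEMA)
wrong_type = conforms('{"city": "Paris", "temperature_c": "warm"}', SCHEMA)
```

Valid JSON alone would let the second and third outputs through; a schema guarantee is what lets downstream tool-calling code skip this kind of defensive checking.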

Mistral announced the introduction of Mistral AI agents and AI model customization, in a blog post titled “Build, Tweak, Repeat.” This allows users to customize Mistral models with “simpler, more efficient model customization” and build AI agents based on Mistral models or fine-tuned models for use on Le Chat. Their agents are less like full-blown AI agents and more like custom GPTs “that wraps models with additional context and instruction.”

Agents help you create custom behavior and workflows with a simple set of instructions and examples. With the advanced reasoning capabilities of Mistral Large 2, you can layer on increasingly complex workflows with multiple agents that are easy to share within your organization.

Anthropic announces AI Model Bug Bounty Program. Anthropic AI announced an expansion of their bug bounty program, focusing on finding universal jailbreaks in their next-generation safety system:

Anthropic announced Thursday that in partnership with HackerOne it will start testing an expansion of its invite-only bug bounty program to receive findings of successful universal jailbreak attacks.

They're offering rewards of up to $15,000 for discovery of novel vulnerabilities across various domains, including cybersecurity.

In non-release news, there has been a lot of hype, trolling, and speculation on X and other social media around OpenAI’s Project Strawberry, thanks to Sam Altman tweeting about strawberries. An anonymous model on Lmsys has added to the speculation. OpenAI's 'Strawberry' project aims to enhance reasoning abilities in their AI models, a potentially significant advancement in AI. But nothing has been released yet from that project.

Figure 4. Nice garden, but you’re only as good as your last shipped release.

So far, it has amounted to no release and no real news. Now, some on X say to expect news on August 17th. But it won’t be GPT-5, which OpenAI said won’t even be here by Dev Day in November.

Coincidentally (or not), there’s an anonymous AI model on Lmsys that may or may not be related to Strawberry.

AI Research News

Our AI research highlights article for this week covered multi-modal models, benchmarks and datasets, including for medical AI applications:

MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine

Also published this week, Google DeepMind’s AI agent-powered robot achieves human-level performance in table tennis. Their research describes the first AI-powered robot to reach (amateur) human-level performance in table tennis (ping-pong), beating beginners and winning 55% of its matches with intermediate players.

Figure 5. DeepMind’s table tennis robot playing against a professional coach.

AI Business and Policy

Leaked Documents Show Nvidia Scraping ‘A Human Lifetime’ of Videos Per Day to Train AI:

Internal emails, Slack conversations and documents obtained by 404 Media show how Nvidia created a yet-to-be-released video foundational model.

It’s not surprising that AI video models are trained on scraped YouTube videos; there’s been clear evidence that such content is being fed into them. But many YouTubers are complaining about their content being used without their consent.

Elon Musk Revives Lawsuit Against OpenAI and its founders, Sam Altman and Greg Brockman. The revived lawsuit accuses them of fraud, false advertising, and breach of contract:

This latest legal action comes two months after Musk withdrew a similar suit filed in state court. It outlines a complex narrative of alleged deception and breach of agreements.

OpenAI co-founder Schulman leaves for Anthropic, Brockman takes extended leave. Brockman’s sabbatical is a personal break for him. On the other hand, John Schulman leaving OpenAI for Anthropic to work on AI alignment, with Peter Deng also departing OpenAI, raises questions about OpenAI’s AI alignment efforts.

Hugging Face acquires XetHub to enhance its AI storage infrastructure. This supports Hugging Face's strategy to become a comprehensive platform for AI development and deployment, offering services from model training to inference.

JPMorgan is going all-in on a new AI Assistant. JPMorgan Chase has introduced an AI tool called LLM Suite to over 60,000 employees to assist with tasks like writing emails and reports. It’s based on OpenAI’s ChatGPT.

“employees are urged to see LLM Suite as a research analyst who provides information and advice on topics, not as a replacement.”

The Verge shares Apple Intelligence system prompts discovered by macOS users: ‘You are a helpful mail assistant,’ and other Apple Intelligence instructions.

Figure 6. Apple System prompt for email replies, used in Apple’s Smart Reply feature.

Humane's AI Pin is facing a significant return rate after launch due to negative reviews and poor customer experiences, with only approximately 7,000 of the units sold remaining with customers.

Via the Information: Sequoia Capital has discussed funding an AI reasoning startup co-founded by Robinhood's CEO, aiming to enhance AI in reasoning and decision-making.

Anysphere, developer of the Cursor AI coding assistant, raised $60 million in Series A funding led by Andreessen Horowitz, securing a valuation of $400 million. This funding round highlights investor confidence in AI-driven coding solutions and in Cursor, a competitor to GitHub Copilot.

Defense tech startup Anduril Industries raised $1.5 billion, giving it a $14 billion valuation. With revenues doubling to $500 million, the company's growth trajectory indicates robust demand in defense tech.

Contextual AI raises $80 million for its ‘RAG 2.0’ platform.

The fight over California’s AI bill (SB 1047) is spilling beyond Sacramento. Democratic House member Zoe Lofgren raised criticisms of the bill, saying it is "heavily skewed toward addressing existential risk."

The FCC proposes requiring robo-callers to disclose when they’re using AI. While this seems reasonable, I wonder how enforceable it is.

AI Opinions and Articles

Researchers worry about AI turning humans into jerks. The concern from AI safety researchers is that GPT-4o could influence 'social norms.' Last I checked, social media already did that. Will it get worse with the rise of AI relationships?

AI is changing science publishing, in some ways for the worse, with AI plagiarism, a flood of ‘junk’ papers, and other plagues.

Terry Tao, the brilliant mathematician and Fields Medal winner, gave a lecture on AI in science and mathematics, sharing how AI and formal proving systems (like Lean) could remake how mathematics is done. It’s worth watching.

AI Changes Everything
