AI Week In Review 24.11.23

DeepSeek-R1-Lite-Preview, Mistral Le Chat and Pixtral, GPT-4o enhances writing, Gemini exp-1121, ChatGPT voice mode, Gemini memory, Qwen 2.5 turbo, BFL Flux.1 Tools, Tulu 3, Coca-Cola's AI slop.

Nov 23, 2024

Figure 1. Coca-Cola’s AI-generated Holiday ad got a frosty reception, panned as ‘soulless’ AI slop.

AI Tech Product Releases

DeepSeek AI launched DeepSeek-R1-Lite-Preview, a new reasoning model similar to OpenAI’s o1, matching o1-preview performance on AIME & MATH benchmarks and adding a transparent thought process. Reviewers on X have noted its math reasoning improvements but also challenges in coding and other tasks.

Most importantly, DeepSeek-R1-Lite-Preview confirms the Inference Scaling Laws shown by o1. It will be released as an open-source AI model, blazing a path to open-source development of problem-solving and reasoning AI models. It’s also available to try at chat.deepseek.com.

Figure 2. DeepSeek-R1-Lite-Preview confirms Inference Scaling Laws - Longer Reasoning at inference yields Better Performance. DeepSeek-R1-Lite-Preview shows steady score improvements on AIME as thought length increases.

Mistral released a big upgrade to their Le Chat, adding web search, vision, ideation canvas, and image generation via BlackForestLabs’ Flux Pro integration. Mistral also released Pixtral Large, available through Le Chat, a multimodal LLM with SOTA vision capabilities.

As we explained earlier this week in our article Mistral’s New Le Chat and the Canvas Interface, Mistral’s AI model updates and AI feature integrations position Le Chat as a best-in-class AI platform. Le Chat is available to free users. Both Pixtral Large and the latest Mistral Large 2411 are also available for download via HuggingFace.

OpenAI updates GPT-4o with Enhanced Creative Writing. OpenAI says:

“The model’s creative writing ability has leveled up–more natural, engaging, and tailored writing to improve relevance & readability.”

Users have reacted positively to GPT-4o's more natural, engaging writing, and this latest ChatGPT-4o retook the top spot on the LMsys Arena leaderboard.

Not to be left behind, Google released Gemini Exp 1121 a day later and regained the LMsys Arena crown they had taken from GPT-4o just a week prior.

Figure 3. The LMsys Chatbot Arena score shows significant competition and ongoing progress by the AI labs.

OpenAI has rolled out its Advanced Voice Mode feature on the web for ChatGPT’s paying subscribers, enabling natural voice interactions directly on chatgpt.com. OpenAI promised in its announcement on X that the feature would roll out to free users as well in the coming weeks.

Google has added a memory feature to its Gemini chatbot. Available to subscribers, the memory function allows Gemini to store personal preferences and context from conversations:

Starting today, you can ask Gemini to remember your interests and preferences, whether it’s about your work, your hobbies, or your general aspirations in life. This helps Gemini provide even more helpful and relevant responses, tailored precisely to your needs. Think of it as giving Gemini a user manual, designed by you.

Perplexity introduced a new shopping feature for its subscribers, offering integrated product recommendations and one-click checkout directly within search results. Perplexity’s Shopping tool presents detailed visual cards for products, pricing, and seller info, aiming to compete with Google and Amazon in e-commerce search.

Alibaba’s Qwen team has announced Qwen 2.5 Turbo, with a 1 million token context window and faster inference speed; this is the largest AI model context window besides Gemini 1.5. It’s available via API and as a HuggingFace Spaces demo.

YouTube announced that its Dream Screen feature for Shorts now supports AI-generated video backgrounds using Google DeepMind’s Veo model.

Brave introduces AI chat mode for follow-up questions on its search engine. This new feature allows users to ask additional questions based on initial queries directly through a chat bar under the AI-generated summaries, offering an experience not available on Google.

Top Tools & Hacks

Black Forest Labs announced the release of Flux Tools, four tool features that help users control and steer image generation.

FLUX.1 Fill for inpainting and outpainting, enabling editing and expansion of images given a text description and a binary mask.
FLUX.1 Depth and FLUX.1 Canny enable structural guidance based on a depth map or Canny edge map extracted from an input image, combined with a text prompt, to maintain precise control during image transformations. This preserves the original image’s structure and is particularly effective for retexturing images.
FLUX.1 Redux is an adapter that can remix, restyle, and recreate input images via text prompts.

The FLUX.1 Tools are available as open-access models within the FLUX.1 [dev] model series, in the BFL API supplementing FLUX.1 [pro], and can be tried out on the fal FLUX.1 playground.
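
To make the binary-mask idea behind inpainting concrete, here is a minimal sketch in plain Python. It is not the FLUX.1 Fill API or its implementation: `composite_with_mask` is a hypothetical helper, the tiny grayscale "images" are made-up illustration values, and the sketch assumes the common convention that a mask value of 1 marks pixels to regenerate while 0 marks pixels to keep.

```python
from typing import List

Image = List[List[int]]  # toy grayscale image: rows of pixel values 0-255


def composite_with_mask(original: Image, generated: Image, mask: Image) -> Image:
    """Keep original pixels where mask == 0, take generated pixels where mask == 1.

    Hypothetical helper for illustration; assumes 1 = "regenerate this pixel".
    """
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("all inputs must have the same height")
    result = []
    for orig_row, gen_row, mask_row in zip(original, generated, mask):
        if not (len(orig_row) == len(gen_row) == len(mask_row)):
            raise ValueError("all inputs must have the same width")
        # Per-pixel choice between the untouched original and the new content.
        result.append(
            [g if m == 1 else o for o, g, m in zip(orig_row, gen_row, mask_row)]
        )
    return result


# Made-up 2x3 example: 10 = original pixel, 99 = model-generated pixel.
original = [[10, 10, 10], [10, 10, 10]]
generated = [[99, 99, 99], [99, 99, 99]]
mask = [[0, 1, 0], [0, 1, 1]]

edited = composite_with_mask(original, generated, mask)
print(edited)  # [[10, 99, 10], [10, 99, 99]]
```

Only the masked region changes, which is why a mask lets a user edit one part of an image while leaving the rest intact. Outpainting can be thought of the same way, with the mask covering a newly added border around the original image.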

Figure 4. Outputs from FLUX.1 Redux. Given an input image, FLUX.1 Redux can reproduce the image with slight variation, allowing users to refine a given image.

AI Research News

Is SageAttention the next FlashAttention? As shared in SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration, SageAttention2 is a 4/8-bit quantization method that accelerates the attention mechanism in transformers, running about 3x faster than FlashAttention2 and about 5x faster than xformers.
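
For readers unfamiliar with low-bit quantization, the sketch below shows the general idea on a single query-key dot product, the basic building block of attention scores. It is a generic symmetric INT8 scheme with one scale per vector, not the SageAttention kernels or their exact method; the function names and the example vectors are assumptions made up for illustration.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric quantization: map floats to integers in [-127, 127] plus one scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def int8_dot(q: List[float], k: List[float]) -> float:
    """Approximate a dot product using integer multiply-adds, then rescale."""
    q_int, q_scale = quantize_int8(q)
    k_int, k_scale = quantize_int8(k)
    # The inner loop works on small integers; this is the part that
    # low-precision hardware units can execute faster than float math.
    integer_sum = sum(a * b for a, b in zip(q_int, k_int))
    return integer_sum * q_scale * k_scale


# Made-up example vectors standing in for one query row and one key row.
query = [0.5, -1.0, 0.25, 2.0]
key = [1.5, 0.5, -0.75, 1.0]

exact = sum(a * b for a, b in zip(query, key))
approx = int8_dot(query, key)
print(f"exact={exact:.4f} approx={approx:.4f} error={abs(exact - approx):.4f}")
```

The quantized result is close to the exact one but not identical. Keeping that error small enough that model outputs do not degrade is the hard part, and it is what the "accurate" in the report's title refers to.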

The Allen Institute for AI announced the release of Tülu 3, fully open SOTA post-trained 8B and 70B LLMs that outperform post-trained models of the same size such as Llama 3.1-Instruct and Qwen2.5-Instruct. All Tulu 3 collateral is being shared via open-source release: training datasets, data curation tools, data decontamination scripts, training code, and evaluation suites. Tulu 3 is available to try out on the Allen AI playground and download from HuggingFace and Ollama.

Figure 5. Thanks to an extensive post-training regimen, Tulu-3 is state-of-the-art, competitive with Qwen 2.5 and Llama 3.1 LLMs of the same size, both 8B and 70B.

They shared the recipe and technical details on Tulu-3 in the paper TÜLU 3: Pushing Frontiers in Open Language Model Post-Training:

The advancements of Tülu 3 are attributed to careful data curation leading to Tülu 3 Data, new permissively licensed training datasets targeting core skills, improved training infrastructure, Tülu 3 Code, reproducible evaluation toolkit Tülu 3 Eval, and innovative methodologies and guidance through training stages, Tülu 3 Recipe.

Figure 6. An overview of the TÜLU 3 recipe, including data curation, targeting general and specific capabilities with training strategies, an evaluation suite for development, and a final evaluation stage.

Our AI research highlights article for this week covered these recent research results in multimodal LLMs and open AI model development:

LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
MagicQuill: An Intelligent Interactive Image Editing System
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
RedPajama: an Open Dataset for Training Large Language Models

AI Business and Policy

The age of AI is in Full Steam – Nvidia CEO, Jensen Huang

Nvidia reported record revenue and earnings in their Q3 earnings report this week. Revenue for this quarter was $35.1 billion, up 94% from a year ago, thanks to Hopper and Blackwell AI system adoption driving data center sales of $30.8 billion for the quarter. Nvidia’s CEO, Jensen Huang, assured investors of Nvidia's competitive position in AI inference and touted Nvidia’s Blackwell, saying demand for it is “incredible.” The AI build-out is still ramping up.

Apple is developing a new version of Siri powered by advanced LLMs, but it won’t be released until spring 2026. The updated Siri will replace the current interface, but until then, Apple will integrate OpenAI’s ChatGPT into its devices for AI capabilities.

OpenAI released a free course for K-12 teachers to integrate ChatGPT into classrooms. The program, created with Common Sense Media, covers AI basics and applications but faces skepticism from educators over ethical concerns and potential misuse.

A Paris-based startup founded by Google alums called H has launched its first product, Runner H. Built on a proprietary compact LLM with just 2 billion parameters, Runner H is an AI tool aimed at businesses for tasks like quality assurance and process automation.

AI startup fundraising news:

Anthropic Raises $4 Billion from Amazon to Train AI Models on AWS. Anthropic will use AWS’s Trainium accelerators for model training and Inferentia chips for deployment, deepening its technological collaboration with Amazon. The new funding brings Amazon’s total investment in Anthropic to $8 billion.
Crusoe Energy, a startup building data centers for tech giants like Oracle, Microsoft, and OpenAI, is raising $818 million, with $686 million already secured from seventy investors.
New Lantern recently secured $19 million to streamline radiologists’ workflow through AI automation.
Four Growers has raised $15 million in funding to develop robots for autonomous harvesting in greenhouses.
Federato Raises $40 Million to Expand AI-Powered Underwriting Platform.
Converge Bio Raises $5.5M to Democratize AI in Biotech Research.

AI Opinions and Articles

Coca-Cola causes controversy with AI-made ad. Coca-Cola’s AI-generated Holiday ad was panned online as ‘soulless’ AI slop that is “missing heart and depth”. AI can do many things well, but this isn’t one of them. Faking sentimentality with not-quite-realistic AI video generation is not a beneficial use of AI.

AI Changes Everything
