AI Week In Review 24.10.19
Llama-3.1-Nemotron, Zamba2-7B, Zyda-2, Ministral 3B & 8B, Adobe Video Generation beta, Perplexity Internal Search, ChatGPT for Windows, DeepSeek Janus 1.3B, Yi-Lightning, F5-TTS, & AI goes nuclear.
AI Tech Product Releases
Nvidia quietly released Llama-3.1-Nemotron-70B-Instruct, an open model available on Hugging Face that was fine-tuned from Llama 3.1 with the HelpSteer2 dataset to achieve strong results. It was touted as better than GPT-4o and Anthropic's Claude 3.5 on some benchmarks (a very high 8.98 on MT-Bench), but further benchmarking showed it is a good model, just not that good. Harrison Kinsley on X did a review across several benchmarks:
I am relatively confident in saying it's more on par with Qwen2 72B, which is still a decent bit better than Llama 3.1 70B instruct.
Zyphra released Zamba2-7B, a state-of-the-art LLM based on the Zamba hybrid SSM-attention architecture, which interleaves Mamba-2 blocks and attention layers. Zamba2-7B outperforms comparably sized Mistral, Google Gemma, and Meta Llama 3 models and is highly inference-efficient, achieving faster inference and lower memory usage than Llama3-8B. Zamba2-7B is open source and available on HuggingFace.
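As a rough illustration of how a Hugging Face release like this is typically used, here is a minimal loading sketch with the transformers library, assuming a transformers build that includes Zamba2 support (Zyphra's model card documents the exact setup):

```python
# Minimal sketch: load Zyphra/Zamba2-7B with Hugging Face transformers.
# Assumes a transformers build with Zamba2 support; see the model card for exact setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/Zamba2-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # bf16 keeps the 7B hybrid model's memory footprint modest
    device_map="auto",
)

prompt = "The key advantage of hybrid SSM-attention architectures is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```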
Zyphra also released Zyda-2, a 5T-token dataset for training LLMs, distilled from high-quality sources like Zyda-1, DCLM, and FineWeb-Edu. The dataset was filtered on GPUs using NVIDIA NeMo Curator. Zyphra claims "we made the best open-source pre-training dataset." One proof point is that Zamba2-7B scored better than comparably sized LLMs despite training on fewer tokens, thanks to pre-training on the high-quality Zyda-2 dataset.
Mistral released two new models, Ministral 3B and Ministral 8B, which have a context window of 128,000 tokens and are aimed at applications requiring local, privacy-first inference. These are competitive LLMs, beating out the comparably sized Gemma 2 and Llama 3.1 models. Unfortunately, unlike last year's Mistral 7B, these are not downloadable open-weights models; they are commercial AI models available via API, Mistral's Le Chat platform, or license.
Google updated its AI assistant NotebookLM, adding customization for audio summaries so users can steer the AI conversation toward specific topics for more focused and relevant discussions. NotebookLM is no longer "experimental," and the team launched a business pilot program for organizations interested in using it.
Adobe launched the Video Generation beta on Firefly. The Firefly Video Model beta is available to users through the Firefly web app and Premiere Pro. Key features include text-to-video generation, image-to-video conversion, B-roll creation, and 2D and 3D animation. In Premiere Pro, Generative Extend lets users extend existing video clips by up to two seconds.
Perplexity announced the launch of Perplexity for Internal Search, a tool to search over both the web and team files with multi-step reasoning and code execution.
OpenAI has released an early version of the Windows desktop app that is available to ChatGPT Plus, Enterprise, Team, and Edu users. You can download ChatGPT for mobile or desktop here.
Anthropic rolled out a fresh look for the Claude iOS and Android apps, including iPad support and project features.
DeepSeek AI released Janus 1.3B, a unified multimodal LLM that decouples visual encoding for multimodal understanding and generation. It is built on DeepSeek-LLM-1.3b-base with SigLIP-L as the vision encoder. Janus 1.3B is available on HuggingFace, and they shared an accompanying paper, "Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation."
For developers, some new LLMs and model features are available via API:
xAI's Grok 2 and Grok 2 Mini are now available via API on OpenRouter.
OpenAI added audio support, the capability behind advanced voice mode, to the Chat Completions API, supporting text and audio input and output (see the first sketch below).
Google has released new Gemini API parameters, including logprobs, candidateCount, presencePenalty, seed, frequencyPenalty, and model_personality_in_response (see the second sketch below).
01.AI launched Yi-Lightning, a proprietary model accessible via API at platform.lingyiwanwu.com.
Google released Gemma-APS, a collection of Gemma models specialized for text-to-propositions segmentation, distilled from Gemini Pro and fine-tuned on synthetic data. These models can be used to extract claims and facts from text.
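For the Chat Completions audio support mentioned above, here is a hedged sketch of the request shape; the gpt-4o-audio-preview model name, voice, and output format follow OpenAI's announcement and may change:

```python
# Sketch: text in, text plus spoken audio out via the Chat Completions API.
# The model name "gpt-4o-audio-preview" follows OpenAI's announcement; details may change.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[{"role": "user", "content": "Summarize this week's AI news in two sentences."}],
)

# The spoken reply arrives base64-encoded alongside the text transcript.
with open("summary.wav", "wb") as f:
    f.write(base64.b64decode(completion.choices[0].message.audio.data))
```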
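And for the new Gemini parameters, a sketch of a REST generateContent request; the model name and parameter values are illustrative, so check the Gemini API docs for the currently supported fields:

```python
# Sketch: send the new Gemini generation parameters via the REST generateContent endpoint.
# Model name and parameter values are illustrative; consult the Gemini API docs for specifics.
import os
import requests

url = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-1.5-flash:generateContent"
)
payload = {
    "contents": [{"parts": [{"text": "Give me three taglines for an AI newsletter."}]}],
    "generationConfig": {
        "candidateCount": 1,
        "seed": 42,
        "presencePenalty": 0.3,
        "frequencyPenalty": 0.3,
        "responseLogprobs": True,  # surfaces token logprobs in the response
    },
}
resp = requests.post(url, params={"key": os.environ["GEMINI_API_KEY"]}, json=payload)
resp.raise_for_status()
print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])
```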
Top Tools & Hacks
Ichigo-Llama3.1 provides local real-time voice AI. This open-source project, inspired by Meta's Chameleon, is an early-fusion multimodal model for audio and text. The latest improvements allow it to talk back, recognize when it can't comprehend input, and run on a local GPU. It's a work in progress, and the developers suggest trying it in Google Colab or in a HuggingFace Space.
F5-TTS is an open-source text-to-speech model that performs zero-shot voice cloning from less than 15 seconds of audio, using a short reference clip to generate new speech in that voice. You can download it from HuggingFace and run it locally. The authors explain their approach in the paper "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching."
With its permissive open-source license and local operation, F5-TTS enables many voice cloning use cases.
AI Research News
The Llama-3.1-Nemotron-70B LLM was developed by Nvidia using the HelpSteer2 dataset and preference annotations. Along with releasing the model and dataset on Hugging Face, Nvidia also released a paper, "HelpSteer2-Preference: Complementing Ratings with Preferences."
In the paper, the authors present the HelpSteer2 dataset in detail, use it to compare reward-modeling approaches for RLHF, and then combine those approaches for an improved instruction fine-tuning process.
This week, our Research Roundup covered multi-modality and AI video generation models:
Multi-modal LLM Benchmarks: MMIE, HumanEval-V and MixEval-X
AI video generation models: Movie Gen and Pyramidal Flow Matching
Multi-modal LLMs: Emu3, MIO, and MM1.5
Depth perception: Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
AI Business and Policy
AI Goes Nuclear: Three coincident news items suggest that AI's energy needs may be solved with nuclear power. As AI data centers expand massively, their demand for power will strain generation capacity and could even limit AI expansion. To fill their need for steady, reliable, clean power, Amazon, Microsoft, and Google all announced nuclear power deals for their data centers.
Two weeks ago, Microsoft announced a deal to buy power from a restarted Three Mile Island nuclear reactor.
Amazon announced two nuclear power deals, one with Energy Northwest to fund the initial feasibility phase of a nuclear facility, the other leading a $500 million investment round in X-energy, which provides the nuclear reactor technology for the Energy Northwest project.
Google signed a pioneering agreement to purchase clean energy in the US from Kairos Power, a leader in small modular nuclear reactors. Google’s announcement mentioned that “the grid needs new electricity sources to support AI technologies” and that “nuclear solutions offer a clean, round-the-clock power source.” The Kairos Power advanced nuclear reactor is an FHR, a fluoride-salt-cooled high-temperature reactor, that combines solid pebble fuel with a low-pressure fluoride salt coolant.
The first Nvidia Blackwell systems were delivered to OpenAI, which posted:
Thank you to Nvidia for delivering one of the first engineering builds of the DGX B200 to our office.
Penguin Random House Adds Language to Copyright Pages Prohibiting Use of Books for Training AI. The statement reads, “No part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems.” This move reflects growing concerns among publishers about intellectual property rights in the age of AI.
X updated its Privacy Policy to allow third-party “collaborators” to train their AI models on X data unless users opt out. The update becomes effective November 15, though the specific opt-out mechanism is not yet detailed in current settings.
OpenAI and Microsoft have hired investment banks to negotiate how much equity Microsoft should receive in exchange for the nearly $14 billion it has invested in OpenAI. The negotiations also involve determining equity shares for CEO Sam Altman and employees, alongside specific governance rights for Microsoft.
OpenAI pledges to only use its patents defensively, avoiding threats against parties that do not engage in harmful activities against the company or its users. However, experts argue that this pledge is vague and lacks concrete legal weight, potentially serving more as public relations maneuvering than a substantive commitment.
AI startup funding news:
SandboxAQ, spun out from Alphabet’s X (formerly Google X), is seeking to raise funds in another round that would value it at $5 billion. SandboxAQ focuses on quantum computing and AI for various scientific fields and raised $500 million in early 2023.
Dottxt, a U.S.-based startup born out of the open-source project Outlines, has raised $11.9 million to develop tools that make AI outputs more structured and computer-friendly, addressing a critical need in enterprise adoption of AI.
Abel, an AI company that automates police report writing, secured a $5 million seed round. The technology aims to reduce administrative burdens on law enforcement, allowing more time for active duty.
AI startup Treehouse recently raised a $16 million Series A. Treehouse uses AI to help electricians install tech like EV chargers and heat pumps more cheaply, by predicting job durations and material needs and streamlining installation processes.
Support automation company Capacity secured a $26 million Series D funding round and acquired three companies to expand its AI capabilities in customer service automation.
LatticeFlow has developed a framework to evaluate LLM compliance with the European Union's AI Act, assessing aspects such as technical robustness, safety, diversity, non-discrimination, and fairness.
The Biden administration has privately discussed capping sales of advanced AI chips from Nvidia and AMD to certain Persian Gulf countries for national security reasons. This move follows last year’s ban on selling these chips to China.
AI Opinions and Articles
AI Sludge Alert: Local News Site Hoodline Falsely Accuses San Mateo DA Of Murder. The false headline came from an AI misreading a tweet announcing an arrest, turning the tweet's author – the San Mateo DA – into the arrested suspect. The author at TechDirt uses this story of horrible journalism to make a larger point:
We’ve written multiple times about the growth of absolutely horrible AI sludge journalism, in which crappy (often legacy) news sites are replacing reporters with terribly written, prone-to-lying, AI journalists, with apparently zero editorial review. As I’ve said, I do think there are places where generative AI tools can be useful in journalism, but it’s not in writing stories independently without any review.
The bottom line: Use AI wisely and check its output. The tool isn't bad; it's the careless tool user who lets it produce bad results.