AI Week In Review 24.08.24

Phi-3.5, Mistral-NeMo-Minitron 8B, Ideogram 2.0, Dream Machine 1.5, Hotshot, Jamba 1.5, Gemini Gmail polishing, GPT-4o fine-tuning, Dracarys 70B and 72B for coding, Einstein SDR Agent.

Aug 24, 2024

Figure 1. Ideogram 2.0 AI image generation, showing faithful text rendering. AI images with accurate text are changing graphic design.

AI Tech and Product Releases

Microsoft released Phi-3.5, a suite of new SLMs (Small Language Models) building on the Phi-3 model family:

Phi-3.5-mini, a 3.8B SLM with enhanced multilingual support and a 128K context window.
Phi-3.5-vision, which improves multi-frame image understanding and reasoning while also boosting performance on single-image benchmarks.
Phi-3.5-MoE, a Mixture-of-Experts (MoE) model with 16 experts and 6.6B active parameters. It delivers high performance and multilingual support, beating larger models on various benchmarks. For example, Phi-3.5-MoE scores 78.9 on MMLU, ahead of Gemini 1.5 Flash and GPT-4o-mini.

This continues the trend of Phi models being highly efficient open-weights AI models with SOTA performance for their size. Phi-3.5 models are available on HuggingFace and Microsoft Azure.
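Phi-3.5-MoE can have far more total parameters than the 6.6B it uses per token because a router sends each token to only a few of its 16 experts. The minimal pure-Python sketch below shows generic top-k MoE routing for a single token. It is an illustration under stated assumptions, not Microsoft's implementation: the top-2 setting, the toy experts, and the names `moe_forward` and `router_weights` are all hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: Vector,
    router_weights: List[Vector],
    experts: List[Expert],
    top_k: int = 2,  # assumption: top-2 routing, a common MoE choice
) -> Tuple[Vector, List[int]]:
    """Route one token through a hypothetical top-k Mixture-of-Experts layer."""
    # Router: one score per expert (dot product of that expert's router row and the token).
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    # Keep only the top_k highest-scoring experts.
    chosen = sorted(range(len(experts)), key=lambda e: logits[e], reverse=True)[:top_k]
    # Gate weights: softmax over the chosen experts' scores only.
    gates = softmax([logits[e] for e in chosen])
    out = [0.0] * len(x)
    for gate, e in zip(gates, chosen):
        # Only the chosen experts run; their weights are the "active" parameters.
        y = experts[e](x)
        out = [o + gate * y_i for o, y_i in zip(out, y)]
    return out, chosen


# Toy setup: 16 experts (as in Phi-3.5-MoE); hypothetical expert e scales its input by e + 1.
NUM_EXPERTS = 16
experts = [lambda v, s=e + 1: [s * vi for vi in v] for e in range(NUM_EXPERTS)]
# Hypothetical router that scores expert e as e * x[0].
router_weights = [[float(e), 0.0] for e in range(NUM_EXPERTS)]

out, chosen = moe_forward([1.0, 0.0], router_weights, experts, top_k=2)
print(chosen, out)
```

With this toy router, experts 15 and 14 are selected and the output is their gate-weighted blend (about 15.73 in the first coordinate). The other 14 experts never run, which is why an MoE model's active parameter count is much smaller than its total parameter count.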

NVIDIA released Mistral-NeMo-Minitron 8B, a miniaturized version of the recently released Mistral NeMo 12B model. It delivers high accuracy in an efficient Small Language Model (SLM) optimized for local use. Mistral-NeMo-Minitron 8B was built with pruning and distillation, which achieved up to 40x cost savings in training.

Mistral-NeMo-Minitron 8B and quantized versions are available on HuggingFace.

Figure 2. Mistral-NeMo-Minitron 8B outperforms other similar-sized models on AI benchmarks.

Ideogram announced the release of Ideogram 2.0, its new AI image generation model, challenging competing AI image generators with improved text rendering and customizable color palettes:

The Design style of Ideogram 2.0 significantly boosts the accuracy of text in the generated images. This enables you to create premium graphic designs for greeting cards, print on demand, posters, illustrations, and marketing and social media content with long, stylized text.

Ideogram 2.0 is being called a “Real Game-Changer in AI text-to-image Generation” on account of its Design style features. Ideogram 2.0 is available now, free for up to 40 images per day.

Figure 3. Ideogram 2.0 has faithful text rendering and high-quality image output.

Perhaps in response to the competition in AI image generation, Midjourney announced that its web experience is now open to everyone, and it is turning free trials back on, with a 25-image limit, so you can check it out.

Luma Labs released Dream Machine 1.5, which promises “higher quality text-to-video, smarter understanding of your prompts, custom text rendering, and improved image-to-video.” Dream Machine 1.5 does a great job rendering text in video, as shown in this video: the word “pasta” made of pasta falling onto a plate.

Figure 4. Luma Labs Dream Machine 1.5 renders text in video faithfully.

Hotshot launched a new text-to-video AI generator called Hotshot, “a large-scale diffusion transformer model that generates up to 10 seconds of footage at 720p.” Remarkably, this Sora-level AI video generation model was developed by a team of just 4 people. You can check out examples and make your own video at Hotshot.co.

Google released new Gemini features for Gmail, updating Help me write in Gmail so its AI writing tools can refine emails faster:

“When using Gemini to refine emails, users can choose from the following options: Formalize, Elaborate and Shorten. We recently added the Polish option to web and mobile, which can effortlessly refine your emails, saving you time.”

AI21 Labs has launched Jamba 1.5, offering two open models, Jamba 1.5 Mini (12B active parameters) and Jamba 1.5 Large (94B active), both hybrid SSM-Transformer MoE models with a long 256K context window and multilingual support. Jamba 1.5’s hybrid architecture gives it advantages in speed and long-context handling over pure transformer models.

OpenAI announced fine-tuning for GPT-4o is available to all developers on paid usage tiers through OpenAI's fine-tuning dashboard.

To encourage adoption, OpenAI is offering up to 1 million free training tokens per day for fine-tuning GPT-4o until September 23, and 2 million free training tokens per day for fine-tuning GPT-4o mini over the same period. Beyond that, GPT-4o fine-tuning training costs $25 per million tokens. Google also offers highly competitive fine-tuning for Gemini Flash.
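To make the pricing concrete, here is a small back-of-the-envelope calculator for GPT-4o fine-tuning training cost using the $25 per million tokens and 1 million free daily tokens quoted above. It is a sketch under assumptions: it assumes billed training tokens equal dataset tokens times the number of epochs, and that the free allowance is simply subtracted from a job run within a single day. The function name `training_cost_usd` is hypothetical; check OpenAI's pricing page for the exact billing rules.

```python
# Figures quoted in the article; the billing model below is an assumption.
PRICE_PER_MILLION_USD = 25.0      # GPT-4o fine-tuning training price
FREE_TOKENS_PER_DAY = 1_000_000   # promotional allowance (until September 23)


def training_cost_usd(dataset_tokens: int, epochs: int, free_tokens: int = 0) -> float:
    """Hypothetical estimate of fine-tuning training cost in US dollars.

    Assumes billed tokens = dataset tokens * epochs, minus any free allowance.
    """
    billed_tokens = max(dataset_tokens * epochs - free_tokens, 0)
    return billed_tokens / 1_000_000 * PRICE_PER_MILLION_USD


# A 2M-token dataset trained for 3 epochs is 6M billed tokens.
print(training_cost_usd(2_000_000, 3))                       # without the promo
print(training_cost_usd(2_000_000, 3, FREE_TOKENS_PER_DAY))  # with the promo
```

Under these assumptions, the example job costs $150 at list price and $125 with the daily free allowance, while a small job of 1M total training tokens would be free during the promotion.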

Abacus.AI has released Dracarys, open-source LLMs fine-tuned for coding tasks, available in two sizes: Dracarys-70B-Instruct, a fine-tune of Llama 3.1 70B, and Dracarys-72B-Instruct, a fine-tune of Qwen2 72B. Both are open weights and released on HuggingFace. “Our recipe improves the open-sourcing model and Dracarys-72B-Instruct is the best coding model in its class.”

Salesforce launched two new AI-powered sales agents, Einstein SDR Agent and Einstein Sales Coach Agent, to automate routine tasks and provide suggestions to sales teams. Einstein SDR Agent manages inbound leads, answering questions and booking meetings. Einstein Sales Coach Agent assists salespeople by offering real-time suggestions during calls and helping them rehearse pitches.

D-ID unveiled an AI Video Translation tool that translates videos into 30 languages while cloning the speaker's voice and synchronizing lip movements.

Perplexity is now rolling out a code interpreter that can generate charts and install libraries. This is similar to ChatGPT’s code interpreter, but Perplexity's version can also work with data pulled from the internet.

AI Research News

This week’s AI research roundup covers multi-modal models, video foundation models (VFMs), and image generation models:

TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models
VITA: Towards Open-Source Interactive Omni Multimodal LLM
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
Imagen 3

AI Business and Policy

Authors sue Anthropic for training AI using their books. Several authors filed a lawsuit against Anthropic, alleging that it used their works without authorization as LLM training data, drawn from Books3, a massive library of pirated books. This follows similar lawsuits against OpenAI, Meta, and EleutherAI, all alleging copyright violations over the use of The Pile and Books3, which contain copyrighted works.

OpenAI and Condé Nast are collaborating to share media content on ChatGPT and SearchGPT:

We’re announcing a partnership with Condé Nast to display content from top brands like Vogue, The New Yorker, Condé Nast Traveler, GQ, Architectural Digest, Vanity Fair, Wired, Bon Appétit, and more, within our products, including ChatGPT and our SearchGPT prototype.

This partnership is one way to address the legal challenges of AI using and surfacing copyrighted works from media sources. OpenAI has been quite successful at turning news organizations into partners instead of opponents in news-AI integration.

Recall faces a recall: Microsoft postpones the Recall AI launch. The launch is delayed until October due to security concerns.

Perplexity AI plans to start running ads as AI-assisted search gains popularity. Starting in Q4, the startup will show ads alongside the AI-assisted search results in its search app, similar to Google Search ads.

AI is coming to more budget phones thanks to Qualcomm's new Snapdragon 7s Gen 3. The new chip offers major performance improvements, including 40% faster GPU and 30% better AI performance, supporting wider adoption of on-device AI.

AI startup fundraising news:

AI code tool developer Cursor raised $60 million in Series A funding. Their ambitious goal: “building a magical tool that will one day write all the world's code.”
Mental health startup Slingshot AI just raised $30 million, from Andreessen Horowitz and others.
Andreessen Horowitz backs AI copyright startup Story with $80 million in Series B funding: “Story is proposing a radical solution: remaking the intellectual property regime in order to let creators rapidly register their works on a blockchain, and use it to track and distribute royalties.”

OpenAI came out against California's controversial SB 1047, saying it will hurt AI innovation and suggesting that AI safety should be handled by federal regulation, not state by state. OpenAI has been favorable towards some AI regulations, so its opposition to this bill suggests that much of the AI industry sees it as overkill.

AI Opinions and Articles

CEO of iPad illustration app Procreate says he hates generative AI. In a contrarian move, Procreate's CEO condemned generative AI and its impact on artists and creatives. He even announced that Procreate will not integrate generative AI features into its products because the company believes they would hurt artists.

In a leaked recording, Amazon cloud chief tells employees that most developers could stop coding soon, suggesting that AI could displace human coders within 24 months.

"Coding is just kind of like the language that we talk to computers. It's not necessarily the skill in and of itself. The skill in and of itself is like, how do I innovate? How do I go build something that's interesting for my end users to use?" - Matt Garman, head of Amazon Web Services (AWS)

AI Changes Everything
