AI Week In Review 23.10.07

Assistant with Bard, Adobe Stardust, Canva Magic, The Dawn of LMMs, LLaVA-1.5, StreamingLLM, and a beer-serving robot IPO

Patrick McGuinness

Oct 08, 2023

Figure 1. Cover art for Teknium’s occult-focused fine-tune of MistralAI 7B. Magic!

Top Tools

DALL-E 3 is now available in Bing chat for free. Try it at bing.com/create.

AI Tech and Product Releases

Google shared more AI updates at their “Made by Google” event this week:

Assistant with Bard turns Google Assistant into a more ‘personal assistant’ with generative AI that you can interact with via voice, text, or images. It’s coming to Android and iOS mobile devices as well as Google services (Gmail and Docs).
Chromebook Plus is adding a number of AI features from Google apps, and Google Photos Magic Eraser and other features will be coming in future releases. Google is “bringing the power of Google’s AI capabilities directly into ChromeOS.”
Pixel 8 and 8 Pro phones are adding powerful AI features: Audio Magic Eraser removes unwanted background noise from a video, and AI image editing capabilities include “Magic Editor” and a “Best Take” tool that lets you combine the best takes for faces in a group photo.

Adobe teases a new AI photo editing tool that will ‘revolutionize’ its products with an “object-aware editing engine.” Called Project Stardust, this tool is similar to Google’s Magic Editor: it “automatically identifies individual objects in regular photographs, allowing them to be easily moved around and changed.”

Canva is bringing more image generation apps in-house in the Canva app. Now you can use Magic Media, Imagen by Google Cloud, and DALL-E by OpenAI to create unique images, art, and even video from a text description, all integrated within Canva. You can even use Runway Gen-2 inside Canva projects. Don’t call it AI, call it magic.

Figure 2. Using Magic Media inside Canva to quickly add a storyboard image.

Open-source AI model progress: Teknium’s fine-tuned Collective Cognition model, based on MistralAI’s 7B model, outperforms Llama2 70B models on TruthfulQA, which is amazing progress. “What's even crazier? These trained in 3 minutes on a single 4090, with a qlora, and only ONE HUNDRED datapoints, all collected from our ShareGPT like website.”

Axios reports on AI Wearables, including Meta smart glasses and the Humane AI Pin discussed last week, as well as a few other players:

Rewind Pendant from Rewind.ai is a neck-worn pendant that records and stores your conversations, and it has AI software to transcribe, search prior conversations, and glean insights, giving you a “searchable database of your life's soundtrack.”
Tab, a similar wearable device that listens to conversations, has not yet been released.

AI is bringing disruption to recruitment and HR. LinkedIn is Reimagining Hiring and Learning with the Power of AI, and this week announced they are piloting AI-assisted recruiting with Recruiter 2024:

With the new Recruiter 2024 experience, talent leaders can use natural language and put their hiring goal in their own words like “I want to hire a senior growth marketing leader.” And with generative AI combined with our insights, we can infer the type of candidate the hirer is looking for and provide higher-quality candidate recommendations from a much wider pool of candidates …

AI is coming to the Arc browser, with five features: Ask ChatGPT; auto tab title and download renaming; preview summaries; and “Ask on Page” to answer AI queries specific to a page you’re looking at.

Meta is bringing Generative AI features for ads to advertisers: Background generation, text variations, and more.

AI Research News

The release of GPT-4V (for vision) is the most important AI model advance since GPT-4 in March: it is the most powerful model that combines text and image inputs. GPT-4 is not just an LLM now, it’s an LMM - Large Multimodal Model. Microsoft researchers published a technical paper describing GPT-4V’s capabilities in detail, called The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision).

This 150-page paper looked at the quality and generality of GPT-4V across many domains and tasks, in particular in the vision-language realm: looking at pictures and videos and answering questions about them in natural language. The tasks include reading text in images in many forms: driver’s license info, taxes on receipts, handwriting, test questions, math equations, flow charts, etc. Interpreting written text within images is a huge leap that will automate many workflows, but GPT-4V goes beyond that.

They also explore interpreting medical images, recognizing celebrities in pictures, identifying landmarks, counting objects, explaining memes, and much more. One example they showed was taking an unlabeled picture, with instructions to identify and label the people in the picture. See the result below. They show both broad capabilities and examples of limitations and failures across the range of tasks.

Figure 3. GPT-4V labels an image of famous AI researchers.

The research paper Improved Baselines with Visual Instruction Tuning presents LLaVA-1.5, an open LMM (Large Multimodal Model) that improves upon the original LLaVA Visual Instruction Tuning model. LLaVA-1.5 makes for an impressive LMM with only 13B parameters; it claims state-of-the-art results on 11 benchmarks in various visual question-answer (VQA) tasks:

… the fully-connected vision-language cross-modal connector in LLaVA is surprisingly powerful and data-efficient. With simple modifications to LLaVA, namely, using CLIP-ViT-L-336px with an MLP projection and adding academic-task-oriented VQA data with simple response formatting prompts, we establish stronger baselines that achieve state-of-the-art across 11 benchmarks.
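To make the “MLP projection” idea concrete, here is a minimal sketch of a connector that maps vision-encoder patch features into a language model’s embedding size. It assumes a two-layer MLP with a GELU in between; the dimensions, the random weights, and the names project, random_layer, VISION_DIM and LLM_DIM are hypothetical toy choices for illustration, not values or code from the paper.

```python
import math
import random


def gelu(x):
    # Exact GELU via the Gaussian error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(vec, weight, bias):
    # weight holds one row per output unit.
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def random_layer(n_in, n_out, rng):
    # Hypothetical untrained weights; a real connector would be learned.
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weight, bias


def project(patch_features, layer1, layer2):
    """Map each vision patch feature to a vector of the LLM's embedding size."""
    tokens = []
    for feat in patch_features:
        hidden = [gelu(h) for h in linear(feat, *layer1)]
        tokens.append(linear(hidden, *layer2))
    return tokens


rng = random.Random(0)
VISION_DIM, LLM_DIM, N_PATCHES = 6, 10, 4  # toy sizes, not the real model's
layer1 = random_layer(VISION_DIM, LLM_DIM, rng)
layer2 = random_layer(LLM_DIM, LLM_DIM, rng)
patches = [[rng.gauss(0.0, 1.0) for _ in range(VISION_DIM)] for _ in range(N_PATCHES)]

visual_tokens = project(patches, layer1, layer2)
print(len(visual_tokens), len(visual_tokens[0]))
```

Each image patch comes out as one vector of the language model’s embedding size, so the projected patches can sit in the input sequence alongside ordinary text token embeddings.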

Figure 4. Benchmark results for LLaVA-1.5.

In another approach to extending input sequence contexts, StreamingLLM shows how one token can keep AI models running smoothly indefinitely. The paper “Efficient Streaming Language Models with Attention Sinks” describes the ‘attention sink’ effect, where the initial tokens in an LLM input context receive a large share of attention regardless of their relevance. It then proposes StreamingLLM:

StreamingLLM [is] an efficient framework that enables LLMs trained with a finite length attention window to generalize to infinite sequence length without any fine-tuning. We show that StreamingLLM can enable Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling with up to 4 million tokens and more.
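As a rough illustration of the idea, the sketch below models a cache that always keeps the first few tokens (the attention sinks) plus a rolling window of the most recent tokens, and evicts everything in between. The class name SinkCache and the sizes n_sink=4 and window=8 are assumptions made for this toy example; it tracks token positions only, not real key/value tensors, and is not the authors’ implementation.

```python
from collections import deque


class SinkCache:
    """Toy cache index: keep the first n_sink tokens plus a rolling window."""

    def __init__(self, n_sink=4, window=8):
        self.n_sink = n_sink
        self.sinks = []                     # earliest tokens, never evicted
        self.recent = deque(maxlen=window)  # newest tokens; the oldest is dropped when full

    def append(self, token):
        if len(self.sinks) < self.n_sink:
            self.sinks.append(token)
        else:
            self.recent.append(token)

    def contents(self):
        return self.sinks + list(self.recent)


cache = SinkCache(n_sink=4, window=8)
for position in range(100):  # stream 100 token positions through the cache
    cache.append(position)

kept = cache.contents()
print(kept)  # [0, 1, 2, 3, 92, 93, 94, 95, 96, 97, 98, 99]
```

The cache size stays fixed at n_sink + window entries no matter how long the stream runs, which is why memory does not grow with sequence length.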

A Wired article reports that Researchers Tested AI Watermarks—and Broke All of Them. “A research team found it’s easy to evade current methods of watermarking—and even add fake watermarks to real images.”

This might put a damper on hopes for using AI watermarks, but efforts will continue. Meta has come out with Stable Signature: A new method for watermarking images created by open source generative AI.

AI Business and Policy

Everyone wants to get in on the AI chip-making gold rush: Microsoft to Debut AI Chip Next Month That Could Cut Nvidia GPU Costs. And ChatGPT-owner OpenAI is exploring making its own AI chips:

“OpenAI, the company behind ChatGPT, is exploring making its own artificial intelligence chips and has gone as far as evaluating a potential acquisition target, according to people familiar with the company’s plan”

Tim Cook confirms Apple is researching ChatGPT-style AI.

South Korea’s biggest IPO this year is a company making beer-serving robots. Doosan Robotics made its market debut on Thursday, raising 421 billion won ($310 million).

Snap AI chatbot investigation launched in UK over teen-privacy concerns. The UK’s Information Commissioner’s Office stated: “The provisional findings of our investigation suggest a worrying failure by Snap to adequately identify and assess the privacy risks to children and other users before launching ‘My AI’.”

AI Robo-Calls And Texts Are Stealing Money Every Day: “Americans have lost $14 billion to robotexts and $34 billion to robocalls, and scammers are focused on refining their tactics to steal money.” AI-for-evil will require more vigilance by consumers, already jaded by robo-texts, robo-calls, and more.

JPMorgan CEO Jamie Dimon says AI could bring a 3½-day workweek. When asked if the technology is likely to replace some bank jobs, he responded that “of course” it will, but that “technologies always replace jobs.” I think AI optimism is healthy, because AI is bringing more upside than downside.

“Your children are going to live to 100 and not have cancer because of technology,” Dimon said Monday in an interview with Bloomberg TV. “And literally they’ll probably be working 3½ days a week.” - JPMorgan Chase CEO Jamie Dimon

AI Opinions and Articles

Using AI for conservation in “AI of the tiger: Tiny camera 'protects' predator—and people.” A tiger-recognition camera system alerts villagers to nearby tigers, keeping both safer.

Figure 5. Pictures of a wild tiger in an AI camera system in Madhya Pradesh, India.

At this year’s Code Conference, AMD CEO Lisa Su talks about semiconductors and AI. The wide-ranging conversation covered various aspects of how AI is driving the chip business these days, summed up by the final Q&A: “NP: What’s a bet you’re making right now? LS: We’re betting on what the next big thing in AI is.”

“If you think about the technology trends that we’ve seen over the last 10 or 20 years — whether you’re talking about the internet or the mobile phone revolution or how PCs have changed things — AI is 10 times, 100 times, more than that in terms of how it’s impacting everything that we do.

I’m not a believer in moats when the market is moving as fast as it is. … When you look at generative AI, it’s moving at an incredible pace.” - AMD CEO Lisa Su

A Look Back …

In 1642, Blaise Pascal invented the Pascal calculator, the world’s first digital calculating machine. A mechanical device based on ingenious gearing mechanisms, it was able to add and subtract two numbers. As stated in the Wikipedia article on Pascal’s calculator:

“Blaise Pascal began to work on his calculator in 1642, when he was 18 years old. He had been assisting his father, who worked as a tax commissioner, and sought to produce a device which could reduce some of his workload.”

That has always been the goal with tools for automation: Reduce human workload. While we have automated more and more through the industrial revolution and the information age, human workload has never decreased even as our prosperity has risen, until now.

Is AI different? Perhaps Jamie Dimon is right that we’ll get to a three-and-a-half-day work week. Or perhaps automation will leave us free to do more fulfilling work, and we will work as long as before, but be more productive and fulfilled than before.

AI Changes Everything
