AI Week In Review 23.10.07
Assistant with Bard, Adobe Stardust, Canva Magic, The Dawn of LMMs, LLaVA-1.5, StreamingLLM, and a beer-serving robot IPO
Top Tools
DALL-E 3 is now available in Bing chat for free. Try it at bing.com/create.
AI Tech and Product Releases
Google shared more AI updates at its "Made by Google" event this week:
Assistant with Bard turns Google Assistant into a more "personal assistant" with generative AI that you can interact with via voice, text, or images. It's coming to Android and iOS mobile devices as well as Google services (Gmail and Docs).
Chromebook Plus is adding a number of AI features from Google apps, with Google Photos Magic Eraser and other features coming in future releases. Google is "bringing the power of Google's AI capabilities directly into ChromeOS."
Pixel 8 and 8 Pro phones are adding powerful AI features: Audio Magic Eraser removes unwanted background noise from a video, and AI image editing capabilities include "Magic Editor" and a "Best Take" tool that lets you combine the best takes of faces in a group photo.
Adobe teases a new AI photo editing tool that will "revolutionize" its products with an "object-aware editing engine." Called Project Stardust, the tool is similar to Google's Magic Editor: it "automatically identifies individual objects in regular photographs, allowing them to be easily moved around and changed."
Canva is bringing more image generation apps in-house. You can now use Magic Media, Imagen by Google Cloud, and DALL-E by OpenAI inside the Canva app to create unique images, art, and even video from a text description. You can also use Runway Gen-2 inside Canva projects. Don't call it AI, call it magic.
Open-source AI model progress: Teknium's fine-tuned Collective Cognition model, based on Mistral AI's 7B model, outperforms Llama 2 70B models on TruthfulQA. That is amazing progress. "What's even crazier? These trained in 3 minutes on a single 4090, with a qlora, and only ONE HUNDRED datapoints, all collected from our ShareGPT like website."
Axios reports on AI Wearables, including Meta smart glasses and the Humane AI Pin discussed last week, as well as a few other players:
Rewind Pendant from Rewind.ai is a neck-worn pendant that records and stores your conversations. Its AI software transcribes them, lets you search prior conversations, and gleans insights, giving you a "searchable database of your life's soundtrack."
A similar wearable device to Rewind is Tab, not yet released, that listens to conversations.
AI is bringing disruption to recruitment and HR. LinkedIn is Reimagining Hiring and Learning with the Power of AI, and this week it announced that it is piloting AI-assisted recruiting with Recruiter 2024:
With the new Recruiter 2024 experience, talent leaders can use natural language and put their hiring goal in their own words like "I want to hire a senior growth marketing leader." And with generative AI combined with our insights, we can infer the type of candidate the hirer is looking for and provide higher-quality candidate recommendations from a much wider pool of candidates …
AI is coming to the Arc browser with five features: Ask ChatGPT; automatic tab titles; automatic download renaming; preview summaries; and "Ask on Page," which answers AI queries specific to the page you're looking at.
Meta is bringing generative AI ad features to advertisers: background generation, text variations, and more.
AI Research News
The release of GPT-4V (for vision) is the most important AI model advance since GPT-4 in March. It is the most powerful model yet to combine text and image inputs. GPT-4 is not just an LLM now; it's an LMM, a Large Multimodal Model. Microsoft researchers published a technical paper that describes GPT-4V's capabilities in detail, called The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision).
This 150-page paper examines the quality and generality of GPT-4V across many domains and tasks. It focuses on the vision-language realm: looking at pictures and videos and answering questions about them in natural language. The tasks include reading text in images in many forms, such as driver's license info, taxes on receipts, handwriting, test questions, math equations, and flow charts. Interpreting written text within images is a huge leap that will automate many workflows, but GPT-4V goes beyond that.
The researchers also explore interpreting medical images, recognizing celebrities in pictures, identifying landmarks, counting objects, explaining memes, and much more. In one example, GPT-4V is given an unlabeled picture with instructions to identify and label the people in it. See result below. The paper shows both broad capabilities and examples of limitations and failures across the range of tasks.
The research paper Improved Baselines with Visual Instruction Tuning presents LLaVA-1.5, an open LMM (Large Multimodal Model) that improves upon the original LLaVA Visual Instruction Tuning model. LLaVA-1.5 is an impressive LMM with only 13B parameters. It claims state-of-the-art results on 11 benchmarks, including various visual question-answering (VQA) tasks:
… the fully-connected vision-language cross-modal connector in LLaVA is surprisingly powerful and data-efficient. With simple modifications to LLaVA, namely, using CLIP-ViT-L-336px with an MLP projection and adding academic-task-oriented VQA data with simple response formatting prompts, we establish stronger baselines that achieve state-of-the-art across 11 benchmarks.
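The "MLP projection" is the small network that maps each vision-encoder patch feature into the language model's embedding space. This lets image patches be fed to the LLM alongside text tokens.

Below is a toy, pure-Python sketch. It assumes a two-layer MLP with a GELU activation between the layers, which is our understanding of LLaVA-1.5's connector. The tiny dimensions, the random initialization, and names such as `MLPProjector` are hypothetical and for illustration only. This is not the paper's code.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def gelu(x: float) -> float:
    """Exact GELU activation, computed with the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Affine map y = Wx + b, where W has shape (out_dim, in_dim)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def init_linear(in_dim: int, out_dim: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Small random weights and zero biases (illustrative initialization only)."""
    w = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
    return w, [0.0] * out_dim


class MLPProjector:
    """Hypothetical two-layer MLP: vision feature -> GELU -> LLM embedding."""

    def __init__(self, vision_dim: int, llm_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1, self.b1 = init_linear(vision_dim, llm_dim, rng)
        self.w2, self.b2 = init_linear(llm_dim, llm_dim, rng)

    def __call__(self, patch_features: List[Vector]) -> List[Vector]:
        # One output "visual token" per input image patch.
        tokens = []
        for feature in patch_features:
            hidden = [gelu(v) for v in linear(feature, self.w1, self.b1)]
            tokens.append(linear(hidden, self.w2, self.b2))
        return tokens


if __name__ == "__main__":
    proj = MLPProjector(vision_dim=8, llm_dim=16)
    visual_tokens = proj([[0.1] * 8 for _ in range(3)])
    print(len(visual_tokens), len(visual_tokens[0]))  # 3 patches, 16-dim each
```

In a real model, the projector's weights are learned, and the dimensions are set by the vision encoder's feature size and the LLM's hidden size. The resulting vectors are placed in the LLM's input sequence next to ordinary text-token embeddings.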
In another approach to extending input sequence contexts, StreamingLLM shows how one token can keep AI models running smoothly indefinitely. The paper "Efficient Streaming Language Models with Attention Sinks" describes the "attention sink" effect: the initial tokens in an LLM's input context receive a disproportionately large share of attention, even when they are not semantically important. The paper then proposes StreamingLLM:
StreamingLLM [is] an efficient framework that enables LLMs trained with a finite length attention window to generalize to infinite sequence length without any fine-tuning. We show that StreamingLLM can enable Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling with up to 4 million tokens and more.
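The core idea can be sketched as a cache eviction policy. The key/value entries for the first few "sink" tokens are kept permanently, along with a rolling window of the most recent tokens, and everything in between is dropped.

The toy class below stores token ids instead of real key/value tensors. The name `SinkCache`, its parameters, and its defaults are hypothetical and chosen for illustration. This is not the paper's reference implementation.

```python
from collections import deque
from typing import Deque, Generic, List, TypeVar

T = TypeVar("T")


class SinkCache(Generic[T]):
    """Hypothetical attention-sink cache: first n_sink entries + recent window."""

    def __init__(self, n_sink: int = 4, window: int = 8) -> None:
        self.n_sink = n_sink
        self.sinks: List[T] = []
        # A deque with maxlen silently discards its oldest item when full.
        self.recent: Deque[T] = deque(maxlen=window)

    def append(self, entry: T) -> None:
        """Add a new token's entry, evicting the oldest non-sink entry if needed."""
        if len(self.sinks) < self.n_sink:
            self.sinks.append(entry)
        else:
            self.recent.append(entry)

    def contents(self) -> List[T]:
        """Entries the model attends to: sinks first, then the recent window."""
        return self.sinks + list(self.recent)

    def positions(self) -> List[int]:
        # As we understand the paper, positions are assigned by slot within
        # the cache rather than by position in the original text.
        return list(range(len(self.contents())))


if __name__ == "__main__":
    cache: SinkCache[int] = SinkCache(n_sink=4, window=4)
    for token_id in range(20):
        cache.append(token_id)
    print(cache.contents())  # [0, 1, 2, 3, 16, 17, 18, 19]
```

Memory stays bounded at `n_sink + window` entries no matter how long the stream runs. That bound is what lets generation continue over millions of tokens without the cache growing.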
A Wired article reports that Researchers Tested AI Watermarks, and Broke All of Them: "A research team found it's easy to evade current methods of watermarking, and even add fake watermarks to real images."
This might put a damper on hopes for using AI watermarks, but efforts will continue. Meta has come out with Stable Signature, a new method for watermarking images created by open-source generative AI.
AI Business and Policy
Everyone wants to get in on the AI chip-making gold rush: Microsoft to Debut AI Chip Next Month That Could Cut Nvidia GPU Costs. And ChatGPT-owner OpenAI is exploring making its own AI chips:
"OpenAI, the company behind ChatGPT, is exploring making its own artificial intelligence chips and has gone as far as evaluating a potential acquisition target, according to people familiar with the company's plan."
Tim Cook confirms Apple is researching ChatGPT-style AI.
South Korea's biggest IPO this year is a company making beer-serving robots. Doosan Robotics made its market debut on Thursday, raising 421 billion won ($310 million).
Snap AI chatbot investigation launched in UK over teen-privacy concerns. The UK's Information Commissioner's Office stated: "The provisional findings of our investigation suggest a worrying failure by Snap to adequately identify and assess the privacy risks to children and other users before launching 'My AI'."
AI Robo-Calls And Texts Are Stealing Money Every Day: "Americans have lost $14 billion to robotexts and $34 billion to robocalls, and scammers are focused on refining their tactics to steal money." AI-for-evil will require more vigilance from consumers, who are already jaded by robotexts, robocalls, and more.
JPMorgan CEO Jamie Dimon says AI could bring a 3½-day workweek. When asked if the technology is likely to replace some bank jobs, he responded that "of course" it will, but that "technologies always replace jobs." I think AI optimism is healthy, because AI is bringing more upside than downside.
"Your children are going to live to 100 and not have cancer because of technology," Dimon said Monday in an interview with Bloomberg TV. "And literally they'll probably be working 3½ days a week." - JPMorgan Chase CEO Jamie Dimon
AI Opinions and Articles
Using AI for conservation: in "AI of the tiger: Tiny camera 'protects' predator, and people," a tiger-recognition camera system alerts villagers when tigers are nearby, keeping both tigers and people safer.
At this year's Code Conference, AMD CEO Lisa Su talked about semiconductors and AI. The wide-ranging conversation covered many aspects of how AI is driving the chip business these days, summed up by the final Q&A: "NP: What's a bet you're making right now? LS: We're betting on what the next big thing in AI is."
"If you think about the technology trends that we've seen over the last 10 or 20 years, whether you're talking about the internet or the mobile phone revolution or how PCs have changed things, AI is 10 times, 100 times, more than that in terms of how it's impacting everything that we do.
I'm not a believer in moats when the market is moving as fast as it is. … When you look at generative AI, it's moving at an incredible pace." - AMD CEO Lisa Su
A Look Back …
In 1642, Blaise Pascal invented Pascal's calculator, often called the world's first digital calculating machine. It was a mechanical device built on ingenious gearing mechanisms, and it could add and subtract two numbers. As the Wikipedia article on Pascal's calculator states:
"Blaise Pascal began to work on his calculator in 1642, when he was 18 years old. He had been assisting his father, who worked as a tax commissioner, and sought to produce a device which could reduce some of his workload."
That has always been the goal of automation tools: reduce human workload. We automated more and more through the industrial revolution and the information age, yet human workload has not decreased even as our prosperity has risen, at least until now.
Is AI different? Perhaps Jamie Dimon is right that we'll get to a three-and-a-half-day workweek. Or perhaps automation will free us to do more fulfilling work, and we will work as long as before, but be more productive and fulfilled than ever.