AI Week In Review 23.10.28
SSD-1B, Shutterstock AI, Zephyr 7B, Eureka robot training, Frontier Model Forum, Latent Consistency Models, UK PM Sunak sets up AI Safety Institute, Data Provenance Initiative
AI Tech and Product Releases
More and better image generation AI keeps coming out: Segmind announced SSD-1B, an open source diffusion-based text-to-image AI model that is 50% smaller and 60% faster than the SDXL 1.0 model. These gains in size and speed come with minimal impact on image quality compared to SDXL 1.0. Available on HuggingFace.
Amazon’s new generative AI tool lets advertisers enhance product images by generating backgrounds based on product descriptions and themes.
In a similar vein, Shutterstock Integrates Creative AI into Library of 700M Images. Users can add or remove elements or change backgrounds in the stock images, creating a marketplace of customizable stock images. “Now, creatives have everything they need to craft the perfect content for any project with AI-powered design capabilities … presenting infinite possibilities to make stock your own.”
Google Maps is becoming more like Search — thanks to AI: “Google is adding a range of new AI-powered features to Maps, including more immersive navigation, easier-to-follow driving directions, and better organized search results. The end result is an experience that will likely remind many users of Google Search.”
This may help with image misinformation: Google announces tools to help users fact-check images.
An open source LLM called Zephyr-7b-beta has been released. It’s a fine-tune of Mistral 7B that uses an efficient method called distilled direct preference optimization (dDPO) to learn a chat model with significantly improved intent alignment. It shows new best-in-class performance for its size, beating the Llama2 70B model on MT-Bench, a chat-model benchmark.
Aside from the value of Zephyr 7B as an effective AI chat model that could be run locally, the dDPO process is a path to more efficient and scalable distillation of AI models.
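The DPO objective underlying dDPO can be written down compactly. Here is a minimal sketch of the per-pair loss (the function and the example log-probabilities are illustrative, not taken from the Zephyr training code; in dDPO the chosen/rejected labels come from an AI judge ranking teacher-model outputs rather than from human raters):

```python
import math

def dpo_loss(pi_logp_chosen, pi_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are summed log-probabilities of the chosen (preferred) and
    rejected completions under the policy being trained (pi) and under
    a frozen reference model (ref).
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # completion (relative to the reference model) than the rejected one.
    margin = (pi_logp_chosen - ref_logp_chosen) - (pi_logp_rejected - ref_logp_rejected)
    # Negative log-sigmoid of the scaled margin; minimized when the policy
    # assigns relatively higher probability to the chosen completion.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy already prefers the chosen completion, the loss is small.
print(round(dpo_loss(-10.0, -20.0, -15.0, -15.0), 4))  # → 0.3133
```

Because the loss needs only log-probabilities from two forward passes, there is no separate reward model or RL loop, which is what makes the recipe cheap enough to distill.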
In related news, Google open-sourced Distilling Step-by-Step, a method for fine-tuning smaller LLMs on the outputs of larger LLMs.
LAION, the non-profit group behind the datasets used to train Stable Diffusion, has announced Open Empathic, which aims to build open-source AI systems with empathy and emotional intelligence:
“In an increasingly AI-driven world, it is of paramount importance that AI systems possess emotional intelligence to understand and respond to human emotions. As AI plays an ever-expanding role in our daily lives, ranging from education to healthcare, elderly care, and commercial contexts, it becomes vital to prioritize the well-being and emotional intelligence of AI-human interactions.”
Poe has added creator monetization, so that creators of AI bots and custom LLMs on Poe can earn revenue when their bots are used.
Jina AI introduced jina-embeddings-v2, the first open-source embeddings model with an 8K context length. It claims performance matching OpenAI’s proprietary embedding models, a significant milestone for open-source text embeddings.
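Embedding models like this are used by encoding texts into vectors and comparing them with cosine similarity. A minimal sketch follows; the `sentence_transformers` loading path in the comment is an assumed usage, and the toy 3-d vectors stand in for real 768-dimensional embeddings:

```python
from math import sqrt

# In practice you would obtain vectors from the model, e.g. (assumed API):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("jinaai/jina-embeddings-v2-base-en",
#                               trust_remote_code=True)
#   vec_a, vec_b = model.encode(["first text", "second text"])

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors: dot(a,b)/(|a||b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors stand in for real embeddings; parallel vectors score 1.0.
print(round(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 4))  # → 1.0
```

The 8K context length matters here because a whole document can be embedded as one vector instead of being chunked first.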
Top Tools & Hacks
TIME Magazine’s Best Inventions of 2023 has a number of AI-related inventions, including some obvious major AI releases this year: GPT-4, Runway Gen2, Adobe Photoshop Generative Fill. Oddly, they also include the unreleased Humane AI Pin, but ignore open source LLMs like Llama2, the best coding assistants, and MidJourney 5. So TIME had both hits and misses.
AI Research News
New research from Dr. Jim Fan and NVIDIA puts a new spin on robot learning. Eureka, a new AI agent developed by NVIDIA Research, uses LLMs to automatically generate reward algorithms to train robots. As shown in their paper “Eureka: Human-Level Reward Design via Coding Large Language Models,” this AI agent trained a robotic hand to perform rapid pen-spinning tricks, showing it can train robots as well as a human can.
Eureka is a superhuman reward engineer. The agent outperforms expert human engineers on 83% of our benchmark with an average improvement of 52%.
In particular, Eureka realizes much greater gains on tasks that require sophisticated, high-dimensional motor control.
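The core Eureka loop is easy to sketch. In this toy version, every name and the scoring function are hypothetical stand-ins: in the real system, GPT-4 writes reward-function source code and each candidate is scored by RL training in NVIDIA’s Isaac Gym simulator, with the best candidate’s code fed back into the next round of prompts:

```python
# Minimal sketch of an Eureka-style evolutionary reward search.
# The "LLM" proposes candidate reward functions, each is scored by a
# stand-in for simulated RL training, and the best candidate survives.

def mock_llm_propose(round_idx):
    """Stand-in for an LLM writing candidate reward functions as code."""
    # Each candidate weights task progress vs. an energy penalty differently.
    return [lambda progress, energy, w=w: w * progress - (1 - w) * energy
            for w in (0.3 + 0.1 * round_idx, 0.6, 0.9)]

def mock_evaluate(reward_fn):
    """Stand-in for RL training in simulation; returns a fitness score."""
    # Pretend the task is solved best when progress dominates the reward.
    return reward_fn(progress=1.0, energy=0.2)

def eureka_search(rounds=3):
    best_score = float("-inf")
    for r in range(rounds):
        for candidate in mock_llm_propose(r):
            score = mock_evaluate(candidate)
            best_score = max(best_score, score)
    return best_score

print(round(eureka_search(), 2))  # → 0.88
```

The point of the outer loop is that reward design, normally slow expert trial-and-error, becomes a generate-evaluate-refine search the LLM can drive automatically.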
With a goal of improving dataset transparency, the Data Provenance Initiative has been set up by a multi-disciplinary team to catalog and audit the datasets used to train LLMs:
As a first step, we've traced 1800+ popular, text-to-text finetuning datasets from origin to creation, cataloging their data sources, licenses, creators, and other metadata, for researchers to explore using this tool. The purpose of this work is to improve transparency, documentation, and informed use of datasets in AI.
They published on their efforts in the paper “The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI.” They also put a data provenance explorer on their website. This effort will help AI practitioners know the legal constraints on various datasets available for training AI.
“Woodpecker: Hallucination Correction for Multimodal Large Language Models” presents a way to reduce hallucinations in multi-modal AI models, correcting incorrect descriptions of images: “Woodpecker consists of five stages: key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction.” Their method shows a large improvement in model accuracy on the measured benchmarks.
“Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference” is a recent paper showing how to make image generation from latent diffusion models, such as Stable Diffusion, more efficient with LCMs, Latent Consistency Models. They are designed to directly predict the solution of a diffusion process, reducing the many iterations of diffusion models to a few steps.
The end result is LCMs that can produce images very quickly: “Early signs show that this will soon enable us to generate images in 100ms (10 images/second).” One such LCM that can do this is Dreamshaper_v7, available on HuggingFace.
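The few-step idea can be illustrated with a toy: a consistency function maps a noisy sample directly to a clean-image estimate, and multistep sampling alternates predicting and re-noising at decreasing noise levels. Everything below is a contrived stand-in (a 1-D “image” and a hand-written consistency function), not the actual LCM code:

```python
import random

TARGET = 3.0  # stands in for the clean image the model was trained on

def consistency_fn(x, sigma):
    """Stand-in for the learned f(x_t, t) -> x_0 mapping.

    A perfect consistency model returns x_0 directly; this toy version
    keeps a small residual error proportional to the input's distance
    from x_0, so extra steps visibly help.
    """
    return TARGET + 0.05 * (x - TARGET)

def sample(sigmas=(10.0, 2.0, 0.5, 0.0), seed=0):
    rng = random.Random(seed)
    x = rng.gauss(0.0, sigmas[0])          # start from pure noise
    x0_hat = consistency_fn(x, sigmas[0])  # one-step estimate of x_0
    for sigma in sigmas[1:]:
        x = x0_hat + sigma * rng.gauss(0.0, 1.0)  # re-noise at a lower level
        x0_hat = consistency_fn(x, sigma)         # predict x_0 again
    return x0_hat

print(abs(sample() - TARGET) < 0.5)  # a 4-step sample lands near x_0
```

The structural contrast with a standard diffusion sampler is the loop length: four predict-and-renoise steps here versus dozens of small denoising steps, which is where the ~100ms generation times come from.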
Yet another approach to diffusion models is Matryoshka Diffusion Models, an end-to-end framework for high-resolution image and video synthesis. “We propose a diffusion process that denoises inputs at multiple resolutions jointly and uses a NestedUNet architecture where features and parameters for small-scale inputs are nested within those of large scales.” This produces strong results for high-resolution image generation, up to 1024x1024 pixels.
AI Business and Policy
Cruise pauses all driverless robotaxi operations to ‘rebuild public trust’ after the California DMV suspended its driverless testing permits. This setback doesn’t mean driverless taxis aren’t moving forward, though. Waymo driverless vehicles are now available through Uber, starting first in Phoenix.
Google has come back to the table with $2B more investment in Anthropic. Anthropic had already gotten large investments from Google, Amazon and others. Anthropic’s $5B plan to build “Claude-Next” by 2025 seems to be on track.
According to The Information, the Humane Pin will cost $1000. Expect an announcement on November 9th.
OpenAI is building a Preparedness team to address risks of “Frontier” AI models and launching a challenge to address AI Safety, called the Preparedness Challenge.
OpenAI also made a joint announcement with Google, Anthropic, and Microsoft about the Frontier Model Forum, a joint body focused on the safety of the most advanced AI models:
They named Chris Meserole as the first Executive Director of the Frontier Model Forum.
They announced a new AI Safety Fund with over $10 million to accelerate academic research on frontier model safety.
The Biden administration is set to kick-start AI regulation, as a “Sweeping new Biden order aims to alter the AI landscape.” Politico says this “all-hands effort to impose national rules on a fast-moving technology” is coming Monday. The order will initiate policy reviews across many parts of the administration, from education to healthcare, cybersecurity, and more.
British Prime Minister Rishi Sunak announced he would establish a new artificial intelligence safety institute and is seeking buy-in for a new global expert panel on the emerging technology. “I can announce that we will establish the world's first AI Safety Institute, right here in the U.K.”
AI Opinions and Articles
Bill Gates feels Generative AI has plateaued and says GPT-5 will not be any better. While acknowledging the leap from GPT-2 to GPT-4 as “incredible,” he believes that GPT-4 may be the plateau and GPT-5 might not surpass it. At the same time, he predicts AI software’s accuracy will substantially improve while costs decrease.
I believe he’s wrong to assume AI has hit a wall at GPT-4, though there is an open question as to whether and when scaling will stop yielding improvements. Nothing in the new AI model releases we are seeing suggests we’ve hit any walls yet.
What the Zephyr release and the other improved AI models arriving each week tell us is that there are more gains to be had, in both efficiency and performance. With 7B models now as good as 180B models were two years ago, I seriously doubt we cannot soon make our largest models better as well, even at the same parameter count.
A Look Back …
“If I asked people what they wanted, they would have asked for a faster horse.” - Attributed to Henry Ford.
Will older styles of living make a comeback with the arrival of robo-versions of farm animals? This back-to-the-future scene of a robo-dog-drawn carriage is brought to you by https://twitter.com/DanielleFong/status/1718316833672167841: