AI Week In Review 23.11.04
Big upgrades from Runway Gen2, GPT-4, MidJourney, X's Grok, Luma Genie, RedPajama 30T dataset, Brave Leo, Distil-Whisper, Biden EO, UK Summit, the Beatles' Now and Then
TL;DR - Another BIG week in AI
AI Tech and Product Releases
Runway announced an update of Gen2 text-to-video: “We have released an update for both text to video and image to video generation with Gen-2, bringing major improvements to both the fidelity and consistency of video results.” This impressive quality upgrade elicited phenomenal reactions from users: “Game-changing” - “Quality went up immensely” - “absolutely amazing.” It sets a new quality bar for text-to-video.
MidJourney launched V1 of their Style Tuner, which allows users to generate a unique visual style and apply it to any group of images generated. This is a big upgrade, enabling users to make consistent images and tell coherent stories with them. Users can save and share that style, so we can expect Styles to become the new Fine-Tune on image gen AI.
Users have already developed and shared new styles, showing off how “Midjourney's Style Tuner makes me able to do things I could only have dreamt of before.” Sergio Valsecchi says: “MidJourney finally adds coherence and consistency to generated images with the Style Tuner command! With RunWay Gen-2, there will be a lot of fun.”
OpenAI’s GPT-4 has gone all-in-one, with an update that rolls all its features (except for plug-ins) into one interface. It also uses a 32K context window.
While this seems like just a convenience update, some users are showing off the power of multi-modality in a single conversation. Further, the file upload capability has been dubbed an AI startup killer: “With one update, OpenAI killed 1000s of AI startups. … ChatGPT becomes the ultimate AI super-app combining Midjourney, PDF Chat, Perplexity AI, and advanced Data analysis all in one app.”
Phind announces their AI coding model beats GPT-4 at coding, with GPT-3.5-like speed and 16k context. Their 7th generation “Phind Model V7 achieves 74.7% pass@1 on HumanEval.” Speed is about 100 tokens per second.
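For readers unfamiliar with the metric: pass@1 is the share of benchmark problems solved by a single generated attempt. A minimal sketch of the commonly used unbiased pass@k estimator follows. The function names and the sample results are made up for illustration, and this is not Phind's evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem, given n samples of which c passed the tests."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k must contain a passing one.
        return 1.0
    # 1 minus the probability that k samples drawn without replacement all fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimate over (n, c) pairs, one pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical run: four problems, one attempt each, three attempts pass.
example_results = [(1, 1), (1, 0), (1, 1), (1, 1)]
print(benchmark_pass_at_k(example_results, k=1))
```

With k=1 the estimator reduces to the plain fraction of passing samples, which is why pass@1 reads as "percent of problems solved on the first try."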
Brave announced that their AI chatbot, named Leo, is now available in the Brave browser. Based on the Llama 2 AI model, its selling point is unparalleled privacy.
Generative AI startup Kaiber is launching a Mobile App.
Text-to-3D is making big progress:
Luma announced Genie, a new text-to-3D Foundation AI model that lets anyone make 3D objects from text in seconds. Genie is a research preview currently available on Discord.
Stability AI’s latest tool uses AI to generate 3D models. Stability’s new 3D generation model Stable 3D is in Private Preview.
Blockade Labs Combines 360 & NeRFs: “New research presents Blockade Labs Skybox AI as a panoramic generator for training PERFs to create cost effective and scalable 3D free roaming with AI.” Translation: This is AI for easier ways to make 3D virtual worlds in apps and games.
Meta has recently made its AI apps available: the Emu image generation model is now in Facebook and WhatsApp. And a leaked Instagram "AI Friend Creator" feature would let users create and build Instagram AI chatbots.
Musk's xAI launched its first AI model to a select group. There were few preview details behind it other than Musk claiming “In some important respects, it (xAI's new model) is the best that currently exists.” As of Nov 4 launch day, here is what we know of xAI’s new AI model, named Grok, thanks to Brian Roemmele:
Grok has access to the current X data feed with a “live” search engine that looks to X first for context, making it useful for breaking news.
Grok has a 25,000-character context window, a fast response time, and a humorous persona. It was trained on “The Pile” and X platform data.
It’s available to X Premium+ subscribers. Expect an API as well as image and audio input and outputs in future iterations.
Together AI announced Red Pajama Data-v2, an Open Dataset with 30 Trillion Tokens for Training Large Language Models. This is a huge increase in the availability of open data for LLM training, consisting of:
30 trillion filtered and deduplicated tokens (100+ trillion raw) from 84 CommonCrawl dumps covering 5 languages, along with 40+ pre-computed data quality annotations that can be used for further filtering and weighting.
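To make "filtering and weighting" concrete, here is a rough sketch of how per-document quality annotations might be applied. The `Doc` class, the signal names `word_count` and `perplexity`, and the thresholds are all hypothetical stand-ins, not the dataset's actual schema or any recommended settings.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    signals: dict[str, float]  # hypothetical pre-computed quality annotations


def keep(doc: Doc, min_words: int = 50, max_perplexity: float = 500.0) -> bool:
    """Filtering: drop documents that are too short or that look like noise."""
    s = doc.signals
    return s["word_count"] >= min_words and s["perplexity"] <= max_perplexity


def sampling_weight(doc: Doc, max_perplexity: float = 500.0) -> float:
    """Weighting: a linear ramp that favors lower-perplexity documents (illustrative)."""
    return max(0.0, 1.0 - doc.signals["perplexity"] / (2 * max_perplexity))


# Made-up documents for illustration only.
corpus = [
    Doc("A", {"word_count": 120, "perplexity": 200.0}),
    Doc("B", {"word_count": 10, "perplexity": 150.0}),   # too short
    Doc("C", {"word_count": 300, "perplexity": 900.0}),  # too noisy
    Doc("D", {"word_count": 80, "perplexity": 400.0}),
]

kept = [d for d in corpus if keep(d)]
weights = [sampling_weight(d) for d in kept]
print([d.text for d in kept], weights)
```

The point of shipping annotations rather than a single pre-filtered corpus is that each team can pick its own thresholds and weighting scheme like this without re-processing the raw crawl.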
Top Tools & Hacks
The aforementioned MidJourney Styles, Runway Gen 2 update, and GPT-4 ‘all-in-one’ are top new tools to try out, but also …
Distil-Whisper from Hugging Face is a distilled version of the Whisper speech recognition AI model that is 6 times faster, 49% smaller, and performs within 1% word error rate (WER) on out-of-distribution evaluation sets. This model can transcribe a five-minute audio clip into text in about 8 seconds. They describe it in a paper “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling.”
Distil-Whisper is an open model that can be embedded in AI applications using the HuggingFace Transformers library. They will have a CPU version of this soon, so there will be various ways to run this locally.
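If you want to try it, something like the following should work with the Transformers `pipeline` API. Treat it as a sketch under assumptions: the checkpoint name `distil-whisper/distil-large-v2` should be verified on the Hugging Face Hub, the `transcribe` and `realtime_factor` helpers are names made up here, and reading audio from a file path typically requires ffmpeg to be installed.

```python
MODEL_ID = "distil-whisper/distil-large-v2"  # assumed checkpoint name; verify on the Hub


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe an audio file with a Distil-Whisper checkpoint."""
    # Imported lazily so this module loads even where transformers is not installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=15,  # split long audio into chunks; value chosen for illustration
    )
    return asr(audio_path)["text"]


def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are transcribed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# The figure quoted above: a five-minute clip transcribed in about 8 seconds.
print(realtime_factor(5 * 60, 8))  # 37.5
```

Calling `transcribe("my_clip.mp3")` downloads the model on first use, so expect a wait the first time; the file name here is just a placeholder.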
AI Research News
The research paper “Acoustic Analysis and Prediction of Type 2 Diabetes Mellitus Using Smartphone-Recorded Voice Segments” shows that Diabetes Can Be Detected With Your Voice, thanks to AI that can detect subtle vocal differences that humans cannot.
Low-latency Real-time Voice Conversion on CPU presents LLVC, an open source AI model for real-time any-to-one voice conversion. LLVC uses a GAN architecture and knowledge distillation to produce its results.
FlashDecoding++: Faster Large Language Model Inference on GPUs introduces several innovations in the underlying mathematical calculations to speed up LLM inference. The authors claim FlashDecoding++ achieves up to 4.86x speedup compared to Hugging Face implementations, and 1.37x compared to state-of-the-art LLM inference engines.
The paper Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game presents a dataset of over 126,000 prompt injection attacks and 46,000 prompt-based “defenses” against prompt injection. This was “all created by players of an online game called Tensor Trust.” They believe it is the largest database of prompt injection prompts, which could be a great asset in stress-testing AI Security. Their project page is at https://tensortrust.ai/paper.
Notable AI research that we mentioned in our “When AGI?” article this week is the paper “Does GPT-4 Pass the Turing Test?” Plot reveal: No, GPT-4 falls a bit short at fooling humans into thinking it’s not AI.
AI Business and Policy
Google invests $2B in Anthropic.
Mistral AI is raising money at a $2 billion valuation. Eye-popping numbers for a tiny AI startup that is under a year old.
Apple unveils M3, M3 Pro, and M3 Max chips for their PCs and laptops, and they are super-advanced 3nm beauties, built for AI applications. They have improved the GPU and the Neural Engine. Also, Apple CEO Tim Cook confirms Apple’s Generative AI investments.
How AI company Primer parsed misinformation early in Israel-Hamas war: Their Command software sorts through social media and news sources to create query-driven contextual information and summaries about relevant people, places and things of note:
In a demo at the AUSA event, the software sorted through information related to the Israel-Hamas war and then produced a continuously refreshed timeline of events. Some of the points were geolocated, generating a heat map of posts and interactions.
The biggest AI policy news of the week is the Biden administration launching a sweeping executive order on AI, which we discussed in detail in an article earlier this week. Coinciding with the UK Summit on AI (see below), this was an important week for AI policy.
On the heels of this, the DoD released a new data and AI strategy. The public document is titled “Data, Analytics, and Artificial Intelligence Adoption Strategy.”
AI Art Wins a Copyright Victory. A judge ruled that “because AI image generators reference art by many different artists when generating new imagery, unless it is possible to prove that the resulting image referenced solely or primarily copyrighted art, and is substantially similar to that original copyrighted work, it is likely not infringing of the original work.” So holders of copyright have to sue on the basis of actual output, not simply on the use of copyrighted material in the training set.
AI Opinions and Articles
Collins Dictionary named AI its “Word of the Year” and defined it as “abbreviation for artificial intelligence: the modelling of human mental functions by computer programs.” AI changes everything, even language itself.
U.K. Prime Minister Rishi Sunak and the U.K. Government hosted a Summit on AI, bringing together tech leaders, AI researchers, and Government officials from the UK and other countries for discussions. Representatives from EU countries, India, China and the USA were there.
Overall, the discussions were balanced between the overwhelming good AI could do for humanity and AI risks and AI Safety concerns. For some AI worriers, ‘Bletchley made me more optimistic’ summed up how experts reacted to the AI summit, which showed some level of consensus across nations.
Nick Clegg, the former UK deputy Prime Minister, for example, expressed the need to orient AI Safety towards more real, present-day concerns.
“The newspapers love running pictures of scary robots with glaring red eyes and saying they’re going to take over tomorrow. But actually there’s a lot of homework that needs to be done on more proximate challenges, which I worry may play second fiddle to some of those more speculative risks.” - Nick Clegg
Elon Musk told Rishi Sunak AI will put an end to work in an interview with the UK Prime Minister:
“We are seeing the most disruptive force in history here. There will come a point where no job is needed - you can have a job if you want one for personal satisfaction but AI will do everything.” - Elon Musk
A Look Back …
The Beatles’ last song “Now and Then” has been released. Originally a John Lennon demo tape, the rest of the Fab Four had wanted to do a remix, but the quality of the original wasn’t up to the project:
It was McCartney's idea last year to re-approach "Now and Then" and pull a usable version of Lennon's vocal using the same technology that'd been used to separate music or conversation from background noise for Peter Jackson's "Get Back" documentary film.
Thanks to some AI algorithms, similar to the tools that can remove background noise from podcasts and videos, they were able to clean up John Lennon’s voice and create the final track.
Here’s the Beatles “Now and Then” on X, and here’s the YouTube:
I’m looking forward to having AI revive, restore and remix art from the ages. Genius.