AI Week In Review 23.12.23
MidJourney V6, SunoAI, Google VideoPoet, Apple's LLM in a flash, ReSTEM
TL;DR - This week brought new generative AI advances: Midjourney V6 and Suno AI. AI-generated images, video, and music are better than ever as we close out 2023.
AI Tech and Product Releases
Midjourney v6 was released as an alpha and is live on Discord. This release got a lot better at writing and drawing text (catching up to Dall-E 3), following prompts, image prompting and ‘remix mode’, where you can re-style images.
Reactions and creative results with Midjourney V6 on X suggest it’s a hit:
Chase Lean reviewed prompt results for Midjourney V6 versus V5.2: “Version 6 follows the instructions in your prompts much more than version 5.2. The images generated by v6 have higher detail, and are more realistic compared to v5.2, which look airbrushed sometimes.”
“At our current level, 99% of normal people already cannot tell a real image from an AI generated one.”
Linus Ekenstam - “The world is never going to be the same again, people will lose their jobs, advertising will be flipped upside down and much more.”
“MidJourney V6 can replicate almost any animation style.” Many reactions commented on the copyright issues of doing so.
Heather Cooper shows how you can tell mini-stories with Midjourney v6, comic-book style: “we can now present a narrative across multiple panels.”
Microsoft Copilot is now incorporating Suno AI into its platform:
“Through this partnership, people will have at their fingertips the ability, regardless of musical background, to create fun, clever, and personalized songs with a simple prompt.”
Stability AI announced that Stable Video Diffusion is now available on the Stability AI Developer Platform API: “the model can generate 2 seconds of video, comprising of 25 generated frames and 24 frames of FILM interpolation, within an average time of 41 seconds.”
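The frame counts in that quote fit together neatly, and a quick calculation shows how. This is only a back-of-the-envelope sketch: it assumes the interpolation step inserts exactly one frame between each adjacent pair of generated frames (which is what 24 in-between frames for 25 generated frames suggests), and the resulting frame rate is derived here, not a figure stated by Stability AI. The function name is made up for illustration.

```python
def interpolated_clip(generated_frames, duration_s):
    """Frame arithmetic for a clip with one interpolated frame per gap.

    Assumption (not from the announcement): interpolation adds exactly one
    frame between each adjacent pair of generated frames.
    """
    in_between = generated_frames - 1        # one gap between each adjacent pair
    total = generated_frames + in_between    # generated + interpolated frames
    return in_between, total, total / duration_s


in_between, total, fps = interpolated_clip(generated_frames=25, duration_s=2)
print(in_between, total, fps)  # 24 49 24.5
```

So the 2-second clip works out to 49 frames, or roughly 24.5 frames per second under that assumption.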
In a minor update, ChatGPT will let you archive your conversations.
Top Tools & Hacks
Suno AI is out of stealth: they released their app, and their website is live. You can use the app to generate a song in a specified genre on a given topic, and it will produce both the lyrics and the soundtrack. In just a few minutes, I was able to make some interesting music. I’ll say it before others do: Suno AI is the Midjourney of music.
AI Research News
Google posted their Gemini Technical Report on arXiv as “Gemini: A Family of Highly Capable Multimodal Models” and it literally has 900 authors. Even though it is 62 pages long, a critic notes that it spends only half a page on the training set.
Google's VideoPoet is the most impressive AI video generation model yet. Google researchers shared their work in a VideoPoet blog post showing off the model's abilities, and they also published the paper “VideoPoet: A Large Language Model for Zero-Shot Video Generation.”
In the VideoPoet model, all modalities are tokenized, in contrast to diffusion-based video generation models. They note: “Because our training spans videos, images, and text, we can prompt our model to demonstrate many aspects of understanding about the world including 3D structures, camera motions, and visual styles learned from these different sources.”
Apple wants AI to run directly on its hardware instead of in the cloud, and it has published research to make that easier in the paper “LLM in a flash: Efficient Large Language Model Inference with Limited Memory.” Using techniques for managing flash memory and optimizing how LLMs use memory, the authors greatly speed up LLM inference on memory-limited devices. This enables larger and more powerful AI models to run directly on your smartphone or other edge devices.
From Google DeepMind, the paper “Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models” presents an approach to improve LLM reasoning with automated (as opposed to human) feedback:
we investigate a simple self-training method based on expectation-maximization, which we call ReSTEM, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times. … ReSTEM scales favorably with model size and significantly surpasses fine-tuning only on human data.
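To make the three steps in that quote concrete, here is a toy sketch of the generate, filter, fine-tune loop. It is not the paper's implementation: the "model" is just a table of answer probabilities per problem, the binary feedback is a lookup against a hypothetical answer key, and "fine-tuning" is a simple reweighting toward the samples that passed the filter. All function names and parameters are invented for illustration.

```python
import random
from collections import Counter


def generate(policy, problem, n, rng):
    """Step 1a: sample n candidate answers from the current toy policy."""
    answers, weights = zip(*policy[problem].items())
    return rng.choices(answers, weights=weights, k=n)


def binary_feedback(problem, answer, answer_key):
    """Step 1b: automated 0/1 reward (hypothetical: exact match to a key)."""
    return 1 if answer_key[problem] == answer else 0


def fine_tune(policy, kept, lr=0.5):
    """Step 2: toy stand-in for fine-tuning; move mass toward kept samples."""
    new_policy = {}
    for problem, dist in policy.items():
        counts = Counter(a for p, a in kept if p == problem)
        total = sum(counts.values())
        if total == 0:
            # Nothing passed the filter for this problem; leave it unchanged.
            new_policy[problem] = dict(dist)
            continue
        new_policy[problem] = {
            a: (1 - lr) * w + lr * counts[a] / total for a, w in dist.items()
        }
    return new_policy


def self_train(policy, answer_key, iterations=3, samples_per_problem=50, seed=0):
    """Step 3: repeat generate -> filter -> fine-tune a few times."""
    rng = random.Random(seed)
    for _ in range(iterations):
        kept = []
        for problem in policy:
            for answer in generate(policy, problem, samples_per_problem, rng):
                if binary_feedback(problem, answer, answer_key):
                    kept.append((problem, answer))
        policy = fine_tune(policy, kept)
    return policy


# Toy usage: the policy starts out mostly wrong about "2+2".
start = {"2+2": {"3": 0.4, "4": 0.2, "5": 0.4}}
trained = self_train(start, answer_key={"2+2": "4"})
print(trained["2+2"])
```

The point of the sketch is the shape of the loop: no human-written solutions are needed, only a way to check whether a sampled answer is right. In the real setting the policy is an LLM and the fine-tuning step is an actual training run.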
AI Business and Policy
TechCrunch profiles the Chinese AI startup DP Technology, an AI-for-science company that is now seeking to expand globally. DP Tech provides a set of tools to conduct scientific computing, applying AI to molecular simulations, helpful in drug discovery and other applications.
Rather than scraping the internet and books to teach generative AI systems, Apple is looking for something a bit more controlled, reliable and legally watertight. According to four insiders, the company has reached out to various publishers offering multi-year licensing deals “worth at least $50 million” for the purposes of training AI on news articles.
Danish researchers created a machine-learning model, called life2vec and dubbed "the doom calculator," that can predict with high accuracy when people will die. Have no fear, though, the tech billionaires are investing in extending our longevity.
UK Supreme Court rules AI is not an inventor. Similar to other court rulings, only human beings can hold patents. While AI cannot be named an inventor, it will surely be used to help us invent.
Japan MoD, US DoD sign a joint military agreement for AI research, centered on state-of-the-art AI for UAVs (unmanned aerial vehicles).
We have an election prediction: AI will make 2024 US elections a ‘hot mess.’
Anthropic added legal protections for users of their AI: “Under the updated terms, we will defend our customers from any copyright infringement claim made against them for their authorized use of our services or their outputs.”
AI Opinions and Articles
Bill Gates prominently discussed AI in his “Road Ahead 2024” annual note. He points out that “AI is about to supercharge the innovation pipeline” and is encouraged by what AI can do to address the global health and education problems that make up the bulk of his philanthropic work:
Can AI combat antibiotic resistance? … treat high-risk pregnancies? … assess their risk for HIV? Could AI make medical information easier to access for every health worker?
He gives examples where innovators are using AI to answer all these questions affirmatively, helping with medical innovations.
Sam Altman shares “what I wish someone had told me,” good advice for entrepreneurs and others on the journey of life. Food for thought for the New Year.
A Look Back …
Since Midjourney V6 is the Release of the Week, it’s worth making this week’s Look-Back all about how far it’s come. Midjourney V1 was released in February 2022, V2 in April 2022, V3 in July 2022, V4 in November 2022, V5 in March 2023, and V6 in December 2023. The progress from primitive to amazing has taken under two years.