AI Week in Review 23.08.06
AI Tech and Product Releases
Meta releases AudioCraft, a generative AI tool for music and audio. AudioCraft is a framework of tools (MusicGen, AudioGen, and EnCodec) that Meta is “open-sourcing for research purposes to help advance the field of AI-generated audio.” Meta is also releasing training code, enabling users to train personalized models on their own music datasets. As TechCrunch notes, “while the MusicGen terms of use discourage using the model for ‘out-of-scope’ use cases beyond research, Meta doesn’t expressly prohibit any commercial applications.” This could become a ‘stable diffusion for music.’
OpenAI announces GPT updates on Twitter “to improve the ChatGPT experience”: prompt examples, suggested replies, GPT-4 as the default model, uploading multiple files to Code Interpreter, staying logged in, and keyboard shortcuts.
Google Assistant to get an AI makeover. Google is reorganizing teams and jobs to focus on generative AI for its assistants.
Seeking to stay ahead of being superseded by AI, Stack Overflow is adding AI capabilities, including better semantic search, a knowledge repository, a GenAI Slack integration, and more.
Stability AI has released Stable Beluga 1 and Stable Beluga 2, based on LLaMA-65B and Llama 2-70B respectively, and fine-tuned using the same methods as Orca. Stable Beluga 2 is near the top of the Hugging Face Open LLM Leaderboard, coming close to ChatGPT in performance benchmarks.
AI Research News
Nvidia’s generative image personalization AI, named Perfusion, is a tiny model that takes only four minutes to train and produces text-prompted image modifications that match full diffusion methods in quality.
In ToolLLM: Facilitating Large Language Models to Master Real-world APIs, researchers improved the tool-use capabilities of open-source LLMs. They present ToolBench, an instruction-tuning dataset for tool use built from 16,464 real-world RESTful APIs, and created ToolLLaMA, a fine-tuned version of LLaMA, demonstrating its capabilities.
RT-2: Intelligent robot model translates vision and language into action. Developed at DeepMind, RT-2 is a general-purpose robot model that accepts visual and text input and uses it to perceive and act on its surroundings. In the RT-2 paper, the authors combine language and vision into a vision-language-action (VLA) model that gives RT-2 its understanding of its surroundings and expresses robot actions as text output. “Our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training.”
A Nature paper on Emergent analogical reasoning in large language models compared GPT-3 and GPT-4 with humans on analogical reasoning tasks, and “found that GPT-3 displayed a surprisingly strong capacity for abstract pattern induction, matching or even surpassing human capabilities in most settings; preliminary tests of GPT-4 indicated even better performance.” See also Ars Technica’s coverage: GPT-3 aces tests of reasoning by analogy.
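As an illustration (not taken from the paper itself), letter-string analogies used in studies like this follow patterns such as abcd : abce :: ijkl : ?, where the hidden rule is “advance the final letter.” A minimal sketch of that rule in Python; the function name and problem format are illustrative:

```python
def apply_successor_rule(s: str) -> str:
    """Advance the final letter by one, e.g. 'abcd' -> 'abce'."""
    return s[:-1] + chr(ord(s[-1]) + 1)

# Analogy problem: abcd : abce :: ijkl : ?
print(apply_successor_rule("ijkl"))  # ijkm
```

Solving such problems requires inducing the abstract transformation from one example and transferring it to a new string, which is exactly the pattern-induction capacity the study probes.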
There’s more to intelligence than reasoning IQ, so what about EQ? A recent study found that ChatGPT is better than humans at accurately identifying emotions in fictional textual scenarios. ChatGPT’s emotional awareness could be helpful for implementing AI in mental health.
LLMs, like humans, respond to emotional cues in prompts. A study found that feeding positive emotional stimuli to LLMs improves their performance. Researchers evaluated appending 11 emotionally motivating phrases to LLM prompts, such as “this is very important for my career,” “you’d better be sure,” “take pride in your work and give it your best,” and “embrace challenges as opportunities for growth.” They report:
"Experimental results demonstrate that our EmotionPrompt, using the same single prompt templates, significantly outperforms original zero-shot prompt and Zero-shot-CoT on eight tasks with diverse models: ChatGPT, Vicuna-13b, Bloom, and T5. Further, EmotionPrompt was observed to improve both truthfulness and informativeness."
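Mechanically, EmotionPrompt is simple prompt augmentation. A minimal sketch, assuming only that a stimulus phrase is appended to the base prompt; the function name and template are illustrative, not the paper’s code (the phrase list comes from the examples above):

```python
# Phrases drawn from the examples quoted in the study.
EMOTIONAL_STIMULI = [
    "This is very important to my career.",
    "You'd better be sure.",
    "Take pride in your work and give it your best.",
    "Embrace challenges as opportunities for growth.",
]

def emotion_prompt(task: str, stimulus_index: int = 0) -> str:
    """Append one emotional stimulus phrase to a zero-shot task prompt."""
    return f"{task} {EMOTIONAL_STIMULI[stimulus_index]}"

print(emotion_prompt("Summarize the following article in two sentences."))
```

The augmented prompt is then sent to the model in place of the plain zero-shot prompt; no fine-tuning or model changes are involved.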
LLMs have gotten too good at prior benchmarks, so a paper proposes ARB: Advanced Reasoning Benchmark for Large Language Models:
“ARB presents a more challenging test than prior benchmarks, featuring problems in mathematics, physics, biology, chemistry, and law. As a subset of ARB, we introduce a challenging set of math and physics problems which require advanced symbolic reasoning and domain knowledge. We evaluate recent models such as GPT-4 and Claude on ARB and demonstrate that current models score well below 50% on more demanding tasks.”
Speaking of Stable Diffusion for music, how about a tool called GETMusic: A Unified Representation and Diffusion Framework that can Generate Music Tracks. “GETMusic uses a representation called GETScore and a discrete diffusion model called GETDiff. GETScore represents tracks in a 2D structure where tracks are stacked vertically and progress horizontally with time.”
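The GETScore layout can be pictured as a grid with one row per track and one column per time step. A hedged sketch of that idea in plain Python, where the token ids (MIDI-like pitch numbers, 0 for an empty cell) are illustrative and not the paper’s actual vocabulary:

```python
PAD = 0  # placeholder token for an empty cell (illustrative)

def get_score(tracks: dict[str, list[int]], n_steps: int) -> list[list[int]]:
    """Stack per-track token sequences into a tracks-by-time grid:
    rows are tracks (stacked vertically), columns are time steps."""
    grid = []
    for name, tokens in tracks.items():
        row = (tokens + [PAD] * n_steps)[:n_steps]  # pad or trim to n_steps
        grid.append(row)
    return grid

score = get_score({"melody": [60, 62, 64], "bass": [36, 36]}, n_steps=4)
for row in score:
    print(row)
```

A discrete diffusion model like GETDiff can then denoise masked cells of such a grid, which is what lets GETMusic generate any subset of tracks conditioned on the others.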
MIT is researching a more efficient AI model architecture called Liquid Neural Networks (LNNs), which allow for feedback and can tackle AI problems from robotics to self-driving cars. LNNs increase the expressivity of the network with recurrent feedback within deep learning layers and non-linearities over the synaptic inputs, more like how neurons in our brains work.
One of the most striking features of LNNs is their compactness. For example, a classic deep neural network requires around 100,000 artificial neurons and half a million parameters to perform a task such as keeping a car in its lane. In contrast, Rus and her colleagues were able to train an LNN to accomplish the same task with just 19 neurons.
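At the core of an LNN is a neuron whose effective time constant varies with its input, following the liquid time-constant formulation of Hasani et al. A hedged single-neuron sketch with Euler integration; the parameter values and the single-neuron setup are chosen purely for illustration, not taken from MIT’s implementation:

```python
import math

def ltc_step(x, I, dt=0.01, tau=1.0, w=1.0, b=0.0, A=1.0):
    """One Euler step of the liquid time-constant dynamics
        dx/dt = -(1/tau + f(I)) * x + f(I) * A,
    where f is a nonlinearity over the synaptic input I."""
    f = 1.0 / (1.0 + math.exp(-(w * I + b)))  # sigmoid synapse
    dx = -(1.0 / tau + f) * x + f * A
    return x + dt * dx

# Drive the neuron with a constant input and watch it settle.
x = 0.0
for _ in range(100):
    x = ltc_step(x, I=1.0)
print(round(x, 3))  # approaches the fixed point f/(1/tau + f) ~ 0.42
```

Because the input-dependent term f(I) modulates the decay rate as well as the drive, each neuron adapts its timing to the stimulus, which is part of why so few neurons can cover a task like lane keeping.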
This is a massive step-change in capabilities, and if it can be applied broadly, could be a profound leap forward in deep learning efficiency.
AI Business and Policy
While LLMs have taken most of the limelight this year, strides in robotics aren’t far behind. Robotics stocks are jumping: the billionaire behind Walmart’s warehouse robots gained more than $7 billion in a day. His company, Symbotic, has gone up over 400% this year, and investment gurus predict “AI Is Unlocking Major Wealth Potential for Robotics Stocks.”
As AI becomes better at interpreting language, and robots become better at interacting with the physical world, fears about job replacement rise.
As the RT-2 research (mentioned earlier) shows, better foundational AI models will make robots more intelligent, flexible, and capable, greatly expanding the utility of robots in multiple roles. Robotics is predicted to grow at a 30% annualized rate through 2030, and it could end up being even more profound in displacing human work than LLMs themselves.
AI cloud startup CoreWeave raises $2.3 billion in debt collateralized by Nvidia chips. The company is growing fast thanks to Nvidia being in its corner. As VentureBeat puts it, CoreWeave came ‘out of nowhere.’ Now it’s poised to make billions off AI with its GPU cloud.
Dell is cooperating with Nvidia, announcing ‘Project Helix’ to provide generative AI solutions.
How are companies training workers on ChatGPT and generative AI? AI is evolving too fast for worker training to keep up, as the scene is constantly shifting.
How quickly could AI threaten jobs in the medical field? Gottlieb says AI may take on doctors' roles sooner rather than later.
AI Opinions and Articles
Data scraped from the web for AI training includes a lot of toxic speech; Google is devising ways to fight that, as covered in “Google’s weird, risky quest to tame AI with AI.”
A Look Back …
The Atlantic has a profile on Sam Altman of OpenAI, and asks “Does Sam Altman know what he’s creating?” Subtitled “The OpenAI CEO’s ambitious, ingenious, terrifying quest to create a new form of intelligence,” the article goes through the history of OpenAI and its discouraging early years:
The first few years at OpenAI were a slog, in part because no one there knew whether they were training a baby or pursuing a spectacularly expensive dead end.
… “Nothing was working, and Google had everything: all the talent, all the people, all the money,” Altman told me. The founders had put up millions of dollars to start the company, and failure seemed like a real possibility. Greg Brockman, the 35-year-old president, told me that in 2017, he was so discouraged that he started lifting weights as a compensatory measure. He wasn’t sure that OpenAI was going to survive the year, he said, and he wanted “to have something to show for my time.”
… and then the Transformer paper came out. The rest is AI history.