AI Week In Review 24.10.12
AI pioneers Hopfield, Hinton, Hassabis, and Jumper win Nobel prizes, Tesla demos Robotaxi and Robovan, Zoom avatars, OpenAI Swarm, PyramidFlow video gen, Reka Flash 21B update, Aria MoE, MLE-Bench.
Highlight – Nobel Prizes Awarded to AI Pioneers This Week
John Hopfield and Geoffrey Hinton won the Nobel Prize in Physics for their foundational work on artificial neural networks. Speaking at a press conference after the announcement, Hinton acknowledged Yoshua Bengio, Yann LeCun, and other AI pioneers, including his mentors and students. He also said he was proud that his former student Ilya Sutskever had helped fire OpenAI CEO Sam Altman, framing it as a win for AI safety.
DeepMind CEO Demis Hassabis and his colleague John Jumper won the Nobel Prize in Chemistry for developing AlphaFold, an AI system that predicts protein structures. They share one half of the 2024 prize; David Baker, head of the Institute for Protein Design at the University of Washington, received the other half. The award recognizes their contributions to protein structure prediction and computational protein design, respectively.
AI Tech and Product Releases
Tesla unveiled a fully autonomous Robotaxi and Robovan at its “We, Robot” event. At the event, held on a Hollywood movie set, CEO Elon Musk showcased a lineup of autonomous Robotaxis and said they will cost under $30,000 and be in production by 2026.
Tesla also unveiled the Robovan prototype, an autonomous electric bus with a futuristic Art Deco design that can carry up to 20 people. The event also featured a dozen humanoid Optimus robots demonstrating various functions and interactions. Some critics panned the vague promises, and Tesla stock fell afterward, perhaps because investors saw no clear timelines.
Zoom wants to turn you into an AI-animated, photorealistic avatar. Zoom announced an upcoming feature that lets users create a digital clone of themselves from a video clip, complete with head, upper arms, and shoulders. Users can give their digital double a script, and Zoom will generate synced audio and video of the avatar speaking it. While this aims to enhance asynchronous communication, it also raises deepfake risks; Zoom promises safeguards including watermarks and authentication methods.
Amazon announced that it is adding new AI-powered package retrieval technology to its electric vehicle fleet. Vision-Assisted Package Retrieval (VAPR) highlights the packages to be delivered at the current stop with visual and audio cues, reducing the time drivers spend handling packages. The technology is set to arrive on 1,000 Rivian vans by early 2025.
Meta is bringing its Meta AI chatbot to more countries this week, including Brazil, the U.K., and the Philippines. The company plans a gradual rollout to further countries, including in the Middle East, which will make it available in 43 countries and more than a dozen languages. Meta AI will add support for languages such as Arabic, Indonesian, Thai, and Vietnamese by the end of this rollout. Notably, Europe is not part of the expansion, likely because of the region's stringent regulations.
Gemini users worldwide now have access to Imagen 3, the latest version of Google's AI image generation model.
OpenAI has released Swarm, an open-source multi-agent framework described as “an educational framework exploring ergonomic, lightweight multi-agent orchestration.” It demonstrates ways to orchestrate AI agents using routines and handoffs.
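The routine-and-handoff idea can be sketched in plain Python: an agent is a name, instructions, and a set of tools, and a tool that returns another agent hands the conversation over to it. This is a minimal sketch under stated assumptions, not Swarm's actual API: no LLM is called, and the hypothetical `pick_tool` function stands in for the model's tool choice by matching a keyword. The agent and tool names are likewise hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Agent:
    name: str
    instructions: str  # would be the system prompt in a real LLM-backed agent
    tools: List[Callable] = field(default_factory=list)


def pick_tool(agent: Agent, message: str) -> Optional[Callable]:
    """Hypothetical stand-in for the model's tool choice: match the last word of a tool's name."""
    for tool in agent.tools:
        if tool.__name__.split("_")[-1] in message.lower():
            return tool
    return None


def run(agent: Agent, message: str, max_turns: int = 5) -> Tuple[str, str, List[str]]:
    """Run a routine; a tool that returns an Agent hands the conversation to that agent."""
    trace: List[str] = []
    for _ in range(max_turns):
        trace.append(agent.name)
        tool = pick_tool(agent, message)
        if tool is None:  # no tool applies, so the current agent answers directly
            return agent.name, f"{agent.name} replies to: {message}", trace
        result = tool(message)
        if isinstance(result, Agent):  # handoff: the returned agent takes over
            agent = result
            continue
        return agent.name, result, trace  # ordinary tool result ends the routine
    raise RuntimeError("exceeded max_turns without a final answer")


def process_refund(message: str) -> str:
    return "Refund issued."


refunds_agent = Agent("Refunds", "Process refund requests.", [process_refund])


def transfer_to_refund(message: str) -> Agent:
    return refunds_agent


triage_agent = Agent("Triage", "Route the user to the right agent.", [transfer_to_refund])
```

Calling `run(triage_agent, "I want a refund")` passes through the triage agent, hands off to the refunds agent, and returns `("Refunds", "Refund issued.", ["Triage", "Refunds"])`. Keeping handoffs as ordinary return values is what makes this style of orchestration lightweight: there is no central scheduler, only whichever agent currently holds the conversation.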
Reka announced an update to its Reka Flash 21B model. Reka Flash supports interleaved multimodal inputs (text, image, video, and audio), and the new release significantly improves function calling, instruction following, multilingual support, and multimodality, according to Reka's benchmark results.
Pyramid Flow, an open-source, high-quality AI video generator, has launched. It uses a technique in which a single AI model generates video in stages, most of them at low resolution, and upscales to high resolution only at the end of the generation process. Its output is competitive with state-of-the-art video models, but it stands out from competitors by being fully open source under an MIT license. The Pyramid Flow team has shared its paper, code on GitHub, and models on Hugging Face.
Cartesia has added 8 languages to Sonic-Multilingual, bringing its low-latency, ultra-realistic voice generation to 14 languages in total.
Rhymes AI has released Aria, a 25.3B-parameter multimodal MoE model. Aria was pre-trained from scratch on 6.4T tokens of multimodal and language data and uses a fine-grained mixture-of-experts (MoE) architecture that activates 3.9B parameters per token. It performs strongly on multimodal and language tasks, beating GPT-4o mini on many benchmarks. The model is available on Hugging Face.
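Activating 3.9B of 25.3B parameters means each token touches only about 15% of the weights. The mechanism behind that is top-k routing: a small router scores every expert, and only the k best-scoring experts run for that token. The sketch below is a generic toy MoE layer, not Aria's implementation; the dimensions, expert count, value of k, and the use of plain linear matrices as "experts" are all illustrative assumptions.

```python
import numpy as np


def moe_forward(x, gate_w, experts, k=2):
    """Route one token through the top-k experts of a toy MoE layer.

    x: (d,) token vector; gate_w: (d, n_experts) router weights;
    experts: list of (d, d) matrices, one toy linear expert each.
    Returns the mixed output and the indices of the experts that ran.
    """
    logits = x @ gate_w                      # router score for every expert
    top = np.argsort(logits)[-k:][::-1]      # k highest-scoring experts, best first
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                 # softmax over the selected experts only
    # Only the selected experts are evaluated; the rest cost nothing for this token.
    out = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
    return out, top


rng = np.random.default_rng(0)
d, n_experts = 4, 8
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
out, top = moe_forward(x, gate_w, experts, k=2)
```

Here only 2 of the 8 toy experts are computed per token, so total parameter count and per-token compute are decoupled; that is the same trade-off that lets a 25.3B-parameter model run with a 3.9B-parameter active footprint.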
Maitrix and LLM360 launched a new Decentralized Arena:
We release Decentralized Arena that automates and scales “Chatbot Arena” for LLM evaluation across various fine-grained dimensions (e.g., math – algebra, geometry, probability; logical reasoning, social reasoning, biology, chemistry, …).
The new leaderboard is on Hugging Face.
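Arena-style leaderboards turn many pairwise "which answer is better?" verdicts into a single ranking. A minimal sketch of one common way to do that, an online Elo update, is below. It is an illustration only: we are not claiming this is how Decentralized Arena computes its scores (Chatbot Arena, for example, has also used Bradley-Terry fitting), and the K-factor of 32, the starting rating of 1000, and the model names are hypothetical.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo logistic model (base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_ratings(matches, k: float = 32.0, initial: float = 1000.0) -> dict:
    """Compute ratings from (model_a, model_b, score_a) tuples.

    score_a is 1 if A's answer was judged better, 0 if B's was, 0.5 for a tie.
    In an LLM-judged arena, each verdict would come from a judge model.
    """
    ratings = {}
    for a, b, score_a in matches:
        ra = ratings.setdefault(a, initial)
        rb = ratings.setdefault(b, initial)
        ea = expected_score(ra, rb)
        ratings[a] = ra + k * (score_a - ea)              # A gains what it beat expectation by
        ratings[b] = rb + k * ((1 - score_a) - (1 - ea))  # B moves by the mirror amount
    return ratings


matches = [
    ("model_a", "model_b", 1),    # model_a judged better
    ("model_b", "model_a", 0.5),  # tie
    ("model_a", "model_c", 1),
]
ratings = elo_ratings(matches)
```

Because each update is zero-sum, the ratings always total the number of models times the starting rating, and the order in which verdicts arrive affects the final numbers; batch methods such as Bradley-Terry avoid that order dependence.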
AI Research News
OpenAI released MLE-bench, a new benchmark built from Kaggle competitions that measures how well AI agents perform at machine learning (ML) engineering tasks, along with a companion paper, “MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering.” The upshot is that AI agents are not yet doing well at these tasks: the best setup tested, o1-preview with AIDE scaffolding, reached at least a Kaggle bronze-medal level in only 16.9% of competitions.
Apple researchers published “GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models,” a paper that evaluates the reasoning capabilities of LLMs and calls them into question. They found that adding irrelevant clauses to questions can drastically reduce LLM performance on reasoning tasks:
We investigate the fragility of mathematical reasoning in these models and show that their performance significantly deteriorates as the number of clauses in a question increases. We hypothesize that this decline is because current LLMs cannot perform genuine logical reasoning; they replicate reasoning steps from their training data.
This work has generated a lot of response and commentary (e.g., from Gary Marcus), and opinions are sharply divided.
I don't recall another time in CS history where a result like this is obvious to the point of banality to one group. And heretical to another. – Martin Casado
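The GSM-Symbolic approach rests on templated questions: names and numbers become placeholders, so many variants of the same problem can be generated, and a clause that sounds relevant but does not change the answer can be spliced in. The sketch below illustrates that idea with a hypothetical template and a hypothetical irrelevant clause of our own; it is not one of the paper's templates or its evaluation code.

```python
import random

# Hypothetical template; {noop} is where an irrelevant clause may be inserted.
TEMPLATE = (
    "{name} picks {x} apples on Monday and {y} apples on Tuesday. "
    "On Wednesday, {name} picks twice as many apples as on Monday.{noop} "
    "How many apples does {name} have?"
)
# Sounds relevant, but does not change how many apples there are.
NOOP = " Five of Wednesday's apples are a bit smaller than average."


def make_variant(seed: int, with_noop: bool = False):
    """Instantiate the template with random names and numbers; return (question, answer)."""
    rng = random.Random(seed)
    name = rng.choice(["Ava", "Ben", "Chloe", "Dev"])
    x, y = rng.randint(10, 60), rng.randint(10, 60)
    question = TEMPLATE.format(name=name, x=x, y=y, noop=NOOP if with_noop else "")
    answer = x + y + 2 * x  # ground truth ignores the irrelevant clause
    return question, answer


plain_q, plain_a = make_variant(1)
noop_q, noop_a = make_variant(1, with_noop=True)
```

The same seed yields the same name and numbers, so the two questions differ only by the inserted clause while the correct answer stays identical. A model that subtracts the five "smaller" apples is pattern-matching on surface cues rather than reasoning about the problem, which is the failure mode the paper reports.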
This week, our Research Roundup covers the topic of LLM inference, memory, context, and Retrieval-Augmented Generation (RAG). These papers present various methods to improve how LLMs access and use relevant context to generate higher quality output:
Archon: An Architecture Search Framework for Inference-Time Techniques
Differential Transformer
Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely
Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models
In Defense of RAG in the Era of Long-Context Language Models
StructuredRAG: JSON Response Formatting with Large Language Models
AI Business and Policy
AMD announced the Instinct MI325X AI accelerator, aiming to challenge Nvidia's dominance in AI chips. AMD says the 153-billion-transistor MI325X sets new AI performance standards. Built on AMD's CDNA 3 GPU architecture, it delivers up to 2.61 PFLOPs of FP8 performance, which AMD positions as competitive with Nvidia's H200. AMD also announced its 5th Gen EPYC CPUs, calling them the best for enterprise, AI, and cloud computing.
Walmart is testing Wallaby, a retail-focused LLM it built in-house to enhance its retail AI capabilities. It is being tested internally first and will be rolled out to customers over the next year to improve customer experience and operational efficiency. Walmart has used AI for years, both in customer interactions and in managing its supply chain.
Inflection AI is alive and kicking and has pivoted to providing AI for enterprise.
Amazon announced plans for new robot-powered delivery warehouses that will use ten times more robots than before. The first “next-generation fulfillment center” facility in Shreveport, Louisiana, is built from the ground up for automation, and will use advanced Sequoia robotics systems to store and pick goods more efficiently. Despite increasing the use of robots, the fulfillment center will still employ 2,500 people once operational.
London startup Basecamp Research raised $60 million for an AI that it says provides unprecedented insights into biology and biodiversity. The company claims its foundation model, BaseFold, outperforms DeepMind's AlphaFold 2 at predicting complex protein structures by leveraging extensive primary data collected worldwide.
AI accounting software company Numeric has raised $28 million in a Series A funding round. The company's AI accounting software automates parts of the book-closing process, reducing the days needed for monthly closings and attracting customers like Brex and OpenAI.
A TechCrunch run-down on VC funding of AI startups shows venture investors remain enthusiastic about AI. In Q3, AI companies raised $18.9 billion, accounting for 28% of all venture funding. Notably, OpenAI secured the largest-ever venture deal at $6.6 billion, and six other AI deals exceeded $1 billion in 2024.
A reporter used a free tool called AI Hawk to apply for over 2,800 jobs in just an hour by automating résumés and cover letters. Both the application process and hiring are becoming increasingly reliant on AI, disrupting traditional recruitment methods.
AI Opinions and Articles
The State of AI 2024 report was released by Nathan Benaich and Air Street Capital. This massive report spans 212 slides and covers many aspects of AI, including research on AI models, industry, politics, safety, and future trends. The authors reviewed last year's predictions and made new ones.
Hold off on long-range plans. Max Tegmark says crazy things will happen due to AI in the next two years, so don't bother planning 10 years into the future; stay nimble. He acknowledges a lot of hype, comparing it to the dot-com hype, but adds, “The technology is here to stay and it's going to blow our minds.”
Should AI weapons be allowed to decide to kill? In late September, Shield AI co-founder Brandon Tseng asserted that the U.S. would never develop fully autonomous weapons, but five days later Anduril co-founder Palmer Luckey expressed openness to such weapons, pushing back on moral objections to them. We still want and need humans in the loop to stay in control, but what if automation yields a military advantage?
“[Our adversaries] use phrases that sound really good in a sound bite: ‘Well can’t you agree that a robot should never be able to decide who lives and dies?’ And my point to them is, where’s the moral high ground in a landmine that can’t tell the difference between a school bus full of kids and a Russian tank?” – Anduril co-founder Palmer Luckey