AI Week In Review 23.12.09
Google's Gemini, StripedHyena-7B, Grok, Mistral's 8x7B, Playground v2, Meta Imagine, Microsoft's Deep Search, EU gets AI Act together, AMD's MI300, Animate Anyone and the AI Alliance
AI Tech and Product Releases
This week’s biggest release was Google's Gemini 1.0, which we covered in depth in “Google's Gemini Launched”. In brief, the natively multimodal Gemini comes in Nano, Pro and Ultra models, and the flagship Gemini Ultra claims SOTA capabilities, beating GPT-4V (but not by much) in most benchmarks. The caveat is that Ultra was announced and demonstrated in videos, but not actually released. Further, Google got flak for its demo video being deceptively edited.
You can get access to Gemini Pro in Google Bard now, and other features and models will roll out in coming weeks and months.
Elon Musk announced that xAI’s Grok is now available on X to Premium+ subscribers. A Community Note on X then pointed out that it’s only available in the USA. What is Grok? Grok will tell you …
Playground released Playground v2, their latest and greatest image generation AI tool.
Mistral AI dropped a new mixture-of-experts 8x7B AI Model as a torrent link. Together released StripedHyena 7B based on a new architecture that may challenge transformers in LLMs. We discussed both in our recent article “Mistral AI drops Mixture of Expert AI Model.”
Stability AI released StableLM-Zephyr 3B, a small but powerful AI model fine-tuned for chat applications and small enough for running on edge devices.
Meta released a stand-alone AI image generation tool called Imagine, based on Meta’s Emu AI image generation model. This is a competitor to DALL-E 3, Midjourney, and other AI image generation tools.
Microsoft introduced Deep Search, which uses AI to get to better user intent in search queries: “GPT-4 takes the search query and expands it into a more comprehensive description of what an ideal set of results should include.”
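The query-expansion idea in that quote can be sketched in a few lines. This is only an illustration under stated assumptions, not Microsoft's implementation: `expand_query`, `rank_by_overlap`, the `fake_llm` stand-in, the sample query and the word-overlap scoring are all hypothetical, and a real system would call an actual model and a real search index.

```python
from typing import Callable, List


def expand_query(query: str, llm: Callable[[str], str]) -> str:
    """Ask a language model to describe what ideal results should include."""
    prompt = (
        "Describe in detail what an ideal set of search results "
        f"for the following query should include:\n{query}"
    )
    return llm(prompt)


def rank_by_overlap(description: str, documents: List[str]) -> List[str]:
    """Rank documents by how many words they share with the expanded description.

    Word overlap is a deliberately crude stand-in for a real relevance model.
    """
    wanted = set(description.lower().split())

    def score(doc: str) -> int:
        return len(wanted & set(doc.lower().split()))

    return sorted(documents, key=score, reverse=True)


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; returns a canned expansion.
    return "loyalty card programs in Japan with points benefits and how to apply"


docs = [
    "weather forecast for Tokyo this weekend",
    "how to apply for loyalty card programs in Japan and compare points benefits",
]
expanded = expand_query("how do points systems work in Japan", fake_llm)
ranked = rank_by_overlap(expanded, docs)
print(ranked[0])
```

The point of the design is that the short, ambiguous query never reaches the ranking step directly; the ranking works from the richer description instead.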
AMD released new MI300 chips to power faster AI training and inference. AMD’s Instinct MI300X accelerator and Instinct MI300A accelerated processing unit (APU) are designed to train and run LLMs, and AMD claims performance comparable to Nvidia's H100 for training and 40% better for inference.
Apple has released MLX, a machine-learning framework for developers, available on GitHub. MLX is intended to bring machine learning to Apple silicon.
Top Tools & Hacks
You can try out Meta’s Imagine at imagine.meta.com.
Recent AI model releases and upgrades give us free access to good flagship AI models. My top go-to LLMs right now are:
Google’s Gemini Pro at bard.google.com.
GPT-4 Turbo at copilot.microsoft.com.
GPT-4 Turbo and GPTs at openai.com, where I am a ChatGPT Plus subscriber.
Perplexity at www.perplexity.ai.
Claude at claude.ai.
It’s interesting how far you can go without any paid subscription. For a time you needed a subscription to get GPT-4, which is why I subscribed, but now it’s available for free on Microsoft Copilot.
X wants me to become a Premium+ subscriber to access Grok, but I’ll pass on that, since there are many other free options.
AI Research News
Artificial intelligence paves way for new medicines, according to research by collaborators at LMU Munich, ETH Zurich, and Roche Pharma Research and Early Development (pRED) Basel. They developed a deep learning Graph Neural Network that predicts the optimal method for synthesizing drug molecules, and they reported their work in the paper “Enabling late-stage drug diversification by high-throughput experimentation with geometric deep learning.”
The technique of modeling molecular reactions with GNNs follows the approach that GNoME and others have used:
Graph neural networks (GNNs) have seen broad applications in molecular feature extraction and property prediction. Among the various machine learning methods developed for chemical reaction planning, GNNs have been successfully employed for retrosynthesis planning, regioselectivity prediction and reaction product prediction …
Their specific GNN training approach:
… different GNNs were trained on two-dimensional (2D), three-dimensional (3D) and atomic-partial-charge-augmented molecular graphs, to predict binary (yes/no) reaction outcomes, reaction yields and regioselectivity.
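To make the idea concrete, here is a toy sketch of one message-passing round on a partial-charge-augmented molecular graph, followed by a binary (yes/no) outcome score. It is not the paper's architecture: the three-atom graph, the partial-charge values, the sum aggregation, and the weights and bias in `predict_success` are all made-up assumptions for illustration, and a real GNN would learn its parameters from reaction data.

```python
import math
from typing import Dict, List, Tuple

# Toy molecular graph for the heavy atoms of ethanol (C-C-O).
# Feature vector per atom: [atomic number, illustrative partial charge].
features: Dict[int, List[float]] = {
    0: [6.0, -0.1],
    1: [6.0, 0.1],
    2: [8.0, -0.4],
}
bonds: List[Tuple[int, int]] = [(0, 1), (1, 2)]


def neighbors(bonds: List[Tuple[int, int]], n_atoms: int) -> Dict[int, List[int]]:
    """Build an undirected adjacency list from the bond list."""
    adj: Dict[int, List[int]] = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def message_pass(feats: Dict[int, List[float]], adj: Dict[int, List[int]]) -> Dict[int, List[float]]:
    """One round: each atom adds the sum of its neighbors' features to its own."""
    updated = {}
    for atom, own in feats.items():
        agg = [0.0] * len(own)
        for nb in adj[atom]:
            agg = [x + y for x, y in zip(agg, feats[nb])]
        updated[atom] = [o + a for o, a in zip(own, agg)]
    return updated


def readout(feats: Dict[int, List[float]]) -> List[float]:
    """Average atom vectors into a single vector for the whole molecule."""
    n = len(feats)
    dim = len(next(iter(feats.values())))
    return [sum(f[i] for f in feats.values()) / n for i in range(dim)]


def predict_success(graph_vec: List[float], weights: List[float], bias: float) -> float:
    """Logistic score for a yes/no reaction outcome (weights are untrained placeholders)."""
    z = sum(w * x for w, x in zip(weights, graph_vec)) + bias
    return 1.0 / (1.0 + math.exp(-z))


adj = neighbors(bonds, len(features))
hidden = message_pass(features, adj)
graph_vec = readout(hidden)
prob = predict_success(graph_vec, weights=[0.05, 1.0], bias=-0.5)
print(hidden, graph_vec, prob)
```

After one round, the central carbon's vector already reflects both of its neighbors, which is how a GNN lets local chemical environment influence a prediction. Stacking more rounds spreads information further across the molecule.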
Along with the Gemini models, Google released the Gemini 1.0 and AlphaCode 2 technical reports. However, both reports were heavy on benchmarks and demonstration examples and light on internal architecture details.
Alibaba's new 'Animate Anyone' model went viral this week because people saw its potential to replace TikTok personalities. The model was released on GitHub and presented in the paper “Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation.” It takes an image and a stick-figure motion guidance video to generate a character animation video:
To preserve consistency of intricate appearance features from reference image, we design ReferenceNet to merge detail features via spatial attention. To ensure controllability and continuity, we introduce an efficient pose guider to direct character's movements and employ an effective temporal modeling approach to ensure smooth inter-frame transitions between video frames.
It's not surprising this model went viral. Put in yourself, a celebrity, a cartoon character - anyone - and you can make a TikTok-style dancing video. The underlying methods suggest many image-to-video conversion applications.
Meta and Stanford present “Controllable Human-Object Interaction Synthesis,” a model that, when given a task and initial human and object positions, generates a sequence of human motions to complete the task. This is a big step forward for motion instruction-following, with implications both for character animation in video and games and for autonomous robots.
Dolphins is a Multimodal Language Model for Driving that includes human-like abilities as a conversational driving assistant. Tailored for the driving domain, and based on prior vision-language model work, Dolphins processes multimodal inputs comprising video (or image) data, text instructions, and historical control signals to generate informed outputs corresponding to provided instructions.
The result is that Dolphins can “understand a variety of complex and long-tailed open-world driving scenarios and solve a spectrum of AV tasks.”
AI Business and Policy
Meta, IBM and over 50 other technology and AI firms and organizations announced an AI Alliance to Advance Open, Safe, Responsible AI: “The AI Alliance is focused on fostering an open community and enabling developers and researchers to accelerate responsible innovation in AI while ensuring scientific rigor, trust, safety, security, diversity and economic competitiveness.”
The firms on board this AI Alliance include many important companies in the AI space, such as Intel, AMD, Hugging Face, Stability AI, and Meta itself, while excluding the top AI model makers (OpenAI, Google, Anthropic). It seems the divide is between the “open source” advocates and the builders of proprietary flagship AI models.
Aligned with this AI Alliance, Meta announced Purple Llama: Towards open trust and safety in the new world of generative AI. Purple Llama is a set of tools and evaluations for AI trust and safety.
“We believe it’s better when AI is developed openly — more people can access the benefits, build innovative products and work on safety. The AI Alliance brings together researchers, developers and companies to share tools and knowledge that can help us all make progress whether models are shared openly or not. We’re looking forward to working with partners to advance the state-of-the-art in AI and help everyone build responsibly.” - Nick Clegg, President, Global Affairs of Meta
Elon Musk's xAI aims to raise $1 billion, and Axios says a new SEC filing suggests it is at least part of the way there, having raised well over $100 million so far.
Wired reports that OpenAI Agreed to Buy $51 Million of AI Chips From a Startup Backed by CEO Sam Altman. Altman stands to gain indirectly from this deal while leading OpenAI, which hints at a conflict of interest.
Europe reaches a deal on the world’s first comprehensive AI rules. “Negotiators from the European Parliament and the bloc’s 27 member countries overcame big differences on controversial points including generative AI and police use of face recognition surveillance to sign a tentative political agreement for the Artificial Intelligence Act.” They allowed some use of face recognition by law enforcement.
“Deal! The EU becomes the very first continent to set clear rules for the use of AI.” - EU Commissioner Thierry Breton
Doctors say AI is a 'game-changer' for number 1 killer of Americans. AI-based early detection of heart disease could help save lives. “AI is able to identify people at more than 90% risk of sudden death.”
‘Nudify’ Apps That Use AI to ‘Undress’ Women in Photos Are Soaring in Popularity. This misuse of deepfake AI image generation “runs into serious legal and ethical hurdles, as the images are often taken from social media and distributed without the consent, control or knowledge of the subject.”
AI Opinions and Articles
The article “The latest AI won’t replace humans any time soon” says something I’ve been saying for some time: “AI won’t replace humans, but people who can use it will.”
“There’s definitely going to be a difference between those that use AI and those that don’t,” Trevor Back, chief product officer at Speechmatics, said during Thursday’s panel discussion at the AI Summit New York 2023. “If you don’t use AI, you are going to struggle since most roles will use some form of AI in the way that they act,” he said.
Since AI is a productivity-enhancing tool for practically all intellectual work, either you multiply your productivity with AI, or you’ll get out-competed by those who do.
A Look Back …
With IBM in the news, it’s worth remembering that although IBM is not a leader in AI right now, not long ago it was a top pioneer in artificial intelligence.
In 1997, IBM's Deep Blue made history by defeating world chess champion Garry Kasparov.
In 2011, IBM's Watson made headlines worldwide by winning the TV game show "Jeopardy!," showcasing the remarkable advances in artificial intelligence and machine learning. Watson competed against human champions and excelled in processing and analyzing natural language, understanding complex questions, and rapidly retrieving accurate answers. This event marked a significant milestone in AI, demonstrating its potential beyond controlled environments.
However, Watson relied on bespoke pipelines for natural language processing (NLP) and reasoning. AI today, by contrast, is based on deep learning architectures.