AI Week In Review 23.12.16
Google Imagen2, Midjourney Alpha, MusicFX, Phi-2, FunSearch solves a math problem, Tesla Optimus Gen 2 demo, Stability AI Stable Zero123.
AI Tech and Product Releases
The Midjourney Alpha web interface is available here to super-users who have generated over 10k Midjourney images. This will be a big UX upgrade for Midjourney users over the Discord text-prompt method.
Google has released Imagen2 text-to-image generation for Vertex customers and it’s looking very impressive. Imagen2 offers accurate text and logo generation along with high-quality photorealistic image output, and it also supports captions and visual QA on images. Google’s Vertex provides API access only, not front-end access via Google, but it’s also coming to Shutterstock, Canva and other AI applications that will use Imagen2 under the hood.
As promised in last week’s Google Gemini announcement, Google’s Gemini Pro is now available to developers via Google AI Studio and Vertex AI.
Google’s new AI tool ‘MusicFX’ composes music from just a few words. You can try out MusicFX here in their AI Test Kitchen. This is a new leap in text-to-music generation, and it incorporates watermarking to establish the origin of created music.
Stability AI released Stable Zero123 for Quality 3D Object Generation from Single Images. It improves on the prior state-of-the-art model, Zero123-XL.
Stability AI also announced Stability AI Membership to standardize commercial use of their models by professional and enterprise users, while keeping their core AI models open and free for research and personal use.
Tesla’s Optimus Gen 2 demo video went viral, touting improvements such as faster walking, faster and more capable hands, lower weight, an articulated neck, and more. The impressive robotic egg-handling made it to our cover art. Optimus is not the only robot with impressive demo videos: here are 13 robots powered by AI and their insane demo videos.
In time for tax season, H&R Block launches AI tax filing assistant. This AI assistant is designed to answer complex tax questions and was built to only use specific H&R Block content. “H&R Block, which does still operate brick-and-mortar tax services, hopes that AI Tax Assist feels a lot like speaking to one of its human accountants.”
Microsoft has released Phi-2 on their Azure platform. Phi-2, a small but mighty 2.7B-parameter model, shows the surprising power of small language models. It beats larger capable models like Mistral 7B and Llama-2 70B on some key benchmarks.
Top Tools & Hacks
Prompting hacks for ChatGPT: Consider the time of year. Rob Lynch on X found that GPT-4 Turbo shows seasonal behavior. “gpt-4-turbo over the API produces (statistically significant) shorter completions when it "thinks" its December vs. when it thinks its May.”
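One way to check a claim like this is to collect completion lengths under each date and compare them with a permutation test. The sketch below assumes the lengths have already been collected from whatever API you use; `system_prompt_for` and its wording are hypothetical and are not taken from Lynch's experiment.

```python
import random


def system_prompt_for(date_text: str) -> str:
    # Hypothetical wording: the date is the only thing that differs between conditions.
    return f"You are a helpful assistant. Current date: {date_text}"


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(lengths_a, lengths_b, n_resamples=2000, seed=0):
    """Two-sided permutation test on the difference in mean completion length."""
    rng = random.Random(seed)
    observed = abs(mean(lengths_a) - mean(lengths_b))
    pooled = list(lengths_a) + list(lengths_b)
    n_a = len(lengths_a)
    hits = 0
    for _ in range(n_resamples):
        # Shuffle the labels: under the null hypothesis the date makes no difference.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)
```

A small p-value suggests the length gap between the "December" and "May" conditions is unlikely to be sampling noise. It says nothing about why the gap exists.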
Google’s NotebookLM just added more than a dozen new features. The more I read about NotebookLM, the more I want to try it - Evernote meets Notion and goes on AI steroids.
AI Research News
This week the conference NeurIPS 2023 took place. It’s one of the top AI conferences, with thousands of attendees and papers presented. This Guide to NeurIPS 2023 — 7 Research Areas and 10 Spotlight Papers to Read does a good job summarizing key topics and papers.
We mentioned in our prior article “Fine-Tuning LLMs with Direct Preference Optimization” that the paper “Direct Preference Optimization: Your Language Model is Secretly a Reward Model” was recognized in the NeurIPS 2023 Paper Awards.
Another award-winning NeurIPS paper is “Scaling Data-Constrained Language Models,” which offers guidance on how to scale compute and parameters when data is limited. The authors show that repeating a dataset for up to 4 epochs in LLM pre-training gives results similar to training on the same amount of unique data.
System 2 Attention (is something you might need too) is a way to guide LLMs in attending to what is important in a context versus what is irrelevant. This method regenerates the input context to only include the relevant portions, then queries on the regenerated context to elicit the final response. As a result, it outperforms the baseline with better factuality and objectivity.
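The two-step flow described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `llm` is a hypothetical callable that takes a prompt string and returns the model's text, and the prompt wording is an assumption.

```python
from typing import Callable


def system2_attention(context: str, question: str, llm: Callable[[str], str]) -> str:
    """Two-pass answer: regenerate the context first, then answer from it."""
    # Pass 1: ask the model to keep only the relevant, unbiased parts of the context.
    rewrite_prompt = (
        "Rewrite the text below so that it keeps only the parts that are relevant "
        "and unbiased for answering the question. Do not answer the question.\n\n"
        f"Text: {context}\n\nQuestion: {question}"
    )
    regenerated = llm(rewrite_prompt)

    # Pass 2: answer using the regenerated context only, not the original one.
    answer_prompt = f"Context: {regenerated}\n\nQuestion: {question}\n\nAnswer:"
    return llm(answer_prompt)
```

The point of the design is that the second call never sees the original context, so irrelevant or leading text cannot sway the final answer. The cost is one extra model call per query.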
The Nature article DeepMind AI outdoes human mathematicians on unsolved problem explores an AI algorithm called FunSearch (which stands for functional search) and how it solved a mathematics problem called the cap set problem.
The paper from DeepMind researchers, “Mathematical discoveries from program search with large language models,” explains the technical details. The key insight is that many problems in mathematics are easy to evaluate, despite being hard to solve. It’s easier to check a proof than to invent one. This observation leads to a powerful way to attack such complex problems: Generate proposed solutions; use an efficient evaluator to provide detailed feedback; iteratively improve on components of generated solutions based on feedback to solve the problem.
Deedy on X notes the FunSearch connection with RL algorithms and other DeepMind models:
[FunSearch is] not too dissimilar to how AlphaCode works on competitive coding problems to achieve about median human performance on codeforces with a 41B param model. … If fast enough, this may be able to find novel solutions to problems humans have NEVER solved as long as … we can measure progress and verify the answer.
This algorithmic paradigm could be applicable to a host of math and science problems.
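The generate, evaluate, improve loop can be illustrated on a toy version of the cap set problem: find a large set of points in Z_3^n with no three on a line. The sketch below is not FunSearch itself. FunSearch has an LLM propose programs, whereas here random mutation of a weight vector stands in for the LLM, and the linear priority function is a hypothetical choice made for brevity.

```python
import itertools
import random


def third_point(p, q):
    """The unique point completing a line through p and q in Z_3^n."""
    return tuple((-a - b) % 3 for a, b in zip(p, q))


def is_cap_set(points):
    """Evaluator check: no three distinct points lie on a line."""
    chosen = set(points)
    return all(third_point(p, q) not in chosen
               for p, q in itertools.combinations(chosen, 2))


def build_cap_set(n, weights):
    """Greedily add points in priority order, skipping any that would form a line."""
    def priority(p):
        return sum(w * x for w, x in zip(weights, p))

    chosen = set()
    for p in sorted(itertools.product(range(3), repeat=n), key=priority, reverse=True):
        if all(third_point(p, q) not in chosen for q in chosen):
            chosen.add(p)
    return chosen


def search(n=3, rounds=200, seed=0):
    rng = random.Random(seed)
    best = [rng.uniform(-1, 1) for _ in range(n)]
    best_set = build_cap_set(n, best)
    for _ in range(rounds):
        candidate = [w + rng.gauss(0, 0.5) for w in best]  # generate
        candidate_set = build_cap_set(n, candidate)        # evaluate
        if len(candidate_set) > len(best_set):             # improve
            best, best_set = candidate, candidate_set
    return best, best_set
```

Calling `search(n=3)` returns the best weights found and the cap set they produce. Checking a candidate is cheap while finding a large one is hard, which is the asymmetry this approach exploits.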
OpenAI presented a way to control superintelligent AI with more AI. They used a weak GPT-2-level model to supervise the much more powerful GPT-4, and studied the results to understand how we might oversee a superintelligent AI. OpenAI also announced Superalignment Fast Grants for researchers to study this issue.
AI Business and Policy
OpenAI launched Converge2, a pre-seed fund and accelerator for AI startups.
There have been rumors of OpenAI releasing GPT-4.5 soon. However, Sam Altman denied this on X.
Researchers say Microsoft Bing / Copilot made up facts about European elections. Asked about European elections held earlier this year, Bing got only 30% of answers factually correct, evaded answering 39% of the time, and made factual errors 31% of the time.
Bing’s responses included fake controversies, wrong election dates, incorrect polling numbers, and, at some points, candidates who weren’t running in these elections. These error-ridden responses made up 31 percent of the answers.
For a tool powered by GPT-4 and backed by search retrieval, that’s a bleak result.
Here comes robot-made fast food: CaliExpress in Pasadena touted as world's first fully autonomous, AI-powered restaurant. Order at an AI-driven automated kiosk, and hamburger-making and French fry-making robots will make the order. The restaurant will still employ people “to assemble, package (the food) and actually be the friendly face to the customer.”
Recent data shows AI job losses are rising, but the numbers don’t tell the full story. A ResumeBuilder survey revealed that more than one-third of business leaders say AI replaced workers in 2023, and 44% expect layoffs in 2024 due to AI efficiency. However, the impact of AI on jobs overall is mixed, as AI also enables businesses to restructure and redefine jobs:
“While positions like research and data analysis are in line for AI automation, companies will still need someone to prompt the AI, make sense of the results and take action.”
In a survey by collaboration software company Asana, employees said that 29% of their work tasks are replaceable by AI. So the positive way to read this is not that AI will take 30% of jobs, but that AI will take 30% of any job, freeing workers up to be more productive.
Think tank tied to tech billionaires played key role in Biden’s AI order. This article ties the Biden order to influence from RAND Corporation researchers funded by Open Philanthropy, a group financed by Facebook co-founder and Asana CEO Dustin Moskovitz that is putting a lot of grant money into the “Potential Risks from Advanced Artificial Intelligence.” These researchers “played a key role in drafting President Joe Biden’s new executive order on artificial intelligence.”
News publisher files class action antitrust suit against Google, citing AI’s harms to their bottom line. The lawsuit argues that Google is siphoning off publishers’ content through its AI and search technologies. What happens to the web when nobody needs to visit sites anymore?
“When online magazine The Atlantic modeled what would happen if Google integrated AI into search, it found that 75% of the time the AI would answer the user’s query without requiring a click-through to its website, losing it traffic.”
If you can’t beat them, join them. This week, OpenAI and Axel Springer announced a Partnership to deepen beneficial use of AI in journalism. OpenAI stated the initiative “explicitly values the publisher’s role in contributing to OpenAI’s products.”
The Verge reported that ByteDance, the company behind TikTok, is secretly using OpenAI’s GPT to build a competing AI model. In a follow-up, OpenAI confirmed that they suspended ByteDance’s account.
A Global AI conclave was held in Bengaluru, India, where AI leaders shared perspectives. Andrew Ng stated “India has the highest AI skill penetration in the world, even more than the US.”
AI Opinions and Articles
Meta's Chief AI Scientist Yann LeCun is in the news, advocating for transparency and open sourcing in AI development. His positions and his advocacy are healthy for public perception of AI, by tamping down unwarranted fears of AI-induced doom, calling out risks of closed AI monopolies, and advocating for open source and open research.
If you imagine this kind of future where all of our information diet is mediated by those AI systems, you do not want those things to be controlled by a small number of companies on the West Coast of the U.S. Those systems will constitute the repository of all human knowledge and culture. You can't have that centralized. It needs to be open. - Yann LeCun
This Robotics Q&A with UC Berkeley’s Ken Goldberg shares a lot of great insight on the current state of robotics. He says “2023 will be remembered as the year when generative AI transformed Robotics,” and he finds multi-modal AI models exciting and impactful, but he doesn’t expect true AGI and general-purpose robots soon.
He is impressed by some of the humanoid bipedal robots under development, and he does expect “within the next decade we will have affordable home robots” to do housework. He also mentions robots for “Augmented Dexterity” in surgery, where “robots can enhance surgical skills by performing low-level subtasks such as suturing.”
A Look Back …
Time is getting so compressed that today’s ‘look back’ is Sam Altman, now Time magazine’s CEO of the Year, looking back at what happened at OpenAI just a few weeks ago. We still don’t know exactly why OpenAI’s board fired Altman, but his quick comeback from that reversal of fortune has only made him and the company stronger.
"As we get closer and closer to superintelligence, everybody involved gets more stressed and more anxious and we realized the stakes are higher and higher. And I think that all exploded." - Sam Altman, OpenAI CEO