Nov 25, 2023
> What Q* is likely to be is another incremental step towards better AI reasoning - significant perhaps but partial.
It's probably wise to downplay this given the hype, but it's hard NOT to see Q* resulting in extremely rapid intelligence improvements in the short term. You may have already seen Nathan Lambert's writeup at https://www.interconnects.ai/p/q-star, but to me Q* evokes AlphaGo's successful superhuman trajectory, enabled by search and self-play with RL.
Excellent writeup!
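To make the AlphaGo analogy concrete (purely as speculation, since nobody outside OpenAI knows what Q* actually is), here's a rough sketch of what search over reasoning steps guided by a learned verifier could look like - a plain beam search rather than full MCTS, with propose_steps and score_step as hypothetical stand-ins for an LLM policy and a value/verifier model:

```python
# Purely speculative sketch: AlphaGo-style search applied to LLM reasoning.
# propose_steps() and score_step() are hypothetical placeholders for an LLM
# policy and a learned verifier/value model; this is NOT what Q* is known to be.
def search_reasoning(problem, propose_steps, score_step, beam_width=4, max_depth=8):
    beams = [([], 0.0)]  # each beam is (partial chain of thought, cumulative score)
    for _ in range(max_depth):
        candidates = []
        for steps, score in beams:
            for step in propose_steps(problem, steps):  # policy proposes next reasoning steps
                candidates.append((steps + [step], score + score_step(problem, steps, step)))
        # keep only the highest-scoring partial solutions, as in beam search
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return max(beams, key=lambda b: b[1])[0]  # best-scoring reasoning chain
```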
Thanks for the compliment. My essay was getting too long for a simple high-level exposition, but there is a lot more depth to go into, and Lambert's essay explained the technical details well.
I also feel somewhat confirmed in the thesis that Q* was the next step after "Let's Verify Step by Step", not just by that writeup but also by the "AI Explained" YouTube video - https://www.youtube.com/watch?v=ARf0WyFau0A - which double-clicked on that paper as the immediate predecessor. It also introduced some other threads, such as test-time compute, what OpenAI's Noam Brown was tweeting on X, and Lukasz Kaiser talking about CoT.
I agree that Q* could accelerate the path to AGI. I am already in the camp that AGI will require advances beyond merely scaling LLMs, which taps out on reasoning.
The main advances needed to reach human-level reasoning point to RL as the base algorithm, as it has been for most problem-solving AIs, combined with a chain-of-thought-style verification/iteration loop on LLMs. So, yes, we may look back on Q* as one of the key breakthroughs that gets us to AGI. Time will tell.
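For what that verification/iteration loop might look like in practice - again just a sketch under my own assumptions, with generate_cot and verify_steps as hypothetical placeholders rather than anything OpenAI has described - think best-of-N sampling scored by a process verifier at test time, in the spirit of "Let's Verify Step by Step":

```python
# Minimal sketch of a verify-and-iterate loop at test time: sample several
# chains of thought, score each step with a process verifier, keep the best.
# generate_cot() and verify_steps() are hypothetical placeholders, not a real API.
def best_of_n(problem, generate_cot, verify_steps, n=16):
    best_chain, best_score = None, float("-inf")
    for _ in range(n):
        chain = generate_cot(problem)               # LLM samples a step-by-step solution
        step_scores = verify_steps(problem, chain)  # verifier scores each reasoning step
        score = min(step_scores)                    # one common choice: weakest step decides
        if score > best_score:
            best_chain, best_score = chain, score
    return best_chain
```

The appeal of this kind of loop is that you can trade extra test-time compute for better answers without retraining the base model, which is the test-time-compute thread mentioned above.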