Discussion about this post

Tedd Hadley

The third path is the way:

> MultiOn’s recently announced AgentQ uses MCTS (Monte-Carlo Tree search), self-critique, and DPO-based fine-tuning to enhance AI reasoning. This is similar to the approach for AI reasoning in “Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing.”

Thanks for spotting AgentQ last week, Patrick. It's disappointing that there are only two in this category, and neither is open source. But I believe this third step is key. While a model tries to solve a task and traverses the search tree toward a solution, it has to learn from the process, just the way we would. Its parameters have to be updated each time, absorbing the tricky aspects of every success and failure.
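To make "learning from every success and failure" a bit more concrete: one way search results can feed back into the weights is DPO, which trains on pairs of outcomes (a preferred one and a rejected one, e.g. a branch of the search tree that succeeded vs. one that failed from the same state). Below is a minimal sketch of the standard per-pair DPO loss, assuming you already have summed log-probabilities of each trajectory under the policy being trained and under a frozen reference model. The function name and the `beta=0.1` default are illustrative assumptions, not AgentQ's actual code.

```python
import math

def dpo_loss(policy_logp_win, policy_logp_lose,
             ref_logp_win, ref_logp_lose, beta=0.1):
    """Per-pair DPO loss (hypothetical helper, illustration only).

    'win' = preferred trajectory (e.g. a successful search branch),
    'lose' = rejected trajectory (e.g. a failed branch from the same state).
    Inputs are summed log-probabilities of each trajectory.
    """
    # How much more the policy likes each trajectory than the reference does
    win_ratio = policy_logp_win - ref_logp_win
    lose_ratio = policy_logp_lose - ref_logp_lose
    margin = beta * (win_ratio - lose_ratio)
    # Loss is -log(sigmoid(margin)) = softplus(-margin), computed stably
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# Policy identical to reference: no preference learned yet, loss = log(2)
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
# Policy now favors the successful branch: loss drops below log(2)
print(dpo_loss(-4.0, -6.0, -5.0, -5.0))
```

Minimizing this loss by gradient descent nudges the parameters toward the successful branch and away from the failed one, relative to the reference model, which is roughly the "absorb each success and failure" step described above.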

It's sobering to think that true reasoning is going to need real-time training; inference alone won't be good enough. Won't that upend a lot of AI cloud strategies?

