TL;DR: The new paper “LLM+P: Empowering Large Language Models with Optimal Planning Proficiency” shows how LLMs, paired with a classical planner, can construct optimal plans.
We have written in multiple articles about how Large Language Models (LLMs) can build on their strengths and overcome their weaknesses as AI models: by being grounded with memory via vector databases, by accessing other tools via plug-ins that give them specific capabilities, and by being run iteratively to critique and revise their answers.
The recently published paper “LLM+P: Empowering Large Language Models with Optimal Planning Proficiency”1 addresses another weakness of LLMs: Planning.
Planning is an important aspect of being productive and effective in work and in life. If you want to succeed at a task, you need to “Plan the work, then work the plan.”
As in human performance, so too with AI models. We cannot have useful AI do tasks for us or solve complex challenges if it cannot plan solutions to those challenges or order those tasks step by step.
While LLMs have great facility with language and can generate plausible text in many domains, they are weak on planning and complex, ordered reasoning tasks. As explained in the paper:
In the terms of Mahowald et al. [4] LLMs have become amazingly proficient at linguistic competence — knowing how to say things; but they are not nearly as good at functional competence — knowing what to say.
The classical planning problem in AI is to determine a sequence of actions that achieves a given goal. This is important in areas such as robotics. The use of formal methods has been used in past AI research to address the planning problem.
The Planning Domain Definition Language (PDDL) is a standardized encoding of classical planning problems and solutions. To solve the planning problem, researchers would write the planning problem in a formal language such as PDDL, and then solve the PDDL-described problem using theorem-proving techniques.
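To make the encoding concrete, here is a small sketch (not from the paper) of a classic blocks-world goal written as a PDDL problem, generated with plain Python string formatting. The domain name “blocksworld” and the predicates follow the standard STRIPS blocks-world conventions; the helper function itself is illustrative.

```python
# Illustrative only: encode a tiny blocks-world problem in PDDL.
# Predicates (on, ontable, clear, handempty) follow the classic
# STRIPS blocks-world conventions.
def make_pddl_problem(blocks, goal_stack):
    """Build a PDDL problem: all blocks start on the table,
    and the goal is the given bottom-to-top stack."""
    objects = " ".join(blocks)
    init = " ".join(f"(ontable {b}) (clear {b})" for b in blocks)
    goal = " ".join(
        f"(on {top} {bottom})"
        for bottom, top in zip(goal_stack, goal_stack[1:])
    )
    return (
        f"(define (problem stack-blocks)\n"
        f"  (:domain blocksworld)\n"
        f"  (:objects {objects})\n"
        f"  (:init {init} (handempty))\n"
        f"  (:goal (and {goal})))"
    )

print(make_pddl_problem(["a", "b", "c"], ["c", "b", "a"]))
```

A formal planner given this problem (plus the matching domain file defining the pick-up, put-down, stack, and unstack actions) returns a provably correct action sequence, which is exactly the guarantee LLMs alone lack.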
The insight of this paper is to recognize that since LLMs can read and write natural language, they can also code and decode PDDL, just as they might write and interpret other kinds of software languages:
LLMs are bad at planning (or long-horizon reasoning) [9] but they are good at describing and translating textual inputs, including re-writing planning prompts in the PDDL format. The intuition is that we can view PDDL as a different language than English, so re-writing planning prompts in PDDL is essentially a “machine translation” task that LLMs are known to excel at.
Their method is to call the planning function as an external module (they use the FAST-DOWNWARD planner), with the LLM interfacing to the formal planner via PDDL. The planner is thus another kind of plug-in, giving the LLM external planning capabilities.
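The flow is: natural language in, PDDL out of the LLM, plan out of the classical planner, natural language back out of the LLM. Below is a minimal sketch of that loop under stated assumptions: the LLM call and the planner call are injected as plain functions, so the pipeline is model- and planner-agnostic. The names `llm_translate` and `run_planner` are hypothetical, not from the paper’s code, and the toy stand-ins at the bottom exist only so the sketch runs end to end.

```python
# Sketch of the LLM+P loop. The LLM and planner are injected as
# functions; in a real setup run_planner would invoke a classical
# planner such as Fast Downward on the generated PDDL.
def llm_plus_p(problem_text, context, llm_translate, run_planner):
    """1. LLM rewrites the natural-language problem as PDDL,
       guided by an in-context example (the 'context').
    2. A classical planner solves the PDDL problem.
    3. The LLM rewrites the plan back into natural language."""
    pddl_problem = llm_translate(
        f"{context}\nRewrite this problem in PDDL:\n{problem_text}")
    plan = run_planner(pddl_problem)          # list of action strings
    return llm_translate(
        "Rewrite this plan in plain English:\n" + "\n".join(plan))

# Toy stand-ins (hypothetical) so the sketch is runnable:
fake_llm = lambda prompt: prompt.splitlines()[-1]
fake_planner = lambda pddl: ["(pick-up a)", "(stack a b)"]
print(llm_plus_p("Put block a on block b.", "(example omitted)",
                 fake_llm, fake_planner))
```

The key design point is that correctness lives in the planner, not the LLM: the LLM only does the two “machine translation” steps it is already good at.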
The results are stunning. The LLM alone has almost no success in planning. The LLM+P module, with the appropriate context to give it guidance, produces an optimal plan for the majority of problems. The context given to the LLM so it knows how to write a correct PDDL description is critical to making LLM+P work.
While stunning, the results shouldn't be surprising, given the capabilities of other plug-in tools. The Wolfram Alpha plug-in can help ChatGPT solve math problems, and a knowledge-database plug-in can fix hallucinations in fact-based Q&A. Here we plug in a good AI planning tool and make the AI model planning-competent.
Why the LLM + Planning Result Is Important
Planning is a powerful and important meta-capability. Even if this particular paper or method ends up not being the best way to merge planning and LLMs, the success shown here tells us we can get planning capabilities merged with LLMs.
Planning enables complex step-by-step actions and the breakdown of task solutions into sub-tasks. Planning as an LLM-augmented capability opens up a whole vista of additional possibilities for better directing LLM-based agents to work autonomously and accomplish complex multi-step tasks.
The new agents that have been built on top of ChatGPT and GPT-4, such as AutoGPT, need planning skills to not run wild or off course. Planning is the capstone capability to run these AI agents effectively.
The other important lesson here - and why I put C-3PO as the image for this article - is that it is another example of using the LLM as a universal translator and interface. We can plug in practically any software that can be controlled with text. Every Linux/Unix command-line tool, and practically any software you can think of, can be combined with LLMs.
LLM as the universal natural language interface / translator changes everything about software: AI can access every piece of software; every piece of software can be AI-enabled; every piece of software has an implicit natural language interface.
I find my day goes better if I say this mantra when I wake up:
“Carpe Diem. Seize the day, plan the day, work the day, live the day, love the day, win the day.”
I look forward to one day having my AI agent assistant telling itself the same as it figures out how to plan its day.
Liu, B., Jiang, Y., Zhang, X., Liu, Q., Zhang, S., Biswas, J., & Stone, P. (2023). LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477.