Chengshuo Dai

The development of autonomous agents powered by large language models (LLMs) has moved rapidly from simple, single-step tool execution to complex, multi-step reasoning frameworks. The foundational paradigm for these agents is ReAct (Reasoning and Acting), in which the model iteratively generates a thought, selects an action (a tool call), observes the result, and repeats the process until it reaches a final answer. ReAct works well for straightforward tasks, but it often struggles with complex, long-horizon problems that require strategic planning and the ability to recover from errors. Faced with a multi-step math problem or a complex coding task, a standard ReAct agent might get stuck in a loop, repeatedly calling the same tool with the same parameters, or it might lose track of the overarching goal among the intermediate steps.
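The thought, action, observation cycle can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `react_loop`, `scripted_policy`, `stuck_policy`, and the `multiply` tool are hypothetical names, and the two policy functions stand in for what would be an LLM call in a real agent. The loop also includes a simple guard against the repeated-action failure mode described above.

```python
from typing import Callable, Dict, List, Tuple

# One ReAct step: (thought, action name, action argument).
Step = Tuple[str, str, str]


def react_loop(
    policy: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 8,
) -> Tuple[str, List[str]]:
    """Run thought -> action -> observation until 'finish' or a stop condition."""
    history: List[str] = []
    seen_actions = set()
    for _ in range(max_steps):
        thought, action, arg = policy(history)
        history.append(f"Thought: {thought}")
        if action == "finish":
            return arg, history
        if (action, arg) in seen_actions:
            # Same tool with the same parameters: the agent is looping.
            return "stopped: repeated action", history
        seen_actions.add((action, arg))
        observation = tools[action](arg)
        history.append(f"Action: {action}[{arg}]")
        history.append(f"Observation: {observation}")
    return "stopped: step limit reached", history


def scripted_policy(history: List[str]) -> Step:
    """Hypothetical stand-in for an LLM: finishes once it has an observation."""
    prefix = "Observation: "
    observations = [h for h in history if h.startswith(prefix)]
    if not observations:
        return ("I need to compute the product.", "multiply", "6,7")
    return ("I have the result.", "finish", observations[-1][len(prefix):])


def stuck_policy(history: List[str]) -> Step:
    """Hypothetical stand-in for an agent that never changes its action."""
    return ("Let me try the calculator again.", "multiply", "6,7")


def multiply(arg: str) -> str:
    """Toy tool: multiply two comma-separated integers."""
    a, b = (int(x) for x in arg.split(","))
    return str(a * b)


TOOLS = {"multiply": multiply}
answer, trace = react_loop(scripted_policy, TOOLS)
stuck_answer, _ = react_loop(stuck_policy, TOOLS)
print(answer)
print(stuck_answer)
```

Detecting an exact repeat of `(action, arg)` is only one possible guard. It catches the identical-call loop but not the second failure mode, where the agent keeps acting but drifts away from the goal.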

To overcome these limitations, agent designers have adopted the Plan-and-Solve (PS) prompting strategy, which was originally proposed to improve zero-shot chain-of-thought reasoning. The core idea behind PS is to explicitly separate the planning phase from the execution phase. Instead of immediately jumping into action, the agent is first instructed to analyze the problem and generate a step-by-step plan. This plan acts as a roadmap, breaking down a complex, monolithic task into a series of smaller, manageable subtasks. For example, if the user asks the agent to "Analyze the financial performance of Apple over the last five years and write a summary report," the PS agent will first output a plan: "Step 1: Search for Apple's annual revenue from 2019 to 2023. Step 2: Search for Apple's net income for the same period. Step 3: Calculate the year-over-year growth rates. Step 4: Draft the summary report based on the collected data."
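Before a plan like this can drive execution, the agent's free-text output has to be turned into discrete subtasks. The sketch below shows one way to do that, assuming the model follows the "Step N: ..." convention used in the example. The `parse_plan` helper is a hypothetical name, and a real system would need to handle plans that do not follow the expected format.

```python
import re
from typing import List


def parse_plan(plan_text: str) -> List[str]:
    """Split 'Step 1: ... Step 2: ...' into a list of step descriptions."""
    parts = re.split(r"Step\s+\d+:\s*", plan_text)
    # The text before the first 'Step N:' marker is empty and is dropped.
    return [part.strip() for part in parts if part.strip()]


plan_text = (
    "Step 1: Search for Apple's annual revenue from 2019 to 2023. "
    "Step 2: Search for Apple's net income for the same period. "
    "Step 3: Calculate the year-over-year growth rates. "
    "Step 4: Draft the summary report based on the collected data."
)

steps = parse_plan(plan_text)
for number, step in enumerate(steps, start=1):
    print(number, step)
```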

Once the plan is generated, the agent enters the execution phase, systematically addressing each subtask in order. This structured approach reduces the likelihood of the agent getting lost or distracted, because it always has a clear reference point (the plan) to guide its next action. The extended PS+ variant of the prompt additionally instructs the model to extract relevant variables and calculate intermediate results, which targets calculation errors and missing-step errors on complex reasoning tasks.
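One way to keep the plan in view during execution is to re-render it before every step, marking what is done and what comes next. The following is a minimal sketch under that assumption. `render_prompt`, `execute_plan`, and `stub_run_step` are hypothetical names, the three-step plan is a shortened illustration, and `stub_run_step` stands in for an LLM or tool call.

```python
from typing import Callable, List


def render_prompt(steps: List[str], outputs: List[str]) -> str:
    """Show the whole plan, marking finished steps and the one to do next."""
    lines = ["Plan:"]
    for i, step in enumerate(steps):
        if i < len(outputs):
            marker = "[done]"
        elif i == len(outputs):
            marker = "[next]"
        else:
            marker = "[todo]"
        lines.append(f"{marker} Step {i + 1}: {step}")
    return "\n".join(lines)


def execute_plan(steps: List[str], run_step: Callable[[str], str]) -> List[str]:
    """Work through the plan in order, re-reading it before every step."""
    outputs: List[str] = []
    for _ in steps:
        outputs.append(run_step(render_prompt(steps, outputs)))
    return outputs


def stub_run_step(prompt: str) -> str:
    """Hypothetical stand-in for an LLM or tool call: reports the [next] step."""
    marker = "[next] "
    next_line = next(
        line for line in prompt.splitlines() if line.startswith(marker)
    )
    return "completed " + next_line[len(marker):]


steps = ["Collect revenue figures", "Compute growth rates", "Draft the report"]
outputs = execute_plan(steps, stub_run_step)
for line in outputs:
    print(line)
```

Because the executor only ever sees the rendered plan, the current step is always explicit, which is the "clear reference point" the paragraph above describes.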

However, even with a solid plan, agents inevitably make mistakes. An API call might fail, a search query might return irrelevant results, or the agent might generate syntactically incorrect code. This is where Reflexion comes into play. Reflexion is a verbal feedback framework that gives agents the ability to self-reflect and self-correct. It operates on the principle that learning from past failures is essential for robust problem-solving.

In a Reflexion loop, the agent attempts a task (e.g., writing a Python function to solve a specific problem). The generated code is then executed in a sandbox environment, and the output (success, failure, or error message) is recorded. If the execution fails, the agent is prompted to analyze the error message, review its previous code, and generate a verbal reflection explaining why the failure occurred and how it can be fixed. This reflection is then appended to the agent's working memory (context window) for the next attempt.

For instance, the agent might reflect: "The previous code failed with an IndexError because I did not account for the edge case where the input list is empty. In the next attempt, I must add a check for an empty list at the beginning of the function." Armed with this self-generated feedback, the agent attempts the task again. This iterative process of generation, evaluation, reflection, and correction improves the agent's success rate on coding and reasoning benchmarks: the Reflexion paper reports 91% pass@1 on the HumanEval coding benchmark, compared with 80% for the GPT-4 baseline it cites. The result is an agent that behaves less like a fragile script executor and more like a resilient, autonomous problem solver.
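The loop of generation, evaluation, reflection, and retry can be sketched as follows, using the empty-list IndexError from the example. This is an illustration under stated assumptions rather than the paper's implementation: `reflexion_loop`, `run_candidate`, `stub_generate`, and `stub_reflect` are hypothetical names, the two stubs stand in for LLM calls, and `exec()` in a fresh namespace only stands in for a real sandbox.

```python
from typing import Callable, List, Optional, Tuple


def run_candidate(source: str, check: Callable[[dict], None]) -> Tuple[bool, str]:
    """Execute candidate code plus a check; return (passed, error text).

    NOTE: exec() in a fresh namespace is NOT a real sandbox. Untrusted code
    needs process-level or container-level isolation.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
        check(namespace)
        return True, ""
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"


def reflexion_loop(
    generate: Callable[[List[str]], str],
    reflect: Callable[[str, str], str],
    check: Callable[[dict], None],
    max_trials: int = 3,
) -> Tuple[Optional[str], List[str], int]:
    """Generate, evaluate, reflect on failure, and retry with the reflection."""
    memory: List[str] = []  # verbal reflections carried into later attempts
    for trial in range(1, max_trials + 1):
        source = generate(memory)
        passed, error = run_candidate(source, check)
        if passed:
            return source, memory, trial
        memory.append(reflect(source, error))
    return None, memory, max_trials


BUGGY = "def first_item(items):\n    return items[0]\n"
FIXED = (
    "def first_item(items):\n"
    "    if not items:\n"
    "        return None\n"
    "    return items[0]\n"
)


def stub_generate(memory: List[str]) -> str:
    """Hypothetical LLM stand-in: fixes the bug once a reflection exists."""
    return FIXED if memory else BUGGY


def stub_reflect(source: str, error: str) -> str:
    """Hypothetical LLM stand-in: turns the error into a verbal reflection."""
    return (
        f"The previous code failed with {error}. "
        "In the next attempt, check for an empty list first."
    )


def check_first_item(namespace: dict) -> None:
    """Evaluation step: one normal case and the empty-list edge case."""
    assert namespace["first_item"]([1, 2]) == 1
    assert namespace["first_item"]([]) is None


source, memory, trials = reflexion_loop(stub_generate, stub_reflect, check_first_item)
print(trials)
print(memory[0])
```

The key design point is that the only thing carried between attempts is text: the reflection is appended to `memory` and passed back to the generator, with no change to model weights.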

References:

Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models - https://arxiv.org/abs/2305.04091
Reflexion: Language Agents with Verbal Reinforcement Learning - https://arxiv.org/abs/2303.11366