Chengshuo Dai

When interacting with Large Language Models (LLMs), it's easy to treat them as black-box oracles: you ask a question, and they spit out an answer. But as tasks become more complex, especially those involving math, logic, or multi-step reasoning, this direct approach often fails: the model hallucinates or skips crucial logical steps.

The breakthrough that changed how we elicit reasoning from LLMs is Chain of Thought (CoT) prompting. It's a deceptively simple technique that has profound implications for how these models process information.

How CoT Works

At its core, Chain of Thought prompting gets the model to generate intermediate reasoning steps before it arrives at a final answer. A standard prompt pairs a question directly with its answer:

Q: If John has 5 apples and gives 2 to Mary, how many does he have left?
A: 3.

A CoT prompt spells out the reasoning that leads to the answer:

Q: If John has 5 apples and gives 2 to Mary, how many does he have left?
A: John starts with 5 apples. He gives 2 away. 5 - 2 = 3. The answer is 3.

By providing a few examples of this step-by-step reasoning (few-shot CoT), or simply appending "Let's think step by step" to the prompt (zero-shot CoT), we guide the model to break the problem down. The intermediate text acts as a kind of "scratchpad" memory. Because an autoregressive model generates one token per forward pass and conditions each new token on everything written so far, producing the reasoning steps gives it more computation, and more intermediate results to build on, before it commits to a final answer.
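To make the two prompting styles concrete, here is a minimal Python sketch of how such prompts might be assembled as plain strings. The function names (few_shot_cot_prompt, zero_shot_cot_prompt, extract_final_answer) and the FEW_SHOT_EXAMPLES list are hypothetical helpers invented for illustration, not part of any library, and the sketch deliberately stops short of calling a model. The answer parser assumes the completion ends with a sentence of the form "The answer is X.", as in the apple example.

```python
import re
from typing import List, Optional, Tuple

# (question, reasoning, answer) triples. Hypothetical illustration data;
# the single entry is the apple example from the text.
Example = Tuple[str, str, str]

FEW_SHOT_EXAMPLES: List[Example] = [
    (
        "If John has 5 apples and gives 2 to Mary, how many does he have left?",
        "John starts with 5 apples. He gives 2 away. 5 - 2 = 3.",
        "3",
    ),
]


def few_shot_cot_prompt(question: str, examples: List[Example]) -> str:
    """Prepend worked examples so the model can imitate their reasoning format."""
    blocks = [
        f"Q: {q}\nA: {reasoning} The answer is {answer}."
        for q, reasoning, answer in examples
    ]
    # The new question ends with an open "A:" for the model to complete.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


def zero_shot_cot_prompt(question: str) -> str:
    """No worked examples; the trigger phrase alone asks for step-by-step reasoning."""
    return f"Q: {question}\nA: Let's think step by step."


def extract_final_answer(completion: str) -> Optional[str]:
    """Return X from the last 'The answer is X.' sentence that ends a line, if any."""
    matches = re.findall(r"The answer is\s+(.+?)\.?\s*$", completion, re.MULTILINE)
    return matches[-1] if matches else None


if __name__ == "__main__":
    print(few_shot_cot_prompt(
        "If a train has 8 cars and 3 are removed, how many remain?",
        FEW_SHOT_EXAMPLES,
    ))
    print()
    print(zero_shot_cot_prompt("What is 17 + 25?"))
```

In practice the returned string would be sent to whatever model you are using, and extract_final_answer would be applied to its completion. The parser is only as reliable as the model's adherence to the "The answer is X." format, which is one reason few-shot examples that end consistently are useful.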

Personal Reflection

What fascinates me most about Chain of Thought is how closely it mimics human cognitive processes. When faced with a complex math problem, I don't just instantly know the answer; I write down the intermediate steps. CoT forces the LLM to do the same.

However, working with CoT has also taught me its limitations. It's not a magic wand that imbues the model with true understanding. If the model lacks the fundamental knowledge or logical capability (e.g., if it's too small), CoT can sometimes lead it down a rabbit hole of confident but entirely flawed reasoning. It's a reminder that while we can guide the model's generation path, the underlying representations still dictate the ceiling of its capabilities. It also highlights the importance of prompt engineering as a bridge between human intent and machine execution.

Reference:

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models