A New Paradigm in AI
The o1 and o3 families of models represent a fundamental shift from standard LLMs. Instead of producing an answer immediately, they are trained with reinforcement learning to generate a hidden Chain of Thought (CoT) before emitting the final output.
How it works
When you give o1 a complex problem (such as proving a math theorem or performing a large refactoring task), it:
- Breaks the problem down into smaller steps.
- Tries different approaches.
- Recognizes its own mistakes and backtracks.
- Synthesizes a final, highly accurate answer.
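The steps above can be illustrated with a toy sketch. This is a conceptual illustration only: the real chain of thought is hidden and learned via reinforcement learning, not a hand-written loop. The function names (`solve`, `wrong_guess`, `binary_search_sqrt`) and the toy integer-square-root problem are invented for this example.

```python
# Conceptual toy only: mimics "try an approach, verify, backtrack, synthesize".
def solve(problem, approaches, verify):
    trace = []  # stands in for the hidden chain of thought
    for approach in approaches:
        trace.append(f"trying {approach.__name__}")
        candidate = approach(problem)
        if verify(problem, candidate):
            trace.append("verified; synthesizing final answer")
            return candidate, trace
        trace.append("mistake recognized; backtracking")
    return None, trace

def wrong_guess(n):
    # A cheap first attempt that is usually wrong.
    return n // 2

def binary_search_sqrt(n):
    # A more careful second attempt: binary search for the integer sqrt.
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if mid * mid < n:
            lo = mid + 1
        else:
            hi = mid
    return lo

answer, trace = solve(
    49,
    [wrong_guess, binary_search_sqrt],
    lambda n, c: c * c == n,
)
# answer == 7; the trace records one failed attempt and a backtrack
```

The point of the sketch is the control flow, not the arithmetic: the model spends compute exploring and discarding approaches before committing to an answer.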
Prompting Reasoning Models
Because these models reason internally, traditional prompt engineering techniques like "Think step by step" or few-shot prompting can actually hurt performance rather than help.
- Keep it simple: State the problem directly. Do not tell it *how* to think.
- Provide edge cases: Give it constraints and edge cases to consider.
- Use Markdown: Structure the input clearly so the model understands the formatting of the problem.
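Putting the three tips together, a prompt might look like the following sketch. The task, constraint wording, and section names are hypothetical; the point is the shape: a direct problem statement, explicit edge cases, Markdown structure, and no instructions about *how* to reason.

```python
# Hypothetical prompt for a reasoning model: direct statement, explicit
# edge cases, Markdown structure, and no "think step by step" coaching.
prompt = """\
## Task
Refactor `parse_config` to return a typed dataclass instead of a raw dict.

## Constraints and edge cases
- Preserve the existing public signature.
- Handle a missing file, an empty file, and unknown keys.
- Do not change the on-disk config format.
"""
```

Note what is absent: no worked examples, no persona, and no reasoning instructions; the model is trusted to plan on its own.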
Developer Note: The reasoning models do not support system prompts in the traditional sense; use the `developer` message role in the API instead of `system`.
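A minimal sketch of what that request body looks like, shown as plain data so it stays self-contained (no network call). The model name `o1` comes from the text above; the instruction and user content are placeholders. With the official `openai` Python client, this dict would be passed to `client.chat.completions.create(**request)`.

```python
# Sketch of a Chat Completions request for a reasoning model.
# Note the "developer" role where a standard model would use "system".
request = {
    "model": "o1",
    "messages": [
        {"role": "developer", "content": "Respond with valid JSON only."},
        {"role": "user", "content": "Summarize the incident report below.\n..."},
    ],
}
```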