GPT Image 2 replaces DALL-E 3 as OpenAI's flagship image generation model. It introduces token-based pricing, flexible aspect ratios, and high-fidelity text rendering.

| Feature | DALL-E 3 | GPT Image 2 |
|---|---|---|
| Text in images | Often garbled | Pixel-perfect rendering |
| Aspect ratios | Fixed (1:1, 16:9) | Fully flexible |
| Editing | Inpainting only | Full conversational editing |
| Pricing | Per-image | Token-based (pay for complexity) |
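
Token-based pricing means the cost of a request depends on how many tokens it consumes rather than on a flat per-image fee. The following is a minimal sketch of that arithmetic, not official billing logic. It assumes the response exposes a `usage` object with `input_tokens` and `output_tokens` fields, as OpenAI's existing token-priced image responses do. The helper name `estimateImageCost` and the `rates` object are hypothetical, and the rates themselves must come from the current price list.

```javascript
// Hypothetical helper: estimates the cost of one image request from token usage.
// Assumptions: `usage` has `input_tokens` and `output_tokens` fields, and
// `rates` holds caller-supplied prices in USD per one million tokens.
function estimateImageCost(usage, rates) {
  const inputCost = (usage.input_tokens / 1_000_000) * rates.inputPerMillion;
  const outputCost = (usage.output_tokens / 1_000_000) * rates.outputPerMillion;
  return inputCost + outputCost;
}
```

Because output tokens grow with image size and quality, larger or higher-quality images would cost more under this model.
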
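```javascript
import OpenAI from "openai";

// The client reads the API key from the OPENAI_API_KEY environment variable.
const openai = new OpenAI();

// Request a landscape image at high quality.
const image = await openai.images.generate({
  model: "gpt-image-2",
  prompt: "A futuristic Tokyo skyline at sunset, cyberpunk style, 8K detail",
  size: "1536x1024",
  quality: "high"
});
```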
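
The generation call returns a response object rather than a file, so the image still has to be extracted. The sketch below assumes GPT Image 2 follows the shape of OpenAI's existing image responses, where each entry in `data` carries a base64-encoded `b64_json` string. The helper name `decodeFirstImage` is hypothetical.

```javascript
// Hypothetical helper: extracts the first generated image as raw bytes.
// Assumption: `response.data[0].b64_json` holds the base64-encoded image.
function decodeFirstImage(response) {
  const entry = response?.data?.[0];
  if (!entry || typeof entry.b64_json !== "string") {
    throw new Error("Response contains no base64 image payload");
  }
  // Buffer is a Node.js global; it decodes the base64 string into bytes.
  return Buffer.from(entry.b64_json, "base64");
}
```

The returned `Buffer` can then be written to disk, for example with `fs.writeFileSync("skyline.png", decodeFirstImage(image))`.
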
A specialized variant combines reasoning with visual generation. It can analyze complex prompts, run web searches for visual references, and refine its output autonomously before returning the final image.
All GPT-5.4 models accept image inputs: upload photos, screenshots, charts, or documents, and the model will analyze them.
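```javascript
// Assumes an initialized `openai` client (new OpenAI() from the "openai" package).
const response = await openai.responses.create({
  model: "gpt-5.4",
  input: [{
    role: "user",
    content: [
      // A text question paired with the image it refers to.
      { type: "input_text", text: "What's in this screenshot?" },
      { type: "input_image", image_url: "https://example.com/screenshot.png" }
    ]
  }]
});
```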
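
The example above points at a public URL. For a local file, the Responses API also accepts a base64 data URL in `image_url`. The sketch below shows one way to build such an input. The helper names `toImageDataUrl` and `buildImageQuestion` are hypothetical, and the caller is assumed to know the file's MIME type.

```javascript
// Hypothetical helper: encodes raw image bytes as a base64 data URL.
function toImageDataUrl(bytes, mimeType) {
  const base64 = Buffer.from(bytes).toString("base64");
  return `data:${mimeType};base64,${base64}`;
}

// Hypothetical helper: builds the `input` array for a question about one image.
function buildImageQuestion(question, imageUrl) {
  return [{
    role: "user",
    content: [
      { type: "input_text", text: question },
      { type: "input_image", image_url: imageUrl }
    ]
  }];
}
```

With `bytes` read from disk, for example via `fs.readFileSync("screenshot.png")`, the result of `buildImageQuestion("What's in this screenshot?", toImageDataUrl(bytes, "image/png"))` can be passed as `input` to `openai.responses.create`.
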