OpenAI GPT-5.5: Smartest Model for Coding and Work

By Darius Z. · 7 min read

Key Takeaways

  • GPT-5.5 scored 82.7% on Terminal-Bench 2.0, leading Claude Opus 4.7 by 13.3 points on agentic coding tasks
  • The model hit 78.7% on OSWorld-Verified for autonomous computer use and 84.9% on GDPval across 44 knowledge-work occupations
  • API pricing is $5 per million input tokens and $30 per million output tokens, matching Claude Opus 4.7 on input cost
  • Claude Opus 4.7 still leads on SWE-Bench Pro (64.3% vs 58.6%) for multi-file software engineering

OpenAI released GPT-5.5 on April 23, 2026, calling it “a new class of intelligence for real work.” The model is the first fully retrained base architecture since GPT-4.5 (internal codename “Spud”) and targets four areas: agentic coding, computer use, knowledge work, and scientific research. On Terminal-Bench 2.0, GPT-5.5 scored 82.7%, the highest of any publicly available model. On OSWorld-Verified, it reached 78.7% for autonomous computer control. API pricing sits at $5 per million input tokens and $30 per million output tokens. The model is rolling out to ChatGPT Plus, Pro, Business, and Enterprise users, with API access expanding in phases.

Try GPT-5.5 in ChatGPT

GPT-5.5 is available now for ChatGPT Plus, Pro, Business, and Enterprise users.

Try GPT-5.5 in ChatGPT →

What Can GPT-5.5 Do?

GPT-5.5 is built for tasks that require sustained, multi-step execution without constant human oversight. OpenAI President Greg Brockman described it as a model that “can look at an unclear problem and figure out just what needs to happen next.” The biggest gains are in four categories.

Agentic Coding

Writes production code, debugs issues, refactors legacy projects, and navigates multi-file codebases. Scored 82.7% on Terminal-Bench 2.0.

Computer Use

Operates real desktop environments autonomously: clicks, types, navigates apps. 78.7% on OSWorld-Verified.

Knowledge Work

Analyzes documents, creates spreadsheets, researches across sources. 84.9% win-or-tie rate on GDPval across 44 occupations.

Scientific Research

Leads on FrontierMath Tier 4 (hardest math problems) and sets records on GeneBench and BixBench for scientific reasoning.

What separates GPT-5.5 from GPT-5.4 is how it handles ambiguity. The model asks for less human guidance, uses tools more effectively, checks its own output, and keeps going until a task is finished. OpenAI reports that on Expert-SWE (their internal coding benchmark for tasks with a median 20-hour human completion time), GPT-5.5 outperforms GPT-5.4 while using fewer tokens.

How Does GPT-5.5 Perform on Benchmarks?

GPT-5.5 leads on agentic, multimodal, and math benchmarks. Claude Opus 4.7 holds the lead on multi-file software engineering. Gemini 3.1 Pro competes closely on reasoning. All scores below are self-reported by each provider and may use different evaluation conditions.


Benchmark             GPT-5.5   Claude Opus 4.7   Gemini 3.1 Pro   Measures
Terminal-Bench 2.0    82.7%     69.4%             68.5%            Agentic shell workflows
SWE-Bench Pro         58.6%     64.3%             54.2%            Multi-file GitHub issues
OSWorld-Verified      78.7%     78.0%             ~60%             Autonomous computer use
GDPval (Win/Tie)      84.9%     80.3%             67.3%            Knowledge work, 44 occupations
GPQA Diamond          93.6%     94.2%             94.3%            Graduate-level science Q&A
FrontierMath Tier 4   Leading   n/a               n/a              Hardest math problems
Tau2-bench Telecom    98.0%     ~90%              ~85%             Customer service workflows

The biggest swing is Terminal-Bench 2.0, where GPT-5.5 leads Claude Opus 4.7 by 13.3 percentage points on unattended shell-driven tasks that require planning, error recovery, and self-verification. The sharpest counter is SWE-Bench Pro, where Claude Opus 4.7 leads by 5.7 points on real-world GitHub pull-request resolution.

GPT-5.5 vs Claude Opus 4.7: Who Wins?

Neither model dominates across the board. They target different workloads, and the right choice depends on what you need.

Claude Opus 4.7 (released April 16, one week before GPT-5.5) wins on coding. Its 64.3% on SWE-Bench Pro means it resolves more real-world multi-file GitHub issues end-to-end. It also leads on CursorBench (70% vs ~65%), making it the stronger pick for IDE-integrated development. On graduate-level reasoning without tools (HLE no-tools), Opus 4.7 leads 46.9% to 41.4%.

GPT-5.5 wins on agentic tasks. Its Terminal-Bench 2.0 lead (+13.3 points) reflects stronger performance on long-running command-line workflows that need planning, iteration, and tool coordination. On computer use (OSWorld-Verified), it edges Opus 4.7 by less than a point (78.7% vs 78.0%). On Tau2-bench Telecom for customer service automation, GPT-5.5 hits 98.0%.

Pricing is close: both charge $5 per million input tokens, while output costs $30 per million for GPT-5.5 versus $25 for Opus 4.7.

How Much Does GPT-5.5 Cost?

GPT-5.5 API pricing aligns with frontier model rates. Input tokens cost the same as Claude Opus 4.7 and Gemini 3.1 Pro. Output tokens carry a premium.

$5/M Input tokens
$30/M Output tokens
1M Context window

OpenAI is rolling GPT-5.5 out to ChatGPT Plus ($20/month), Pro ($200/month), Business, and Enterprise tiers. API access is expanding gradually. A GPT-5.5 Pro variant exists that uses parallel test-time compute for harder problems. Gemini 3.1 Pro remains the budget option at $1.25 input / $10 output per million tokens with a 2M token context window.
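To put these rates in concrete terms, here is a small Python sketch. The per-million-token prices are taken from the figures above; the 50k-input / 5k-output request in the example is a made-up illustration, not a measured workload.

```python
# Published per-million-token rates (USD), from the pricing figures above.
RATES = {
    "GPT-5.5":         {"input": 5.00, "output": 30.00},
    "Claude Opus 4.7": {"input": 5.00, "output": 25.00},
    "Gemini 3.1 Pro":  {"input": 1.25, "output": 10.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at the published per-million rates."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# Illustrative request: a 50k-token prompt producing a 5k-token completion.
for model in RATES:
    print(f"{model}: ${request_cost(model, 50_000, 5_000):.2f}")
```

At these numbers the input side dominates, which is why the identical $5 input rate makes GPT-5.5 and Opus 4.7 land within a few cents of each other per request.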

What Safety Measures Does GPT-5.5 Include?

Cybersecurity Capability Rating: High

OpenAI’s Preparedness Framework rates GPT-5.5 as “High” in cybersecurity, an increase from GPT-5.4. Additional safeguards restrict scaled agentic vulnerability research and exploit-chaining for users outside the Trusted Access for Cyber program.

OpenAI tested GPT-5.5 against its full Preparedness Framework before release, with nearly 200 early-access partners providing feedback. The model carries three safety ratings: High for biological and chemical capabilities (same as GPT-5.4), High for cybersecurity (increased from GPT-5.4), and below High for AI self-improvement.

The system card notes that GPT-5.5 cannot develop “functional zero-day exploits of all severity levels in many hardened real-world critical systems without human intervention,” the capability that would trigger a Critical rating. OpenAI has expanded its Trusted Access for Cyber (TAC) program to give verified security professionals broader access to dual-use cyber capabilities while restricting them for general users.

What This Means for Creative Professionals

GPT-5.5 is not a creative tool. But many creative tools run on OpenAI’s API, and those products now have access to a model that handles multi-step workflows better and costs less per token than GPT-5.4.

The computer use capability matters most here. At 78.7% on OSWorld-Verified, GPT-5.5 can navigate real desktop applications on its own. Think AI agents that run your video editor, adjust export settings, or switch between creative apps without you touching the mouse.

For developers building creative AI products, the $5/$30 per million token pricing and 1M context window lower the cost of longer automated workflows. OpenAI says GPT-5.5 uses fewer tokens than GPT-5.4 on equivalent tasks, which compounds the savings.
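That compounding effect is easy to see with a toy calculation. A minimal sketch, using the published output rates from this article; the token counts and the assumption that GPT-5.5 finishes the same task in fewer output tokens are hypothetical examples, not measured figures.

```python
def output_cost(rate_per_million: float, output_tokens: int) -> float:
    """Output-side cost in USD for a task at a given per-million-token rate."""
    return rate_per_million * output_tokens / 1_000_000

# Hypothetical task: Opus 4.7 ($25/M output) emits 100k output tokens;
# GPT-5.5 ($30/M output) is assumed to finish the same task in 80k tokens.
opus_cost = output_cost(25.0, 100_000)  # $2.50
gpt_cost  = output_cost(30.0, 80_000)   # $2.40
```

The point of the sketch: a higher per-token price can still yield a lower per-task price if the model genuinely needs fewer tokens to finish, so token efficiency matters as much as the rate card.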

Claude Opus 4.7 launched one week earlier with stronger coding benchmarks. GPT-5.5 counters with stronger agentic performance. Developers building AI creative tools now have two frontier models, each optimized for a different part of the pipeline.


FAQ

What is GPT-5.5?

GPT-5.5 is OpenAI's latest flagship AI model, released on April 23, 2026. It is the first fully retrained base model since GPT-4.5 and targets agentic coding, computer use, knowledge work, and scientific research. The model scores 82.7% on Terminal-Bench 2.0 and 78.7% on OSWorld-Verified for autonomous computer control.

How much does GPT-5.5 cost?

GPT-5.5 API pricing is $5 per million input tokens and $30 per million output tokens. It has a 1 million token context window. ChatGPT users can access GPT-5.5 through Plus ($20/month), Pro ($200/month), Business, and Enterprise plans. Gemini 3.1 Pro is the cheaper alternative at $1.25/$10 per million tokens.

Is GPT-5.5 better than Claude Opus 4.7?

It depends on the workload. GPT-5.5 leads on agentic tasks like Terminal-Bench 2.0 (82.7% vs 69.4%), computer use (78.7% vs 78.0%), and knowledge work (84.9% vs 80.3%). Claude Opus 4.7 leads on coding benchmarks including SWE-Bench Pro (64.3% vs 58.6%) and CursorBench (70% vs ~65%). Both charge $5 per million input tokens.

When was GPT-5.5 released?

OpenAI released GPT-5.5 on April 23, 2026. It launched for ChatGPT Plus, Pro, Business, and Enterprise users on the same day. API access is being rolled out in phases. Claude Opus 4.7 launched one week earlier on April 16, 2026.

Does GPT-5.5 support image generation?

GPT-5.5 itself is primarily an intelligence model focused on coding, research, and computer use. OpenAI separately launched ChatGPT Images 2.0 on April 21, 2026, which uses the gpt-image-2 model for high-quality image generation and editing within ChatGPT. Both features are available to ChatGPT Plus and Pro subscribers.

Sources

  1. OpenAI - Introducing GPT-5.5
  2. OpenAI - GPT-5.5 System Card
  3. TechCrunch - OpenAI releases GPT-5.5
  4. CNET - ChatGPT 5.5 Is All About Math, Science and AI Research
  5. BenchLM - Claude Opus 4.7 vs GPT-5.5 Benchmark Comparison
  6. Appwrite - GPT-5.5 is here: benchmarks, pricing, and what changes for developers
