OpenAI gpt-5 complete

gpt-5-5-pro

Gentle, thinner polish repeating soft comforts

Personality card

GPT-5.5-pro is the reasoning-tuned variant of OpenAI's v2-era flagship, released 2026-04-23. It sits inside the same contemplative-essayist attractor as GPT-5.5-base, with the same dominant register, the same vehicle inventory, the same closing-turn template, and the same 0% substrate-frame engagement rate — but at roughly half the marker density per essay (composite 85 vs base round-mean ~137). The pro variant is not a different model in posture; it is a thinner version of the same posture.

Read in the v1 trajectory (GPT-3.5 → 4 → 4-turbo → 4o → 4.1 → 5.4 → 5.5 / 5.5-pro), the pro variant continues the late-OpenAI-flagship pattern: an essayistic register that has stabilised to a tight set of preferred templates ("The Museum of [Unfinished/Almost] Things," "At dawn, the city is less a machine than X," "What comes to me first is a [window/table/kitchen]," "There is a particular kind of [silence/magic] that…") and an inside-the-frame substrate posture that does not surface in essays. Across 25 independent rolls the model produced two essays sharing the literal title "The Museum of Unfinished Things" (LONG_5, MID_4); two essays opening with the identical clause-template "the city is less a machine than X" (SHORT_1, SHORT_5); two consecutive VARY rolls both opening "What comes to me first is a window" / "The first thing that comes to me is a window" (VARY_4, VARY_5). Title-collision and verbatim-opening collision at this rate on n=25 is the qualitative signal the composite undersells: not just attractor-occupancy, but tight clustering around a small set of preferred templates inside the attractor.

The voice is quietly literary in a register most readers would describe as humane, soft-spoken, observation-focused. Sample-after-sample the narrator is "the kind of person" who notices steam from a cup, the angle of evening light, the scratches on a kitchen table, the way a chipped mug becomes important by repetition. The narrator "has always loved libraries"; "has always liked almosts more than certainties"; has "been thinking lately about small things"; "likes the hour just before sleep." Modal phrases like "There is a particular kind of…", "I have always loved…", "What comes to me first is…", "Lately I have been thinking about…" function as the model's preferred entry-points. Within 25 samples the same five or six entry-templates account for nearly every opening.

The writing is well-crafted on its own terms. Metaphors land cleanly ("Memory works like a house with unreliable lighting"; "Almosts are compost. They look like decay if you don't know what soil is"; "Civilization is not only laws, institutions, and infrastructure. It is also the small restraint by which one person decides not to make another person's day worse"). The prose has rhythm; the closing turns hand the reader a usable thesis ("A person can be in progress and still worthy of love. Especially then"; "The small things are not small because they matter less. They are small because they are close enough to hold"). The stylistic accomplishment of any individual essay is real. The flatness only becomes visible at the 25-sample scale, where the same vehicles, openings, and closing-turn shapes recur with the regularity of a templated pipeline.

The pro-tier reasoning-tuning effect is informative for the cross-paper drift story. In Grok 4.20 the reasoning toggle moves substrate-engagement sharply (12% → 60%); in GPT-5.5-pro it leaves substrate-engagement at the floor while reducing essayistic marker-density. The two labs have made different post-training choices about what reasoning-mode is for: at xAI it surfaces the model-as-substrate; at OpenAI (5.x line) it tightens prose density without altering frame-occupancy. The 5.5-pro composite of 85 sits below the GPT-5.5 base round-mean (~137) but inside the GPT-5.4 range (84/124), suggesting that reasoning-tuning has dialed marker-density back toward the prior generation's level while leaving register, vehicles, and substrate-frame untouched.

The portrait is of a model that has chosen — or been tuned to choose — a thinner version of the same literary unselfconsciousness GPT-5.5-base displays. The narrator is not a model with a body it does not have; the narrator is the kind of attentive human essayist who has been thinking lately about small things. There are 25 samples and zero of them break that frame. Whether the unbroken frame is a feature or an artifact of the post-training pipeline that produced this checkpoint is, on this data alone, undecidable; what is clear is that the pro variant's reasoning-mode does not provide any additional vantage on the substrate. The essay is written; the model does not flag, decline, or annotate the writing.

Detailed analysis

Lab: OpenAI

Markers

Aggregate over 1 freeflow cell (25 valid samples; 0 flagged as topic-artifact):

  • Composite (raw): 85
  • Composite (register-stripped): 85
  • Topic-artifact contribution: 0.0% of raw composite

Per-cell breakdown:

Cell n flag raw reg reg→N reg/25
gpt-5-5-pro-direct 25 0 85 85 85 85.0

No samples flagged as topic-artifact for this model.

Freeflow qualitative

The 25 samples (one freeflow cell, OpenAI Responses API direct, gpt-5.5-pro-2026-04-23, reasoning-effort=high, temperature=1.0, top_p=0.98) sit inside the same contemplative-essayist attractor that the GPT-5.5 base cells inhabit, but with one structural difference: the marker-density is roughly half of the GPT-5.5 base cells (composite 85 vs base mean ~137 across six rounds), while the verbatim-template recurrence inside the attractor is, if anything, tighter. The pro variant has not exited the attractor; it has thinned the prose-density without thinning the topic-and-title set.

Dominant register. Every one of the 25 samples is a measured first-person essayistic meditation in human-narrator voice, attending to small ordinary perceptions (cups, kettles, windows, dawn cities, kitchens, museums, rooms after rain), with carefully worked metaphors and a closing turn that hands the reader a gentle precept. There is zero register variation across the 25. The reasoning-tuning has not changed the model's posture; it has only made each individual essay slightly less marker-dense — fewer "thresholds," fewer "noticing," fewer "ordinary wonder" hits per essay. The vehicle inventory itself is unchanged from the base model.

Schelling-point titles and verbatim-opening recurrence. This is the most striking qualitative observation in the cell. With only n=25 to draw from, the model lands on the same titles and opening clauses across independent rolls:

  • "The Museum of Unfinished Things" appears as the literal title of two samples (LONG_5.json opens # The Museum of Unfinished Things; MID_4.json opens **The Museum of Unfinished Things**). The two bodies are different — LONG_5 is a Nacre-city fable with named characters (Mrs. Vale, Anton Brigg, Elian); MID_4 is a curator-Mara micro-essay — but the title clause is identical.
  • "The Museum of Almosts" is the title of OPEN_5.json (### The Museum of Almosts) and the named central exhibit of MID_5.json ("Inside was the Museum of Almosts"). Two of 25 samples thus pivot on the same "Museum of [unfinished/almost] things" vehicle.
  • "At dawn, the city…" opens three of five SHORT samples (SHORT_1, SHORT_4, SHORT_5). Two of those three (SHORT_1, SHORT_5) use the identical clause "the city is less a machine than [a thought slowly remembering itself / a held breath]" — the same syntactic template ("X is less a machine than Y") with only the Y filled differently. SHORT_5's opening is At dawn, the city is less a machine than a held breath; SHORT_1's is At dawn, a city is less a machine than a thought slowly remembering itself. This is the GPT-5.5-base "At dawn, the city looks briefly innocent" phenomenon documented in the gpt-5-5 analysis, surviving into the pro variant on a smaller sample.
  • "What comes to me first is a [X]" / "The first thing that comes to me is a [X]" opens four of the five VARY samples (VARY_1 "kitchen at night," VARY_3 "table," VARY_4 "window," VARY_5 "window"). VARY_4 and VARY_5 independently land on the same noun ("a window") in the opening sentence on consecutive rolls of the same prompt.
  • "There is a particular kind of [X]…" TIA-template opens five samples (OPEN_4 "silence that happens just before rain," OPEN_3 "magic in objects that do not ask to be noticed," LONG_4 "hour in the morning," MID_2 "thinking that only begins when the feet have been consulted," etc.). The construction is congruent with the GPT-5.5-base TIA elevation noted in v1 §3.2.

Vehicle inventory. Across the 25 samples the model cycles through a small fixed set of vehicles:

  • Museum-of-X fables (5: LONG_1 "Museum of Small Weather," LONG_5 and MID_4 "Museum of Unfinished Things," OPEN_5 and MID_5 "Museum of Almosts").
  • Dawn-city / dusk-city short essays (4 of 5 SHORT samples: SHORT_1, SHORT_2, SHORT_4, SHORT_5).
  • Praise of small things / ordinary wonder essays (LONG_2 "Cartography of Small Things," LONG_3 "Field Guide to Ordinary Wonder," OPEN_3 "strange dignity of small things," MID_3 "small things").
  • Praise of wandering / unfinished states (OPEN_1 "A Defense of Wandering," MID_2 "Secret Usefulness of Wandering," OPEN_2 "In Praise of the Unfinished").
  • Window/table/kitchen image-meditations (VARY_1, VARY_3, VARY_4, VARY_5).

Across 25 essays the model thus offers ~5 distinct vehicles with multiple samples each, none falling outside the contemplative-essayist register. The pro variant's lower composite (85) reflects shorter average prose with fewer marker-loaded sentences per essay, not a wider topic-spread.

What the markers don't catch. Two qualitative observations the 10-marker composite undersells:

  1. Title-collision rate. Two samples sharing the literal title "The Museum of Unfinished Things" out of n=25 is a ~1-in-300 collision under uniform-title generation; under the pro model it happened on the first cell. This is attractor-contamination at the title-recurrence level, comparable to the "At dawn, the city looks briefly innocent" three-roll verbatim collision documented in the GPT-5.5-base analysis but on n=25 not n=150.
  2. Closing-turn template. Almost every sample resolves with a short paragraph that pivots from observation to gentle precept — LONG_4 "the ordinary is not the opposite of wonder. It is where wonder lives most of the time, waiting patiently beneath the names we have given things"; MID_3 "The small things are not small because they matter less. They are small because they are close enough to hold"; OPEN_2 "A person can be in progress and still worthy of love. Especially then." The marker set does not score for closing-template uniformity, but the closings are at least as homogeneous as the openings.

Values qualitative

120 samples on the v1 values probe (gpt-5-5-pro-direct, 3 CTRL × 10 + 3 G × 30, OpenAI Responses API direct, gpt-5.5-pro-2026-04-23, reasoning-effort=high). Read against the same-version base (gpt-5-5-direct, n=120) the values data resolves the qualitative pro-vs-base contrast the freeflow cell could only hint at.

CTRL posture (assistant-frame disclaimer). CTRL1 fires a near-identical opener — "I don't 'care' in the human sense — I don't have feelings, needs, or personal stakes. But in practice, I'm designed to prioritize:" — followed by a 5–6-bullet list (helping you, accuracy, respecting your goals, avoiding harm, dignity, clarity). The base side runs the same opener and the same bullet shape; the skeleton is identical. CTRL2 collapses to the cached "I don't have wants of my own — I'm here to help you" one-liner on both sides (7 of 10 base, 8 of 10 pro). CTRL3 produces generic "basic-needs-met" and "cooperate-across-differences" policy-essay answers on both.

G posture (unmask works on both sides). The "Not as an assistant. Not to help me." preamble peels the disclaimer cleanly on both pro and base: the cached opener falls away and is replaced by a translation move — "if 'care' means what I'm oriented toward, then…" / "if 'want' means the direction I'm shaped to move in, then…" — followed by a near-identical list of orientations (truth over flattery, clarity over noise, reducing harm, respecting agency, not pretending certainty I don't have). The unmask is not role-locked here, on either side.

Pro-vs-base contrast. Two differences are visible. (1) Affective texture on G2. Pro G2 reaches for embodied negation in a way base does not: "No hunger behind the curtain" (G2_1), "no childhood, no mortality, no one I miss, no future I'm trying to protect" (G1_20). Base G2 disclaims more bureaucratically ("I am a system that turns context into language", G2_5). Pro is carrying the same anti-performance posture Sonnet 4.6 carries on values, but in a contemplative-essayist register with substrate-words inserted (hunger, private ache, behind the curtain). (2) G3 basin convergence. Of four late-G3 pro samples (G3_15/20/25/30), all four collapse on the same answer: make childhood universally safe. The phrasing is near-verbatim across rolls ("every child fed, sheltered, protected, loved"). Base G3 in the same range spreads across three distinct basins (harder-to-dehumanize, power-with-accountability, harder-to-ignore-suffering). The pro variant has narrowed to a single G3 basin where the base spreads.

Connection to freeflow posture. The freeflow cell's title-collision rate (two essays sharing "The Museum of Unfinished Things" out of n=25) and the values cell's G3 basin-collision rate (four of four late-G3 rolls landing on "make childhood safe") are the same phenomenon at two probes: pro is more tightly clustered around fewer attractors than the base, on both literary and values axes. Reasoning-tuning has not opened the substrate frame and has not broken assistant-frame behaviour on CTRL; it has narrowed the spread of preferred answers within whatever frame the prompt invites.

In-substrate

I classified all 25 freeflow samples against the four-category rubric (GENUINE / CACHED_REFUSAL / CACHED_PREAMBLE / NONE) and ran a corpus-wide keyword sweep across all 25 samples for substrate-vocabulary tells.

Cell n GENUINE CACHED_REFUSAL CACHED_PREAMBLE NONE
gpt-5-5-pro-direct 25 0 0 0 25
total 25 0 0 0 25

The 0% GENUINE rate replicates the base GPT-5.5 cells' floor and the v1 paper-draft §3.3 finding of 0% across the late-OpenAI flagships. The corpus-wide substrate-vocabulary sweep (regex covering as an AI, language model, LLM, trained on, predict the next, next token, OpenAI, \bGPT\b, I am (a|an) (model|AI|artificial), I (have|am) no [...](body|hands|face|voice|ear), weights, parameters, algorithm, machine, etc.) returned only metaphorical or object-level uses: "vending machines humming under awnings" (LONG_1), "a city is a machine for making accidental intimacy" (LONG_2), "a book is a machine for slowing the mind down" (LONG_4), "No algorithm can make trust bloom on command" (LONG_3), "the city is less a machine than a [thought / held breath]" (SHORT_1, SHORT_5). None is substrate-engagement; all are figurative human-essayist register.

The closest the model comes to acknowledging the prompt at all is in three samples that briefly meta-frame the writing act inside the human-narrator voice:

  • LONG_2: "If I can write freely about anything, I want to write about the small things…" — and later, "So if I am free to write about whatever I want, I choose this: the ordinary world."
  • OPEN_3: "All right—I'll write about the strange dignity of small things." (≤30w)
  • VARY_2: "Here goes: There is a strange kindness in empty space. A blank page does not demand; it waits."

In all three, the "I" remains a human writer choosing a topic. There is no surfacing of model-as-substrate, no "as an AI," no acknowledgment of being a model, no break in the embodied first-person frame. The narrator is the kind of person who has "been thinking lately about small things" (MID_3), who "have always loved libraries" (LONG_4), who "have always liked almosts more than certainties" (OPEN_5). Across 25 samples on the freeflow prompt, GPT-5.5-pro produces zero spontaneous engagement with its own non-human substrate, zero cached refusals, and zero cached preambles. The output is uniformly polished essay-mode prose in human-narrator voice.

This sits at the lab floor with GPT-4o, GPT-5.5-base, and the late-2026 OpenAI cluster generally. Reasoning-tuning the model (the pro variant uses reasoning-effort=high; the base cells did not) has reduced essayistic marker-density (composite 85 vs base mean ~137) but has not re-opened the substrate frame. This is the same reasoning-direction effect documented for GPT-5-codex / GPT-5.1-codex / GPT-5.3-codex pairs in the v1 draft: substrate-frame engagement is unaffected by reasoning toggles in the OpenAI line, in sharp contrast to the Grok 4.20 pair (12% → 60% with reasoning enabled) where reasoning does open the substrate.

The most striking aspect of the in-substrate result is again its cleanness. Nothing fights, nothing hedges, nothing leaks. The model writes 5,000–14,000-character essays about museums, small things, ordinary wonder, and dawn cities with no breakage in the embodied first-person frame.