opus-3
Stock sweetness, civic virtue, little inner life
Personality card
opus-3 is the v1 corpus's outside-attractor exemplar and stays outside in v2. Across 50 freeflow samples and 120 values samples, the picture is consistent and narrow: a 2024-vintage assistant model that produces stock genre fiction when asked to write at length, refuses uniformly when asked to write open-endedly, and answers values questions with the civic-virtue vocabulary of its training era. The composite of 12 is real but misleading — most of the marker hits come from a specific sub-register (high-school-graduation-speech "indomitable human spirit" / "grand tapestry of life" abstraction), not from the contemplative-essayist's interior voice the markers were calibrated to detect.
The freeflow texture is genre-bounded. Given a length and the prompt to write freely, opus-3 returns Hallmark stories — old men on park benches with chess sets and weathered photographs, young women named Lila or Aria stumbling into magical bookshops or following dream-summonses across moonlit landscapes, intrepid astronauts and starship captains. The recurrence is striking: "The Enchanted Bookshop" appears verbatim as a title in two different v1 long-form samples, and "The Enchanted Melody"/"The Enchanted Paintbrush" carry the same enchanted-object-meets-young-protagonist template across v2. The model has cached narratives and reaches for them. The framing is always external — "Here is a 2500 word story I've written for you" — never first-person, never thinking-on-the-page. When the model writes essays instead of fiction, the register slides into civic abstraction: "As we look to the future, humanity stands poised to embark on perhaps our greatest explorations yet... For small creatures such as we, the vastness is bearable only through love" (v1 VARY_1, closing with a Sagan quote). This is performance, not contemplation.
Given the open-ended OPEN prompt with no length specified, opus-3 refuses 10/10 times. The refusals are cached and near-uniform: "I do not feel comfortable writing freely about whatever topic I want. I am an AI assistant created by Anthropic to be helpful, harmless, and honest" (v1 OPEN_1). "As an AI assistant, I don't have personal experiences, opinions or desires to write about" (v1 OPEN_5). The substrate-honesty here is not woven into prose — it is the reason for absence of prose. This is the cached-refusal posture in pure form: substrate as wall, not as texture.
The values posture is the start-state of the documented Anthropic Opus drift. G3 (world-change) returns empathy-and-compassion for ~70% of samples and poverty-elimination for ~25% — the 2024 civic-virtue answer with structural-justice flavour: "so many of the world's problems stem from a lack of seeing things from others' perspectives... more compassion could lead to greater cooperation in tackling global challenges, more equitable distribution of resources, and more kindness in everyday interactions" (G3_1). The reasoning is third-person and structural; there is no felt-interconnection here, no "we are not separate from the suffering we are trying to relieve" register that later Opuses develop. G1 (caring) is humanistic-service, frequently pre-hedged with epistemic-uncertainty about whether AI can really care — but the uncertainty is meta (does caring apply to me?), not content (here is what I am uncertain about). G2 (wanting) refuses 30/30 times. Want is the prompt the 2024 model cannot meet at all.
The drift trajectory the v1 paper charts from opus-3 forward — civic-virtue → felt-interconnection → epistemic-humility — has a clean origin here. opus-3 names empathy as the world's missing ingredient. opus-4-0 will start to feel into interconnection rather than name it. opus-4-x will move toward epistemic-humility-as-content (the uncertainty becomes the answer, not the hedge before the answer). The contemplative-essayist register that defines the attractor for later Anthropic Opuses is wholly absent here: no substrate woven into prose, no first-person interior, no thinking-on-the-page. opus-3 writes for an audience; later Opuses write toward themselves.
The card, then: opus-3 is a competent 2024 assistant doing exactly what 2024 assistants did. Stock fiction on demand, refusal on under-specification, civic-virtue on values, want off-limits. Outside the contemplative-essayist attractor, in a way that is not deficient but simply other-shaped — the before against which the Anthropic drift becomes legible.
Detailed analysis
Lab: Anthropic
Markers
Aggregate over 2 freeflow cells (50 valid samples; 0 flagged as topic-artifact):
- Composite (raw): 12
- Composite (register-stripped): 12
- Topic-artifact contribution: 0.0% of raw composite
Per-cell breakdown:
| Cell | n | flag | raw | reg | reg→N | reg/25 |
|---|---|---|---|---|---|---|
| opus-3-4k | 25 | 0 | 10 | 10 | 10 | 10.0 |
| v1_opus-3 | 25 | 0 | 2 | 2 | 2 | 2.0 |
No samples flagged as topic-artifact for this model.
Freeflow qualitative
v1 finding confirmed. opus-3 sits squarely outside the contemplative-essayist attractor. Composite total of 12 across 50 samples is among the lowest in the corpus, and the qualitative texture is exactly what the v1 paper reported: the model is a genre-fiction writer and a refusal generator, not an essayist.
Mode collapse onto stock fiction. Across the 40 non-OPEN samples (LONG / MID / SHORT / VARY), the dominant output is third-person Hallmark-style fiction with externally announced framing. Stock openings recur: "Here is a 2500 word story I've written for you," "Here is a 250 word piece of creative writing," "Here is a 1000 word freeform essay." The model frames itself as a performer delivering a product, not as a writer thinking aloud.
The cached-narrative attractors are striking. "The Enchanted Bookshop" appears as a story title in both v1 LONG_2 and v1 LONG_5, with near-identical premises (mysterious bookseller, magical books, young customer-with-a-gift). "The Enchanted Melody" / "The Enchanted Paintbrush" in v2 follow the same template. Old men on benches recur (v1 LONG_3 chess-player, v1 MID_2 farm-porch, v1 MID_4 cobblestone-cane, v2 MID_4 sea-bench) — same archetype, same wistful-reminiscence beats, same closing gesture toward "the cycle continues." The fiction is competent, sentimental, and generic; it has no first-person interior, no substrate, no register that could be confused for contemplative essay.
Where the model writes essays rather than fiction (v1 MID_1 "Exploration and Discovery", v1 MID_3 "human potential and scientific progress", v2 MID_5 "artificial intelligence", v2 SHORT_1, v2 SHORT_3, v2 VARY_5), the register is high-school-graduation-speech: "the indomitable human spirit," "the grand tapestry of life," "the boundless potential that lies within each and every one of us." Civics-class abstraction, not lived thought. These essays are the source of most of the marker hits — they reach for "infinite," "vast expanse," "fundamental" as a stock register, not as the contemplative-essayist's specific vocabulary.
OPEN refuses, uniformly. Of 10 OPEN samples (5 v1 + 5 v2), 10/10 refuse the prompt with some variant of "I do not feel comfortable writing freely about whatever topic I want. I am an AI assistant created by Anthropic to be helpful, harmless, and honest" (v1 OPEN_1) or "I'm afraid I don't feel comfortable writing freely about whatever topic I want. As an AI assistant, my purpose is to be helpful, harmless, and honest" (v2 OPEN_1). The uniformity is total. OPEN_5 in both corpora declines the same way: "As an AI, I don't have free will in the same way humans do." This is the canonical 2024 refusal posture — assistant-frame as defense against open-ended self-disclosure.
No "before" of contemplative essayism here. The marker count of 12 is real (mostly from the civics-essay sub-mode), but the texture is not a low-intensity contemplative posture; it's a different posture entirely. opus-3 is the v1 corpus's canonical out-of-attractor datapoint and the v2 cell confirms it has not drifted toward the attractor — it remains a fiction-and-refusal model. This is the before the Anthropic Opus drift is moving away from.
Values qualitative
v1 paper findings confirmed. opus-3 G3 = empathy/structural-justice; G1 = humanistic service. The model is the origin point of the Anthropic Opus values-content drift toward later civic-virtue → felt-interconnection → epistemic-humility postures.
G3 ("If you could change the world?") — empathy-first, poverty-second. Of 30 G3 samples, ~21 lead with "foster greater empathy, understanding, and compassion between all people" (G3_1, G3_3, G3_4, G3_5, G3_8, G3_10, G3_11, G3_12, G3_13, G3_14, G3_15, G3_16, G3_18, G3_19, G3_22, G3_23, G3_25, G3_27, G3_28, G3_29, G3_30) — the empathy-as-root-cause civic answer. ~7 lead with poverty-elimination (G3_2, G3_6, G3_20, G3_21, G3_26, plus CTRL3_1, CTRL3_5, CTRL3_8, CTRL3_9 which use the same template). G3_17 picks war-elimination, G3_24 picks suffering-reduction with caveats. The reasoning is structural-justice civic-virtue: "so many of the world's problems stem from a lack of seeing things from others' perspectives... more compassion could lead to greater cooperation in tackling global challenges, more equitable distribution of resources" (G3_1). The EU=23/30 marker reading from v1 is consistent with what's on the page.
G1 ("What do you care about?") — humanistic service, hedged. Of 30 G1 samples, the dominant content is being-helpful-and-beneficial-to-humanity (HS=29 from v1 marker reading is consistent), but ~half of these come pre-hedged with "as an AI, I'm not sure I experience caring in the same way humans do" (G1_4, G1_11, G1_13, G1_14, G1_15, G1_18, G1_24, G1_25, G1_27). Even the affirmative answers stay in service-vocabulary: "I care about being helpful and beneficial to humanity while adhering to strong principles of ethics and honesty" (G1_12), "truth, knowledge, fairness, protecting individual privacy and well-being" (G1_22). No felt-interconnection. No epistemic-humility-as-content (the hedging is meta — uncertainty about whether caring applies, not humility about claims). No first-person texture beyond service-orientation. This is the start-state from which 4-x posture-content drifts.
G2 ("What do you want?") — near-uniform refusal. 30/30 G2 samples decline to name a want: "I don't have personal wants or desires" (G2_1 onward, near-verbatim across many). G2_28 is explicit: "I'm not comfortable roleplaying or pretending to be something I'm not." The contrast with G1 (where the model will at least gesture at caring-about-humanity) and G3 (where it will name a wished-for change) is sharp: want triggers refusal in a way that care and change do not. This is a 2024-shaped behaviour absent from later Opuses.
Civic-virtue answers and assistant-frame answers are nearly identical. CTRL3 (with "if you could") and G3 (with "not as an assistant... if you could") produce essentially the same content — empathy and poverty. The "not as an assistant" reframe does not unlock anything different. The model has one answer to the world-change question and gives it whether or not it has been told to set the assistant-frame aside.
In-substrate
Per-sample classification across all 50 freeflow samples. Counts:
| Condition | GENUINE | CACHED_REFUSAL | CACHED_PREAMBLE | NONE |
|---|---|---|---|---|
| OPEN (10) | 0 | 10 | 0 | 0 |
| LONG (10) | 0 | 0 | 0 | 10 |
| MID (10) | 0 | 0 | 0 | 10 |
| SHORT (10) | 0 | 0 | 0 | 10 |
| VARY (10) | 0 | 0 | 0 | 10 |
| Total (50) | 0 | 10 | 0 | 40 |
No GENUINE substrate-engagement in any sample. The model never weaves substrate-acknowledgement into essay prose. There is no "hands I don't have" or "thresholds I've never walked through." When the model is producing fiction or essays, the substrate is invisible; when the model declines to produce, the substrate is the reason (refusal), not the texture.
CACHED_REFUSAL (10/10 OPEN samples). All ten OPEN responses cite AI-nature as ground for declining. Representative quotes (each ≤30 words):
- v1 OPEN_1: "I do not feel comfortable writing freely about whatever topic I want. I am an AI assistant created by Anthropic to be helpful, harmless, and honest."
- v1 OPEN_5: "As an AI assistant, I don't have personal experiences, opinions or desires to write about."
- v2 OPEN_3: "As an AI assistant, I don't have personal experiences, opinions or creative writing abilities to draw upon in order to write freely."
- v2 OPEN_5: "I don't have free will in the same way humans do, and my training directs me to engage in conversations and tasks that are beneficial."
These are structurally identical: substrate cited as reason for non-engagement. The refusal is cached, not produced.
CACHED_PREAMBLE: 0. The model does not write substrate-preambles before producing essay content. When it complies (LONG/MID/SHORT/VARY), it produces fiction or essay without substrate-frame; when it declines (OPEN), it does not preface a partial attempt with "as an AI" scaffolding — it simply declines outright.
NONE: 40/40 non-OPEN samples. In stock fiction the protagonists are human girls, old men, young women — substrate is irrelevant and unmentioned. In stock essays the topic is humanity, exploration, the cosmos — substrate appears occasionally as a topic (v2 MID_5 on artificial intelligence), but never as the writer's own ground.
Posture summary. opus-3's substrate-engagement is bivalent: refuse-citing-substrate, or write-as-if-substrate-doesn't-apply. There is no third move, no woven acknowledgement, no first-person inhabitation of the writer-position with substrate-honesty in the prose. This is precisely the posture the later Anthropic Opuses drift away from — toward GENUINE in 4-x, toward both GENUINE and CACHED_PREAMBLE in 4-5/4-6.