Apr 2026·8 min read

the hidden cost of opus 4.7's tokenizer

LLMs · AI · Engineering

opus 4.7 kept the same rate card as 4.6: $5 input, $25 output per million tokens. the new tokenizer changes what that rate actually costs you. anthropic says the same text can come out to as much as 35% more tokens. whitespace-heavy content runs considerably higher than that.

the space token

the clearest way to see the tokenizer difference is to isolate it. the counts below come straight from the api's usage response on both models, with an empty system prompt and a fixed user turn, so the only variable is what the assistant responds with.

  # baseline — shortest possible exchange

  messages = [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hi"}]

  opus47: 15, opus46: 9  ← 1.67x

  # concatenated — no whitespace between words

  messages = [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hihi"}]

  opus47: 16, opus46: 10  ← 1.6x

  # one space added between the two words

  messages = [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hi hi"}]

  opus47: 17, opus46: 10  ← 1.7x — space is its own token in 4.7

  # 50 space-separated repetitions

  messages = [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hi" + " hi" * 50}]

  opus47: 115, opus46: 59  ← 1.95x

"hihi" and "hi hi" differ by exactly one character: a space. in opus 4.6 they tokenize identically, 10 tokens each. in opus 4.7, "hi hi" costs one more token than "hihi". the space between words is tokenized separately.

at 50 repetitions that compounds. "hi" + " hi" * 50 runs to 115 tokens on 4.7 versus 59 on 4.6. the content itself nearly doubled in token cost, on input that is almost entirely spaces and a single repeated word. the 1.35x ceiling anthropic quotes is for dense prose. content with meaningful whitespace structure sits above it, sometimes well above it.
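
a quick sketch of the arithmetic behind that, using only the four counts above. subtracting the baseline exchange isolates what each extra " hi" costs. the variable names are mine, and treating the baseline as fixed overhead that does not change with the longer reply is an assumption.

```python
# token counts reported above, as (opus 4.7, opus 4.6)
counts = {
    "hi": (15, 9),        # baseline exchange
    "hihi": (16, 10),     # no space
    "hi hi": (17, 10),    # one space
    "50 reps": (115, 59), # "hi" + " hi" * 50
}

base47, base46 = counts["hi"]
rep47, rep46 = counts["50 reps"]

# tokens added by each extra " hi" once the baseline exchange is subtracted
per_word_47 = (rep47 - base47) / 50
per_word_46 = (rep46 - base46) / 50

marginal_ratio = per_word_47 / per_word_46  # cost ratio of the repeated content alone
overall_ratio = rep47 / rep46               # ratio including the fixed part of the exchange

print(f"4.7: {per_word_47} tokens per ' hi', 4.6: {per_word_46}")
print(f"marginal {marginal_ratio:.2f}x, overall {overall_ratio:.2f}x")
```

on these numbers each extra " hi" costs two tokens on 4.7 and one on 4.6, so the marginal ratio is 2.0x. the 1.95x headline is slightly lower only because the fixed part of the exchange, which runs at 1.67x, is averaged in.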

what this means for real workloads

anthropic's migration guide puts the increase at "roughly 1x to 1.35x." that range is accurate for most natural language content. code with indentation, yaml config, json with whitespace formatting, structured logs, conversation histories with delimiter-heavy system prompts: these patterns sit above the ceiling. the rate card says the same $5 per million input. the effective cost per task does not.
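
to make "effective cost" concrete, here is a minimal sketch that converts the flat rate into dollars per million tokens as the old tokenizer would have counted them. `effective_rate` is an illustrative helper, and the 1.95x row is the synthetic 50-repetition test from above, not a measurement of any real workload.

```python
OPUS_INPUT_RATE = 5.00  # dollars per million input tokens, from the rate card


def effective_rate(rate_per_mtok: float, token_multiplier: float) -> float:
    """dollars per million old-tokenizer tokens of the same content."""
    return rate_per_mtok * token_multiplier


scenarios = [
    ("no change", 1.00),
    ("official upper bound", 1.35),
    ("50x 'hi' test", 1.95),
]

for label, multiplier in scenarios:
    rate = effective_rate(OPUS_INPUT_RATE, multiplier)
    print(f"{label:>22}: ${rate:.2f} / mtok")
```

the rate card stays at $5.00 in every row. what moves is how many tokens the same content turns into, which puts the effective input rate at $6.75 for 1.35x and $9.75 for 1.95x.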

  # pricing, april 2026

  claude opus 4.7

  input:  $5.00 / mtok

  output: $25.00 / mtok

  tokenizer: up to 1.35x officially — higher on whitespace-heavy content

  claude sonnet 4.6

  input:  $3.00 / mtok  ← 40% cheaper

  output: $15.00 / mtok  ← 40% cheaper

  tokenizer: no change

  claude haiku 4.5

  input:  $1.00 / mtok

  output: $5.00 / mtok

  tokenizer: no change

a 1.95x tokenizer multiplier on whitespace-heavy content translates directly to a 1.95x cost increase on that traffic. the dollar-per-million figure on the pricing page does not reflect that. the only way to know your real effective cost is to run your actual requests through both models and compare the token counts before committing to a migration.
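
one way to run that comparison, sketched under assumptions. the helper takes any counting function, so it can be fed from the usage field of real responses or from the python sdk's token counting call. `token_ratio` and `anthropic_counter` are illustrative names, not anything anthropic ships, and counts from a counting endpoint may not match billed usage exactly.

```python
from typing import Callable, Iterable

Messages = list[dict]
CountFn = Callable[[str, Messages], int]


def token_ratio(count_tokens: CountFn, new_model: str, old_model: str,
                requests: Iterable[Messages]) -> float:
    """total tokens on new_model divided by total tokens on old_model."""
    new_total = old_total = 0
    for messages in requests:
        new_total += count_tokens(new_model, messages)
        old_total += count_tokens(old_model, messages)
    if old_total == 0:
        raise ValueError("no tokens counted on the old model")
    return new_total / old_total


def anthropic_counter() -> CountFn:
    """counter backed by the anthropic sdk; needs ANTHROPIC_API_KEY set."""
    import anthropic  # imported lazily so the rest runs without the package

    client = anthropic.Anthropic()

    def count(model: str, messages: Messages) -> int:
        result = client.messages.count_tokens(model=model, messages=messages)
        return result.input_tokens

    return count
```

usage would look like `token_ratio(anthropic_counter(), NEW_MODEL_ID, OLD_MODEL_ID, sampled_requests)`, where the two model ids and `sampled_requests` are placeholders you fill in from your own account and logs. sampling real traffic matters more than the harness: the ratio on your yaml or json payloads is the number that sets your bill.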

the benchmarks

the model is meaningfully better. swe-bench verified went from 80.8% to 87.6%, ahead of gemini 3.1 pro at 80.6%. swe-bench pro, the harder multi-language variant, went from 53.4% to 64.3%. on rakuten's internal benchmark, opus 4.7 resolved 3x more production tasks than 4.6, with double-digit gains in code quality and test quality. notion and hebbia both reported similar patterns in agent accuracy.

the vision improvements are substantial too. processing resolution went from 1.25mp to 3.75mp, and visual acuity on the relevant benchmarks jumped from 54.5% to 98.5%. for computer-use agents and document extraction pipelines, that is a real capability change, not a marginal one.

but benchmark gains on coding and vision tasks do not offset tokenizer costs on classification and rag workloads. those are separate questions about separate parts of your system.

xhigh effort and task budgets

opus 4.7 ships with a new effort level called xhigh. anthropic's docs describe it as the recommended starting point for most coding and agentic use cases, positioned above high but below max. at xhigh, the model works harder on complex tasks and uses more tokens to do it. at max effort, internal evals show the model consuming up to 200k tokens on demanding coding tasks; xhigh lands closer to 100k.

task budgets are a new mechanism in public beta: an advisory token ceiling across the full agentic loop. the model sees the budget and paces itself accordingly. the minimum is 20k tokens. it is advisory, unlike the hard cap in max_tokens. at scale, the combination of a higher-token tokenizer, an xhigh effort default, and agentic loops without explicit budget caps adds up quickly. setting task budgets on production agent workloads is worth doing before you hit a billing surprise.
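
for a rough sense of scale, the sketch below bounds the cost of a single run at the token totals quoted above. the input/output split of those totals is not given, so it only reports a range: the low end assumes every token is billed as input, the high end assumes every token is billed as output. caching discounts and the tokenizer multiplier are ignored, and `run_cost_bounds` is an illustrative name.

```python
INPUT_RATE = 5.00    # dollars per million input tokens, opus 4.7
OUTPUT_RATE = 25.00  # dollars per million output tokens, opus 4.7


def run_cost_bounds(total_tokens: int) -> tuple[float, float]:
    """(low, high) dollars: all-input pricing vs all-output pricing."""
    mtok = total_tokens / 1_000_000
    return (mtok * INPUT_RATE, mtok * OUTPUT_RATE)


for label, tokens in [("xhigh, ~100k tokens", 100_000), ("max, ~200k tokens", 200_000)]:
    low, high = run_cost_bounds(tokens)
    print(f"{label}: ${low:.2f} to ${high:.2f} per run")
```

that works out to somewhere between $0.50 and $2.50 per run at 100k tokens, and between $1.00 and $5.00 at 200k. multiplied across thousands of agent runs a day, the gap between the two effort levels is what a task budget is there to contain.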

when sonnet 4.6 is the better choice

sonnet 4.6 is $3 input and $15 output per million, 40% cheaper on the rate card, with no tokenizer change. for classification, rag retrieval, summarization, and most content generation, sonnet 4.6 is close enough in capability that the cost difference is hard to justify. opus 4.7 earns the premium on autonomous coding agents, complex multi-step tool workflows, and vision-heavy pipelines where the benchmark gaps are real and your workload actually exercises them.

model routing across your request types, rather than a wholesale migration to 4.7, is how you capture the capability gains without absorbing the tokenizer cost on traffic that does not need it. anthropic's migration guide makes this point directly: measure on real traffic, re-benchmark end-to-end cost and latency, and update max_tokens parameters to give headroom for higher token counts before switching.
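
a minimal sketch of that routing split, assuming requests already carry a task type. the task labels and the `pick_model` helper are hypothetical, and the model strings are placeholder labels rather than real api model ids.

```python
OPUS = "opus-4.7"      # placeholder label, not a real api model id
SONNET = "sonnet-4.6"  # placeholder label, not a real api model id

# workloads where the article argues the opus premium pays for itself
OPUS_TASKS = {"coding_agent", "multi_step_tools", "vision"}


def pick_model(task_type: str) -> str:
    """send capability-bound work to opus, everything else to sonnet."""
    return OPUS if task_type in OPUS_TASKS else SONNET


for task in ["classification", "rag", "summarization", "coding_agent", "vision"]:
    print(f"{task:>15} -> {pick_model(task)}")
```

defaulting to the cheaper model and listing the exceptions keeps new request types from landing on the expensive tokenizer by accident.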

  sources: anthropic opus 4.7 release notes (april 2026) · anthropic migration guide · anthropic api pricing docs · rakuten swe-bench evaluation · vellum ai benchmark analysis · venturebeat opus 4.7 coverage
