Blog

Thoughts on software engineering, architecture, and building things that matter.

Apr 2026·8 min read

the hidden cost of opus 4.7's tokenizer

opus 4.7 shipped with the same $5/$25 rate card as 4.6. but the new tokenizer means the same text creates more tokens — and community testing suggests the official 35% ceiling may be underselling the gap.

LLMs · AI · Engineering

Mar 2026·10 min read

the model is not the product

claude opus 4.5 scores 42% on core-bench with one scaffold. 78% with another. same model, same weights. the gap between the best and worst scaffolds for any given model frequently exceeds the gap between models.

AI Agents · LLMs · Engineering

Feb 2026·10 min read

what happens when an ai thinks twice? looped language models explained

a 1.4b parameter model matching 4b models on reasoning benchmarks. looped language models reuse the same layers multiple times instead of stacking more parameters.

AI · Language Models · LLMs

Feb 2026·12 min read

why ai agents need observability

your agent failed at step 9 of a 10-step chain. traditional monitoring shows you a 500 error. what it doesn't show you is why the reasoning drifted at step 4.

AI Agents · Observability · Developer Tools

Jan 2026·11 min read

building developer tools for llms

llm tooling has moved past simple api wrappers. function calling, mcp servers, structured outputs, agent frameworks. here's what the landscape actually looks like.

LLMs · MCP · Developer Tools