Two months ago I wrote a piece about why developers keep picking Claude over every other coding AI. It did pretty well. Then Opus 4.7 shipped, and I've been reading through what people are actually saying about it, and the vibe is very different now. I still use Claude Code every single day. I'm not switching. But I owe that earlier post a sequel, because the relationship between Anthropic and its power users feels different after this release.
The short version: Claude is still the best tool I've used for real agentic coding work. And I'm more tired of it than I've ever been.
Adaptive Thinking and the Token Tax
"Adaptive thinking" is Anthropic's pitch that the model should decide when to reason hard and when to just answer. On paper, fine. In practice, on Opus 4.7, it's the only mode available. The manual thinking flag that worked on 4.6 now returns a 400 error. You don't get to opt out.
This would be less of a problem if adaptive thinking actually worked. People keep hitting cases where the model clearly needed to think and decided not to, which is exactly the failure mode you'd predict if you've ever tried to build a classifier that routes to "thinking" vs "not thinking" based on a prompt. It's a hard problem. GPT-5 launched with a similar router and got roasted for it. OpenAI has spent months fixing theirs and still gets it wrong. Anthropic seems to be speedrunning the same lesson.

What makes it worse is the response. Someone on the Anthropic team acknowledged in a bug report months ago that adaptive thinking "isn't working," then went quiet. The official line - "our internal evals look good" - has turned into a running joke among people paying real money for API credits. Nobody cares about your internal evals if the model, when they actually use it, skips reasoning on problems it should be chewing on.

The workarounds people share with each other are elaborate: set effort to xhigh, disable the 1M-token context, flip display back to summarized just to see thinking at all. None of that is on the front page of the docs. You learn it from strangers online who did the trial and error for you.
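The router failure mode is easy to reproduce in miniature. Here's a toy prompt-based router, entirely hypothetical and nothing like Anthropic's actual implementation, that decides whether to spend reasoning effort based only on surface features of the prompt, and the two prompt shapes that fool it in opposite directions:

```python
# Toy router: decide whether to spend reasoning effort based only on
# surface features of the prompt. Entirely hypothetical -- this is not
# Anthropic's implementation, just the failure mode in miniature.

HARD_MARKERS = ("prove", "optimize", "edge case", "step by step", "deadlock-free")

def should_think(prompt: str) -> bool:
    """Route to extended reasoning if the prompt *looks* hard."""
    p = prompt.lower()
    return len(p) > 200 or any(marker in p for marker in HARD_MARKERS)

# A short prompt can hide a genuinely hard problem...
hard_but_short = "Why does this deadlock?"
# ...and a long prompt can be trivial.
easy_but_long = "Please rename the variable x to user_count in " + "this file. " * 30

print(should_think(hard_but_short))  # False: skips reasoning it actually needs
print(should_think(easy_but_long))   # True: burns thinking tokens on a rename
```

Any surface-feature router has some version of this gap, which is why the "just let the model decide" pitch is harder than it sounds.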
Then there's the quiet second tax. Opus 4.7 ships with a new tokenizer that encodes the same English text in roughly 1 to 1.35 times as many tokens as the previous one. That sounds small. Multiply it across a full Claude Code session with reads, edits, thinking traces, and tool calls, and your session limit arrives noticeably sooner. Add the fact that the default effort is high - so the model is burning tokens on reasoning you didn't ask for - and the cost per "unit of work done" drifted up even though the per-token price didn't move. Third parties noticed immediately. GitHub Copilot applies a markup on top of the raw API cost. For Opus 4.6 it was around three times. For 4.7 they set it at seven and a half. That's not Anthropic's decision, but it's downstream of Anthropic's numbers, and when a major reseller doubles their multiplier on the same day, you know something moved on the back end.
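To make the compounding concrete, here's back-of-the-envelope math using only the ratios above (the 1.35x tokenizer worst case and Copilot's 3x and 7.5x multipliers); the "unit of work" normalization is mine, purely for illustration:

```python
# Back-of-the-envelope math on the Opus 4.7 token tax, using only the
# ratios mentioned above. The "unit of work" normalization is illustrative.

TOKENIZER_INFLATION = 1.35  # worst case: same English text -> ~1.35x tokens
COPILOT_MARKUP_46 = 3.0     # Copilot's multiplier on top of Opus 4.6 API cost
COPILOT_MARKUP_47 = 7.5     # Copilot's multiplier on top of Opus 4.7 API cost

def relative_cost(work_units: float, inflation: float, markup: float) -> float:
    """Cost of the same amount of work, per-token price held constant."""
    return work_units * inflation * markup

before = relative_cost(1.0, 1.0, COPILOT_MARKUP_46)                 # 3.0
after = relative_cost(1.0, TOKENIZER_INFLATION, COPILOT_MARKUP_47)  # 10.125

print(f"same work through Copilot, 4.6 -> 4.7: {after / before:.2f}x")

# A fixed session token budget also runs out sooner: at 1.35x tokens per
# unit of work, a session now covers only 1/1.35 of the work it used to.
print(f"a session covers {1 / TOKENIZER_INFLATION:.0%} of the old work")
```

Same per-token price, same work, more than triple the bill through a reseller, and sessions that end about a quarter earlier. That's the shape of a price increase that never shows up on a pricing page.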
I'm on a Max plan, so this doesn't hit me the same way it hits API users. But I can feel it in how people talk. The pitch used to be: at twenty bucks a month this is the best deal in software. The pitch now has to account for tighter session limits, a fatter context tax, and thinking that happens whether you want it or not. Somebody defending the pricing compared it to a taxi: even if the fare were pegged to what a horse-drawn carriage would have cost, you'd still expect to pay for the taxi. Fair. But if the taxi keeps slowing down and the driver keeps insisting it's going faster, you'd at least want to see the speedometer.
The Trust Debt Keeps Growing
Here is the part where I'm going to say something that sounds conspiratorial and ask you to just sit with it. In the weeks leading up to Opus 4.7, a lot of us noticed that 4.6 was getting worse. Not dramatically. Just more looping. More dumb edits. More "wait, why did you do that" moments. Then 4.7 shipped and 4.6 felt usable again once the spotlight moved off it. This has happened often enough, across enough releases, that it's hard not to notice.

The charitable read is compute allocation: when you're spinning up a new model, you reuse infrastructure, and the current model's latency and quality take a hit while capacity moves around. That's probably what's happening. But the timing lines up every release cycle, and the absence of any public explanation means developers fill in the cynical interpretation themselves. A friend on Discord half-jokingly started calling Opus "dopus" after a long session. Teams giving your model mean nicknames is the kind of sentiment you don't recover from quickly. It's the product version of an inside joke at the company's expense.
The instruction-following story goes a similar direction. 4.7 is supposed to be better at following instructions, and it is. The problem is that it's now too literal. Claude Code injects a system reminder after every file read asking the model to check whether the file looks like malware. The intended meaning: if it's malicious, refuse. The reading the model now gives it: don't modify code after reading it. Someone shared a session where they asked Claude to make a static marketing page match a mockup. The model read the file, cheerfully noted it was a benign HTML page with no suspicious patterns, and then refused to edit it because the system reminder told it not to modify code after reading. It did this while explicitly acknowledging the code was safe. That's not a safety feature. That's a bug wearing a safety-feature costume.

A broader version of this shows up for security researchers, who are getting refusals for routine work on their own code. Anthropic is clearly overcorrecting after a period of loose instruction-following, and the refusals keep leaking into tasks that have nothing to do with harm.

This connects to something I wrote about in my Project Glasswing post a few weeks ago: Anthropic has an internal model they've decided is too dangerous to release. Whether or not you buy that framing, it sets the direction of the ship. More caution, more refusals, more assumption that user intent might be suspect. That attitude bleeds into every product decision, including the small ones.
Zoom out and the products don't even talk to each other. Claude the chat app, Claude Code, and Claude Cowork are three different products with three different data models. Projects in one don't show up in the others. Memory in chat doesn't carry to code. The GitHub integration works in one surface, partially in another, and not at all in a third. If you try to use Claude the way Anthropic presents it - across all three - you end up managing state by hand and wondering which tab you left a context in. Someone called this "you ship your org chart" and linked to Conway's Law. Probably right. But it's also a choice. Anthropic could treat product cohesion as a feature, and right now they're not. The model team ships frontier capability. The product teams ship surfaces that feel like three different companies built them. For power users who touch all three, the seams are where the frustration lives, and it compounds across a workweek.
What I'm Actually Going to Do
Keep using it. Claude Code is still the tool I reach for first every morning. The reasons I wrote about in February haven't gone away - process discipline, tool-call reliability, the judgment to stop and ask instead of looping. Those are real, and competitors haven't closed the gap. I've tried Codex a few more times in the last two weeks, same as everyone else, and the code I get back is fine. The agent experience around it still isn't there.
What changed is the background trust. Two months ago I'd have recommended Claude to anyone without a caveat. Now I recommend it with a list of things you'll want to configure, a warning that session limits feel tighter than they used to, and an honest note that this release felt less like an upgrade and more like an attrition tax. That's different. That matters. I don't think Anthropic realizes how much goodwill they're burning on small stuff that could be fixed with better defaults and a paragraph of honest communication.
The model is still the best at what I hire it to do. 4.7 is just the first release where I finished reading the announcement and felt a little smaller about the whole thing.