Why Claude Opus 4.7's 1M context window changed how I review code
Three months ago I pasted a 70k-token diff into Claude Opus 4.7 and asked what would break if it shipped. I was waiting for the usual apologetic "this is a lot, let me skim" warning. It did not come. The reply walked me through three callers I had forgotten existed, named a migration about to deadlock under the new lock order, and quoted a comment I wrote in 2024 about why a particular column was nullable.
The 1M context window changed my workflow more than I expected. Here is what shifted.
What I stopped doing

For two years every prompt I sent started with a 400-word "here is the architecture" recap. I was compressing context for a model that needed compression. The new model does not need it. And when I went back and checked, my recap was wrong basically every time. Not in obvious ways. Usually in the way that mattered for the question I was about to ask.

I also stopped triaging files before a review. The old workflow was git diff --stat, then pick the three files most likely to matter. Now I dump the diff plus the whole lib/ tree plus the test directory into the prompt and let the model figure out where the relevant call sites are. It is faster than my triage. It also catches call sites I would have skipped because they did not look related.
What I started doing

I started asking for invariants. Old prompt was "find bugs in this PR." New prompt is "list every invariant this codebase relies on, then check whether this PR breaks any of them." With a small context window the model can only guess at invariants. With the whole codebase loaded, it can derive them from real call paths.
I also started asking what is missing. "What cases does this PR's test file not cover, given the production code paths in lib/ and app/?" That prompt was nonsense at 200k tokens because the model could not see all the paths. At 1M it is the single most useful prompt I have.
Two failure modes

The model is still wrong. Less often than it used to be, but the wrongness is sneakier. It used to hedge when it did not know something. Now it tells me with full confidence that "no other file references this function," and it is wrong about a generated file, or a config, or a test helper it did not fully ingest. The failure moved from "I do not know" to "I am sure, and you should be too." You still have to grep for the function name yourself. The 1M context buys you maybe 80% of the review. The remaining 20% is the human pass and there is no way around it.

Cost is the other one. A 1M context window does not mean every call uses 1M tokens. But tooling that auto-loads everything absolutely will. I budget my reviews now. I do not load the whole repo to fix a typo.
What I notice now
My baseline for "useful code review" has moved. A reviewer who only looks at the diff, whether human or model, feels incomplete to me. The question I keep coming back to on every change is this: what does this assume about the rest of the system, and is that assumption still true today? It is a hard question to answer yourself. For a model with the whole codebase loaded, that question is finally answerable.
The token count is not really the headline for me. What changed is that I can now ask a question I have wanted to ask for years, and I am still learning how to ask it well.
Newsletter
Get new posts in your inbox.
Honest essays on engineering, leadership, and the things I’m figuring out. No spam, ever.