Issue 01 · Workflow

What 10M tokens unlock for personal injury.

Read the entire case file in one thought. The workflow patterns that become possible when the AI’s context window finally exceeds the matter’s record set.

By Eve-Legal·May 19, 2026·7 min read

The single most consequential under-discussed shift in AI for legal practice over the past eighteen months is the arrival of long-context reasoning at scale. Meta Llama 4 Scout, in production today, accepts a single context window of 10 million tokens. That is roughly the length of fifteen thousand pages of typed prose, or thirty-five thousand pages of medical records in their usual formatting density. In one inference call. With no chunking, no retrieval-augmented hand-off, no risk of losing the cross-reference between page seven and page nine thousand.
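The page figures above imply a rough token density per page, which makes it possible to estimate whether a particular matter fits in one window. The sketch below is only arithmetic on the numbers quoted in this article (10 million tokens, fifteen thousand prose pages, thirty-five thousand medical-record pages). The function names, the two-way page split, and the `reserve_tokens` parameter are illustrative assumptions, and real token counts depend on the tokenizer and on how the records were scanned.

```python
# Figures taken from the paragraph above; everything else is illustrative.
CONTEXT_TOKENS = 10_000_000
PROSE_PAGES_PER_WINDOW = 15_000      # typed prose: about 667 tokens per page
MEDICAL_PAGES_PER_WINDOW = 35_000    # medical records: about 286 tokens per page


def ceil_div(numerator: int, denominator: int) -> int:
    """Integer division that rounds up, so estimates never undercount."""
    return -(-numerator // denominator)


def estimate_tokens(prose_pages: int, medical_pages: int) -> int:
    """Estimate total tokens for a file split into prose and medical pages."""
    prose = ceil_div(prose_pages * CONTEXT_TOKENS, PROSE_PAGES_PER_WINDOW)
    medical = ceil_div(medical_pages * CONTEXT_TOKENS, MEDICAL_PAGES_PER_WINDOW)
    return prose + medical


def fits_in_window(prose_pages: int, medical_pages: int,
                   reserve_tokens: int = 0) -> bool:
    """True if the file, plus tokens reserved for prompt and answer, fits."""
    total = estimate_tokens(prose_pages, medical_pages) + reserve_tokens
    return total <= CONTEXT_TOKENS


# Hypothetical matter: 3,000 prose-like pages and 7,000 pages of records.
print(estimate_tokens(3_000, 7_000))   # 4000000
print(fits_in_window(3_000, 7_000))    # True
```

Rounding up is a deliberate choice here: an estimate that undercounts is the one that fails at inference time.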

The legal-tech industry has not internalized what this means. The shorthand for AI in legal practice has been "the AI reads documents and summarizes them," a framing that made sense when the AI’s context was a few thousand pages and the workflow depended on chunking and retrieval-augmented generation to keep the answers grounded. Long context retires the entire framing.

What a personal-injury case file actually contains

A representative serious PI matter accumulates documents at the following rough cadence. Initial intake records and accident reports: 50 to 200 pages. Insurance correspondence across the carrier’s claim file: 200 to 800 pages. Treatment records across the providers seen during conservative care, surgical intervention, and rehabilitation: 2,000 to 20,000 pages depending on the matter’s severity. Independent medical examinations and life-care planning reports: 100 to 500 pages. Deposition transcripts and exhibits, when the matter goes into suit: another 1,000 to 5,000 pages. Expert reports and rebuttal expert reports: 200 to 1,500 pages. Discovery production from the defendant: 1,000 to 50,000 pages; the high end belongs to commercial-vehicle and mass-tort matters, where the production volume dwarfs the medical record set.

Summed, those ranges run from roughly 4,500 to 78,000 pages, and a serious PI matter routinely accumulates 5,000 to 50,000 pages by the time the matter is ready for the demand letter. Before long context, the AI’s relationship to this record set was always indirect. The pipeline chunked the records, embedded the chunks into a vector store, retrieved the chunks that most resembled the prompt, and the model reasoned over a small slice of the matter at a time. The model could not actually read the case file. It could only sample it.

What changes when the model reads the whole file

Long context turns sampling into reading. The differences this makes are not incremental; they are categorical.

Cross-document contradictions become trivial. The IME report says the plaintiff had full range of motion at week eight. The treating physiatrist’s note from the same week says the plaintiff had a thirty-percent flexion deficit. The chunked-retrieval pipeline sees one document or the other depending on which the prompt resembled more strongly. The long-context pipeline sees both, and the contradiction is the centerpiece of the response.
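A toy version of the sampling behaviour makes the failure mode concrete. The sketch below assumes a bag-of-words "embedding" and cosine similarity in place of the learned embeddings and vector store a production pipeline would use; the three chunks and the prompt are invented for illustration. With a retrieval budget of one chunk, the prompt pulls in the IME report and the contradicting physiatry note never reaches the model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(chunks: list[str], prompt: str, k: int) -> list[str]:
    """Return the k chunks that most resemble the prompt."""
    query = embed(prompt)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), query), reverse=True)
    return ranked[:k]


# Hypothetical three-chunk case file.
case_file = [
    "IME report week 8: plaintiff demonstrates full range of motion in the lumbar spine.",
    "Physiatry note week 8: thirty percent flexion deficit in the lumbar spine persists.",
    "Carrier letter: claim file acknowledged, adjuster assigned.",
]

prompt = "What did the IME report conclude about range of motion?"
sampled = retrieve(case_file, prompt, k=1)   # what a chunked pipeline passes on
whole_file = case_file                       # what a long-context call passes on

print(sampled)
```

Raising `k` narrows the problem without removing it: any budget smaller than the file leaves open which documents are dropped.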

Treatment-gap detection stops being a heuristic. The chunked-retrieval pipeline finds gaps by sampling appointment lists and flagging unusual intervals. The long-context pipeline reads every appointment, sees the full sequence, and identifies the gap that matters: the one where the carrier’s IME will argue subsequent treatment is no longer causally connected to the underlying loss.
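The mechanical half of gap detection is simple once every appointment is available. The sketch below assumes appointment dates have already been extracted from the records and uses a hypothetical 30-day threshold. It only lists the intervals; deciding which gap the carrier’s IME is likely to attack remains a judgment about the surrounding notes, and the code does not attempt it.

```python
from datetime import date


def treatment_gaps(visits: list[date], min_gap_days: int) -> list[tuple[date, date, int]]:
    """Return (last_visit, next_visit, days) for every gap of at least min_gap_days."""
    ordered = sorted(set(visits))  # de-duplicate same-day entries, sort by date
    gaps = []
    for previous, following in zip(ordered, ordered[1:]):
        days = (following - previous).days
        if days >= min_gap_days:
            gaps.append((previous, following, days))
    return gaps


# Hypothetical appointment history assembled from the full record set.
visits = [
    date(2025, 1, 6),
    date(2025, 1, 13),
    date(2025, 1, 27),
    date(2025, 3, 24),
    date(2025, 3, 31),
]

for start, end, days in treatment_gaps(visits, min_gap_days=30):
    print(f"{start} -> {end}: {days} days without treatment")
```

Run on a sample of the appointment list, the same function can miss a gap or report one that does not exist, which is the reason the full sequence matters.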

Causation arguments anchor to specific sequences. The strongest causation arguments in PI litigation are not "the medical literature supports X" arguments; they are "the treating provider sequence shows X" arguments. The long-context pipeline can quote the exact paragraph from the exact note where the treating provider links the current symptomatology to the underlying mechanism of injury. Chunked retrieval gets close to this; long context gets it exactly right.

Settlement-matrix modeling becomes one-shot. The chunked-retrieval pipeline needs multiple queries to assemble a settlement matrix across a plaintiff’s documented specials, future-medical projections, lien total, and policy limit. Long context produces the matrix in a single inference. The implication for Mass Tort, where the matrix must run at scale across thousands of plaintiffs, is the difference between a workflow that takes weeks and a workflow that takes a few hours.
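As an illustration of the four inputs named above, one row of such a matrix might be computed as follows. The formula is a deliberately simplified assumption (economic damages only, capped at the policy limit, net of liens) and is not the valuation model the product uses. The field names and dollar figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaintiffInputs:
    plaintiff_id: str
    specials: int          # documented specials, in dollars
    future_medical: int    # projected future medical costs, in dollars
    lien_total: int        # total liens asserted against recovery, in dollars
    policy_limit: int      # available policy limit, in dollars


def matrix_row(p: PlaintiffInputs) -> dict:
    """One simplified matrix row: cap economic damages at the limit, subtract liens."""
    economic = p.specials + p.future_medical
    recoverable = min(economic, p.policy_limit)
    return {
        "plaintiff_id": p.plaintiff_id,
        "economic_damages": economic,
        "recoverable": recoverable,
        "net_of_liens": recoverable - p.lien_total,
        "limit_binds": economic > p.policy_limit,
    }


def build_matrix(plaintiffs: list[PlaintiffInputs]) -> list[dict]:
    """Apply the same row calculation to every plaintiff."""
    return [matrix_row(p) for p in plaintiffs]


# Hypothetical plaintiffs.
rows = build_matrix([
    PlaintiffInputs("P-001", 120_000, 80_000, 30_000, 150_000),
    PlaintiffInputs("P-002", 40_000, 10_000, 5_000, 250_000),
])
for row in rows:
    print(row)
```

The calculation itself is trivial. The expensive part is extracting the four inputs reliably from each plaintiff’s file, and that is the step the single-inference claim concerns.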

What does not change

Long context does not change the citation discipline. The four-pass demand-letter pipeline’s citation verifier still resolves every cited authority against CourtListener at generation time. Long context does not change the reasoning posture: the legal reasoner is still calibrated on Eve-Genesis (Law Edition); the frontier slots still cross-validate when the matter justifies their cost; the supervisor still decomposes the workload into stage-specialized sub-tasks. Long context is not the architecture. It is one slot in the architecture — the slot that makes everything else more powerful by removing the sampling constraint that bounded it.

The shape of the change

The honest framing of long-context reasoning for PI practice is this: the workflow patterns that took weeks because the AI could not see the whole case file are now one-shot. The patterns that required a paralegal to assemble retrieval scaffolding are now end-to-end. The patterns that were not previously possible at all — Mass Tort settlement-matrix modeling at scale — are now product surfaces. The PI edition shipped with this capability live. The MT edition will land with it as the structural moat.