Lesson 1 · Getting documents in

Getting documents in — when the bundle overflows the window

[Image: a 1619 engraving of a Renaissance theatre stage with five arched doorways and an upper gallery, used as a "memory theatre" for storing facts at fixed positions.]

A memory theatre from Robert Fludd's *Ars Memoriae*, 1619. The classical art of memory, taught to advocates by Cicero and Quintilian, fixed each fact at a position in an imagined building and recalled it by walking the rooms in order, so that *where* a thing sat governed how surely it was held. Read with modern eyes, the orator's stage anticipates a failure we now name in language models: across a finite context window, the ends are used well and the middle slips. Source: Wikimedia Commons · public domain.

A context window is finite. In Lesson 0 you watched a single sentence turn into a handful of tokens; a real case bundle is thousands of those, and the model can only hold so many at once. So the practical question stops being “what does this sentence cost?” and becomes “how much of my bundle actually fits?” The cell below answers that for a pre-loaded bundle of fourteen short legal documents — a witness statement, an indemnity clause, a statute subsection, a paragraph of a judgment, and so on — by tokenising each one and packing them in until the next document would overflow the window you set.

Now do the thing this course is built on: change the window box and watch the budget move. The box starts deliberately small — 600 tokens — so the squeeze is visible on load: only part of the bundle fits, and the cell names the first document turned away at the door. Raise the window (try 2,000) to fit the whole bundle, or drop it (300) to cut more, and press Run. That is context budgeting. A genuine 40-document bundle behaves the same way, only sooner and harder: it does not all fit, and something always gets left outside the window. The only choice is whether you decide what, or let an arbitrary cut-off decide for you.
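One way to "decide what" rather than let the cut-off decide is to rank the documents before packing them. The sketch below is a minimal illustration under stated assumptions, not part of the course cell: the token costs and priority scores are made-up numbers, and `pack_in_order` and `pack_by_priority` are hypothetical helper names. It contrasts packing in bundle order (stop at the first overflow) with packing the highest-priority documents first and skipping any that do not fit.

```python
def pack_in_order(costs, window):
    """Pack documents in bundle order; stop at the first one that overflows."""
    kept, running = [], 0
    for i, cost in enumerate(costs):
        if running + cost > window:
            break
        kept.append(i)
        running += cost
    return kept, running


def pack_by_priority(costs, priorities, window):
    """Pack highest-priority documents first; skip any that would overflow."""
    order = sorted(range(len(costs)), key=lambda i: priorities[i], reverse=True)
    kept, running = [], 0
    for i in order:
        if running + costs[i] <= window:
            kept.append(i)
            running += costs[i]
    return sorted(kept), running


# Illustrative numbers only (assumptions, not the course bundle):
# token cost per document and a hand-assigned priority (higher = more important).
costs = [120, 200, 90, 150, 60]
priorities = [2, 1, 5, 4, 3]
window = 300

in_order_kept, in_order_used = pack_in_order(costs, window)
priority_kept, priority_used = pack_by_priority(costs, priorities, window)
print("in order:   ", in_order_kept, "using", in_order_used, "tokens")
print("by priority:", priority_kept, "using", priority_used, "tokens")
```

With these numbers, in-order packing keeps only document 0 (120 tokens), because document 1 overflows and everything after it is dropped. Priority packing keeps documents 2, 3 and 4 and uses the full 300 tokens. Where the priorities come from is the real question, and it is a human judgment this sketch does not answer.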

But fitting is not the whole story — which is the bridge into the rest of the course. Even the documents that do land inside the window are not read evenly. The anchoring work here is Liu et al., “Lost in the Middle: How Language Models Use Long Contexts”: across a long input, models use material at the very beginning and the very end far more reliably than material buried in the middle, where performance can drop sharply. Position matters, not just inclusion. (The exact shape of that curve is debated on the newest long-context models, and there is as yet no legal-specific study — so treat it as a strongly-evidenced tendency to design around, not an iron law.) Lesson 2 takes up the obvious response: if you cannot fit everything, and the middle is read least, then retrieving the few passages that matter beats stuffing the whole bundle in.
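If position matters, one simple way to design around it is to control where each document lands. The sketch below is an illustration under assumptions, not a method the lesson or Liu et al. prescribe: it takes documents already ranked from most to least important (the ranking itself is assumed) and interleaves them so the strongest sit at the start and end and the weakest fall in the middle. `place_strongest_at_edges` is a hypothetical helper name, and the sketch makes no claim about how much this helps on any particular model.

```python
def place_strongest_at_edges(ranked):
    """Reorder items ranked most-to-least important so that the top-ranked
    items sit at the start and end, and the lowest-ranked in the middle."""
    front, back = [], []
    for position, item in enumerate(ranked):
        if position % 2 == 0:
            front.append(item)   # ranks 1, 3, 5, ... fill in from the start
        else:
            back.append(item)    # ranks 2, 4, 6, ... fill in from the end
    return front + back[::-1]


# Placeholder labels standing in for documents, ranked A (most important) to E (least).
ranked = ["A", "B", "C", "D", "E"]
ordered = place_strongest_at_edges(ranked)
print(ordered)  # ['A', 'C', 'E', 'D', 'B']
```

Here the two most important items, A and B, end up first and last, and the least important, E, sits in the middle.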

✏️ context window (tokens)

# You don't need to change the code — just edit the box above and press Run.
# (It runs on its own the first time, so you can see what it does straight away.)

# `bundle` is a pre-loaded case bundle: a list of 14 short legal documents
# (a witness statement, a contract clause, a statute subsection, and so on).
# `window` is the number in the box above: how many tokens the model can hold at once.
# `enc` is the pre-loaded tokeniser: `enc.encode(doc)` turns a document into its tokens.

# First, find the "cost" of every document by tokenising it — the same trick as Lesson 0,
# but now applied to a whole document each, not a single sentence:
costs = [len(enc.encode(doc)) for doc in bundle]

# The bundle's total size, in tokens. This is what the WHOLE bundle would cost the model
# if it could swallow it in one go:
total = sum(costs)
print("whole bundle:", total, "tokens across", len(bundle), "documents")

# Now do the budgeting. Walk the bundle in order, keeping a running total, and stop the
# moment the next document would tip us over `window`. Everything up to that point fits;
# the document that breaks the budget — and everything after it — is left outside the window.
running = 0          # tokens spent so far
fit = 0              # how many documents we've managed to fit
first_left_out = None  # zero-based index of the first document that did NOT fit
for i, cost in enumerate(costs):
    if running + cost > window:   # adding this one would overflow the budget
        first_left_out = i
        break                     # stop — no room for this document or any after it
    running += cost               # it fits: spend the tokens and move on
    fit += 1

# Report the budget. How many of the 14 documents made it inside `window`:
print("fits in window:", fit, "of", len(bundle), "documents  (" + str(running), "tokens used)")

# ...and which document was the first to be turned away at the door:
if first_left_out is None:
    print("everything fit — try a SMALLER window to see the bundle overflow")
else:
    print("first document left out: #" + str(first_left_out), "—", bundle[first_left_out][:48] + "...")

# Change the `window` box and press Run again: watch how many documents fit move up and down.
# A real 40-document bundle behaves exactly like this — it does not all fit, and you are the
# one deciding (or accidentally letting the cut-off decide) what gets left out.

# One caveat the rest of the course is built on: FITTING is only half the story. Even the
# documents that DO land inside the window are not read evenly: material in the middle is
# used less reliably than material at the very start or end. Liu et al. call this
# "lost in the middle".
print("note: even what fits isn't read evenly — the middle gets used least ('lost in the middle')")

Expected output:

whole bundle: 890 tokens across 14 documents
fits in window: 9 of 14 documents  (560 tokens used)
first document left out: #9 — By reason of the matters aforesaid the Claimant ...
note: even what fits isn't read evenly — the middle gets used least ('lost in the middle')
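The numbers in that output can be checked by hand. The short sketch below does the arithmetic, assuming the output was produced with the default window of 600 tokens (the lesson says the box starts there, but the output itself does not state the window). The variable names are illustrative.

```python
window = 600     # assumed: the default value of the window box
total = 890      # "whole bundle: 890 tokens"
used = 560       # "560 tokens used"
fit, n_docs = 9, 14

headroom = window - used           # tokens still free when packing stopped
left_out_tokens = total - used     # tokens in the documents left outside
left_out_docs = n_docs - fit       # number of documents left outside

print("headroom:", headroom, "tokens")                                           # 40
print("left outside:", left_out_docs, "documents,", left_out_tokens, "tokens")   # 5, 330
print(f"share of bundle inside the window: {used / total:.0%}")                  # 63%
```

Under that assumption, packing stopped with 40 tokens still free, so document #9 (counting from 0, the tenth in the bundle) must cost more than 40 tokens. Five documents totalling 330 tokens, over a third of the bundle, never reach the model.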

Think: If a context window forces you to leave some documents outside it, then the order in which you load a bundle silently decides what the model never sees, and 'lost in the middle' means even the documents you do load are not weighed equally. In a real matter, who in your firm is implicitly making that ordering decision when a bundle is fed to an AI tool, on what principle, and would it survive the scrutiny you would apply to a junior who simply stopped reading the file two-thirds of the way through?