Lesson 3 · Summarising reliably

Summarising reliably — grounding against the fabricated authority

A Renaissance fresco showing Emperor Constantine kneeling to offer a small statuette to the enthroned Pope Sylvester I inside Old St Peter's Basilica, surrounded by courtiers and clergy.
Donation of Rome, workshop of Raphael (Giulio Romano and Gianfrancesco Penni), Sala di Costantino, Vatican, c. 1520–24 — Constantine ceding the Western Empire to Pope Sylvester I. The grant never happened: the document was an 8th-century forgery that underwrote papal temporal authority for seven centuries, until Lorenzo Valla proved it fabricated in 1440 — on internal evidence alone, Latin that did not yet exist when it claimed to be written. A fabricated authority, believed because it was useful and well-formed, undone only by someone who checked it against the record. Source: Wikimedia Commons · public domain.

In the Donation of Constantine, the Emperor hands the Western Empire to Pope Sylvester I — and on the strength of that document the papacy claimed temporal authority over kings for seven centuries. The grant never happened. The deed was forged around 750, but it was useful and it was well-formed, so it was believed, cited, and built upon by people who had every reason not to look too closely. It came undone only in 1440, when Lorenzo Valla did the one thing nobody had bothered to do: he checked it against the record, and found Latin in it that did not yet exist on the date the document claimed for itself. A fabricated authority is not undone by being implausible. It is undone by someone tracing it back to a source.

That is the entire discipline of this lesson, and it is the most important thing the course teaches — because the modern version of the forged deed is not seven centuries old, it is one prompt away. General-purpose models hallucinate fabricated case-law in a majority of answers: Dahl, Magesh, Suzgun and Ho, “Large Legal Fictions,” measured hallucination rates of 58–88% on specific legal queries. The reflex answer — “use a proper legal tool, they retrieve real sources” — helps, but does not cure it: Magesh et al., “Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools,” found that even the RAG-backed commercial products still invent material a substantial fraction of the time (Lexis+ AI around 17%, Westlaw’s tool around 33% in that study). Those figures test 2023–24 product versions and the vendors dispute them, so treat them as point-in-time, not a verdict on any tool today — but the direction is unambiguous: no current system, retrieval or not, is hallucination-free, and a confident citation to a case that does not exist is precisely what gets a practitioner sanctioned. Grounding discipline is therefore not a nicety bolted onto the summary; it is the summary’s reliability.

So the cell below is not a model — it is a checker, and it is the first skill you bank. It holds a small set of source passages (the kind of relevant paragraphs Lesson 2’s retrieval would surface) and tests a single claim against them: do the claim’s distinctive terms actually co-occur in a source? The box starts with a plausible, confident, badly-wrong claim — a three-year limitation period and a case, Harrington v Bellwether, that appears in none of the sources — so the squeeze is visible on load: it comes back UNSUPPORTED, flagged as a possible fabricated authority. Now do the thing this course is built on: change the claim and watch the verdict move. Paste in a true statement drawn from the sources (the six-year period, the £18,450 figure, the Caparo holding) and watch it ground; nudge one number or swap the case name and watch it fail. The keyword test is honest about its own limits — real grounding is harder than word-overlap, a faithful paraphrase can share little vocabulary, and two passages can share words yet mean opposite things — but the discipline it enforces is the transferable one, and the one you save as the Grounding check: trace every claim to a source, and flag, never smooth, anything that has none.

✏️ a claim from the summary
the code · starting Python…
# You don't need to change the code — just edit the box above and press Run.
# (It runs on its own the first time, so you can see what it does straight away.)

# `sources` is a pre-loaded set of SOURCE passages — the retrieved material a summary is
# allowed to rely on (the output of Lesson 2). `claim` is the sentence in the box above: one
# statement the summary makes. The discipline of this lesson, in one line: EVERY claim in a
# summary must trace back to one of these sources. Anything that doesn't is not "probably fine" —
# it is the exact red flag that gets practitioners sanctioned, and it must be FLAGGED, not smoothed.

# A claim is built of words; so is each source. The most honest cheap check we can do is ask:
# do the claim's distinctive words actually appear together in a source passage? Strip out the
# common filler ("the", "of", "a"...) so we test the words that carry the legal content —
# the period, the figure, the holding, the case name — not the grammar around them.
STOPWORDS = {"the", "a", "an", "of", "for", "is", "are", "to", "in", "on", "and",
             "or", "that", "this", "with", "by", "be", "as", "it", "no", "not"}

def key_terms(text):
    # lower-case, keep only the content words (length > 2, not a stopword)
    words = "".join(c.lower() if c.isalnum() or c.isspace() else " " for c in text).split()
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}

claim_terms = key_terms(claim)

# Walk every source and measure how many of the claim's key terms it contains. We treat the
# claim as SUPPORTED by a source only if MOST of its key terms (>= 60%) co-occur there — a
# single shared word ("negligence" appearing somewhere) is not support, it is coincidence.
best_source = None
best_overlap = 0.0
for name, passage in sources.items():
    passage_terms = key_terms(passage)
    shared = claim_terms & passage_terms
    overlap = len(shared) / max(len(claim_terms), 1)   # fraction of the claim that is grounded
    if overlap > best_overlap:
        best_overlap = overlap
        best_source = name

print("claim:", claim)
print("checked against", len(sources), "source passages")
print("-" * 60)

if best_overlap >= 0.6:
    # The claim's content is actually present in a source. We can cite that source.
    print("SUPPORTED  (" + str(round(best_overlap * 100)) + "% of the claim's key terms found)")
    print("  -> grounded in source:", best_source)
    print("  -> safe to keep in the summary, cited to that source.")
else:
    # The claim's content is NOT in any source. In a real workflow this is where a confident
    # holding or — worse — a case citation that exists nowhere in the file ("Harrington v
    # Bellwether") gets caught. A claim with no source is a POSSIBLE FABRICATED AUTHORITY.
    print("UNSUPPORTED  (best match only " + str(round(best_overlap * 100)) + "% — below the 60% bar)")
    print("  -> NOT found in any source passage.")
    print("  -> FLAG: possible FABRICATED AUTHORITY. Do NOT cite this. Verify against the")
    print("     primary record before it goes anywhere near a court or a client.")

# Honest about the tool: real grounding is much harder than counting shared words — a paraphrase
# that uses none of the source's vocabulary can still be perfectly faithful, and two passages can
# share words yet mean opposite things. This keyword check will miss some of that. But the
# DISCIPLINE it enforces — trace EVERY claim to a source, and flag anything you can't — is the
# transferable skill, and it is the one you bank as your first saved skill: the Grounding check.
Show the code and expected output
# You don't need to change the code — just edit the box above and press Run.
# (It runs on its own the first time, so you can see what it does straight away.)

# `sources` is a pre-loaded set of SOURCE passages — the retrieved material a summary is
# allowed to rely on (the output of Lesson 2). `claim` is the sentence in the box above: one
# statement the summary makes. The discipline of this lesson, in one line: EVERY claim in a
# summary must trace back to one of these sources. Anything that doesn't is not "probably fine" —
# it is the exact red flag that gets practitioners sanctioned, and it must be FLAGGED, not smoothed.

# A claim is built of words; so is each source. The most honest cheap check we can do is ask:
# do the claim's distinctive words actually appear together in a source passage? Strip out the
# common filler ("the", "of", "a"...) so we test the words that carry the legal content —
# the period, the figure, the holding, the case name — not the grammar around them.
STOPWORDS = {"the", "a", "an", "of", "for", "is", "are", "to", "in", "on", "and",
             "or", "that", "this", "with", "by", "be", "as", "it", "no", "not"}

def key_terms(text):
    # lower-case, keep only the content words (length > 2, not a stopword)
    words = "".join(c.lower() if c.isalnum() or c.isspace() else " " for c in text).split()
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}

claim_terms = key_terms(claim)

# Walk every source and measure how many of the claim's key terms it contains. We treat the
# claim as SUPPORTED by a source only if MOST of its key terms (>= 60%) co-occur there — a
# single shared word ("negligence" appearing somewhere) is not support, it is coincidence.
best_source = None
best_overlap = 0.0
for name, passage in sources.items():
    passage_terms = key_terms(passage)
    shared = claim_terms & passage_terms
    overlap = len(shared) / max(len(claim_terms), 1)   # fraction of the claim that is grounded
    if overlap > best_overlap:
        best_overlap = overlap
        best_source = name

print("claim:", claim)
print("checked against", len(sources), "source passages")
print("-" * 60)

if best_overlap >= 0.6:
    # The claim's content is actually present in a source. We can cite that source.
    print("SUPPORTED  (" + str(round(best_overlap * 100)) + "% of the claim's key terms found)")
    print("  -> grounded in source:", best_source)
    print("  -> safe to keep in the summary, cited to that source.")
else:
    # The claim's content is NOT in any source. In a real workflow this is where a confident
    # holding or — worse — a case citation that exists nowhere in the file ("Harrington v
    # Bellwether") gets caught. A claim with no source is a POSSIBLE FABRICATED AUTHORITY.
    print("UNSUPPORTED  (best match only " + str(round(best_overlap * 100)) + "% — below the 60% bar)")
    print("  -> NOT found in any source passage.")
    print("  -> FLAG: possible FABRICATED AUTHORITY. Do NOT cite this. Verify against the")
    print("     primary record before it goes anywhere near a court or a client.")

# Honest about the tool: real grounding is much harder than counting shared words — a paraphrase
# that uses none of the source's vocabulary can still be perfectly faithful, and two passages can
# share words yet mean opposite things. This keyword check will miss some of that. But the
# DISCIPLINE it enforces — trace EVERY claim to a source, and flag anything you can't — is the
# transferable skill, and it is the one you bank as your first saved skill: the Grounding check.
claim: The limitation period for a negligence claim is three years (Harrington v Bellwether [2019] EWHC 244).
checked against 5 source passages
------------------------------------------------------------
UNSUPPORTED  (best match only 18% — below the 60% bar)
  -> NOT found in any source passage.
  -> FLAG: possible FABRICATED AUTHORITY. Do NOT cite this. Verify against the
     primary record before it goes anywhere near a court or a client.

ThinkThis checker would have flagged 'Harrington v Bellwether' the moment it failed to find it in any source — but in practice the fabricated citation usually arrives wrapped in three true sentences, formatted impeccably, and exactly what you were hoping the law would say. Real lawyers have been sanctioned not because they were careless people but because a plausible, well-formed authority is the easiest thing in the world to wave through. So: in your own workflow, what is the standing rule that forces a source-trace on *every* proposition before it leaves your desk — and is it a habit you actually perform, or one you only believe you would?