How I remove the bias LLMs inject

If you ask a language model for a theological insight, the typical failure is not a misattributed passage or a wrong tractate name. It is a passage invented outright: plausible structure, authoritative tone, yet wrong in every verifiable detail.

Why it matters (and why it has to be removed)

Theology deals in claims of enormous weight: a single comma can change what a text says about the human condition, and with it the everyday and long-term decisions people base on that text.

The difficulty today isn't a shortage of information but an excess of noise — noise the model inherits and amplifies. Between you and a primary source there are, typically:

Translation layers — every translator makes interpretive choices that are no longer visible in the final text
Commentary layers — centuries of scholars explaining what the text "really means"
Popularization layers — books explaining what the scholars say
Automated synthesis — models trained on all of the above, which end up learning the paraphrases of paraphrases, detached from the sources' historical-cultural context

When you query a model about a Talmudic passage, it isn't consulting the Talmud: it's consulting everything ever written about the Talmud, weighted by what's most common online. The result looks authoritative. Often it isn't.

The problem isn't that the AI is stupid — it's that it isn't optimized to tell the difference. A general-purpose model reproduces the most acceptable reading, the one that fuses what a source says with what is commonly said about that source. A plausible-but-wrong passage is the signature of optimization for plausibility — the model converging on the answer that satisfies rather than the one that is true — not a sign of stupidity. For ancient texts, with sparse and specialized training data, that pull is strongest.

What changes when you read the sources in context

An example that changes how you read an entire book of the New Testament.

The word "yoke" in Matthew 11:29 — "Take my yoke upon you and learn from me."

In the Western Christian tradition, from Augustine to Calvin, it is read metaphorically and spiritually: the gentle, easy yoke of grace, light because borne up by charity (Augustine) and set against the heavy yoke of the Law (Calvin).

In Second Temple Judaism — the text's actual historical context — "yoke" (עֹל, ol) was a technical legal term: it referred to a teacher's authoritative interpretation of the Torah's commandments ("the yoke of the Torah", "the yoke of the kingdom of heaven"). The Mishnah states it openly: reciting the Shema means "receiving the yoke of the kingdom of heaven" (Berakhot 2:2). To say "take my yoke" was therefore a precise claim of halakhic authority.

Once you know this, Matthew 11:29 is not a metaphorical image but an explicit claim of authority within the Jewish tradition in which Jesus operated. And that changes how you read the entire Sermon on the Mount, Paul's letters on the Law, and the relationship between Jewish and Christian practice.

Why it works differently

1. A graph of verified primary sources. The system is built around a graph of primary texts in the original languages and selected academic sources, with explicit relationships between them. Every claim traces back to a node in that graph.

The decisive point is how the sources are selected. Correctness doesn't come from the algorithm but from the corpus. And the corpus isn't a popularity-weighted collection: it's an already-filtered academic selection, a perimeter of sources curated by historical-critical criteria (including the research of Prof. Walter Binni). The AI works inside that perimeter: it doesn't decide what is authoritative, it applies a choice of authority made upstream by someone with the competence to make it. Without that selection, any system would go back to averaging the noise.
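The idea that every claim must trace back to a node in a curated graph can be sketched in a few lines. This is a minimal illustration, not the actual system: the class names, the node identifier scheme, and the "shares_term_with" relation are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceNode:
    node_id: str   # hypothetical identifier scheme, e.g. "mishnah:berakhot:2:2"
    language: str  # language of the original text
    kind: str      # "primary" or "academic"


@dataclass
class SourceGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (from_id, relation, to_id)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def relate(self, from_id, relation, to_id):
        # Relationships are only allowed between nodes already in the corpus.
        if from_id not in self.nodes or to_id not in self.nodes:
            raise KeyError("both ends of a relationship must be in the curated corpus")
        self.edges.append((from_id, relation, to_id))

    def untraceable(self, claims):
        """Return the text of every claim whose cited node is not in the graph."""
        return [text for text, node_id in claims if node_id not in self.nodes]


graph = SourceGraph()
graph.add_node(SourceNode("mishnah:berakhot:2:2", "he", "primary"))
graph.add_node(SourceNode("matthew:11:29", "grc", "primary"))
graph.relate("matthew:11:29", "shares_term_with", "mishnah:berakhot:2:2")

claims = [
    ("Reciting the Shema means receiving the yoke of the kingdom of heaven",
     "mishnah:berakhot:2:2"),
    ("A passage the model made up", "berakhot:99z"),
]
rejected = graph.untraceable(claims)
```

The point of the design is that the check is against the curated perimeter, not against the model's own sense of what sounds authoritative: a claim citing a node outside the graph is rejected however plausible it reads.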

2. A quality system of 17 methodologies. Every analysis passes through 17 independent hermeneutical methodologies (8 modern academic, 9 historical-genealogical). If the methodologies converge, the result is marked as well-grounded; if they diverge, the claim is flagged as uncertain or contested — instead of flattening everything into a confident-sounding average.
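The convergence check can be pictured as a vote across methodologies. The sketch below is an assumption about how such a rule might look: the function name, the reading labels, and in particular the 0.8 agreement cutoff are invented for illustration and are not taken from the system itself.

```python
from collections import Counter


def classify(readings, min_agreement=0.8):
    """Mark a claim by how far the methodologies agree.

    readings maps a methodology name to the reading it supports.
    min_agreement is a hypothetical cutoff, not a documented value.
    """
    counts = Counter(readings.values())
    top_reading, top_votes = counts.most_common(1)[0]
    share = top_votes / len(readings)
    status = "well-grounded" if share >= min_agreement else "contested"
    # Keep the minority readings visible instead of averaging them away.
    dissent = {name: r for name, r in readings.items() if r != top_reading}
    return {"status": status, "reading": top_reading,
            "agreement": share, "dissent": dissent}


# 15 of 17 methodologies support the same reading.
converging = {f"m{i}": "halakhic authority" for i in range(17)}
converging["m0"] = "spiritual metaphor"
converging["m1"] = "spiritual metaphor"

# A 9 to 8 split.
diverging = {f"m{i}": ("halakhic authority" if i < 9 else "spiritual metaphor")
             for i in range(17)}

grounded = classify(converging)
contested = classify(diverging)
```

Returning the dissenting methodologies alongside the status is the part that matters: a 9 to 8 split is reported as a split, not as a confident answer.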

What broke (and what I learned)

Verifying the citation behind a claim isn't enough to remove the bias from it: you can have the right source and the wrong framing.

And here's the counterintuitive part: even after cleaning the corpus of bias, the model keeps re-introducing it. The distortion comes not from the sources but from the implicit patterns of its training, whatever knowledge base it is given.

The fix was to move the bias scan from a final check to a preparatory stage: before generation, the system probes the retrieved texts, identifies which distortions the model would slide toward on that topic, and injects corrective rules into the prompt. This narrows the opening through which a distortion can enter, rather than leaving it to be removed afterward.
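A rough sketch of that preparatory stage follows. It assumes that distortion patterns are stored with trigger terms and a corrective rule, and it uses plain keyword matching as a stand-in for whatever probing the real system does. All names and the example rules are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistortionPattern:
    name: str
    triggers: tuple  # terms in the retrieved text that signal risk (assumption)
    rule: str        # corrective instruction to inject into the prompt


def scan(retrieved_texts, patterns):
    """Return the patterns the model is likely to slide toward on this topic."""
    text = " ".join(retrieved_texts).lower()
    return [p for p in patterns if any(t in text for t in p.triggers)]


def build_prompt(question, retrieved_texts, patterns):
    """Assemble the prompt with corrective rules placed before generation."""
    lines = ["Answer using only the sources below."]
    at_risk = scan(retrieved_texts, patterns)
    if at_risk:
        lines.append("Corrective rules:")
        lines += [f"- {p.rule}" for p in at_risk]
    lines.append("Sources:")
    lines += [f"- {t}" for t in retrieved_texts]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


patterns = [
    DistortionPattern(
        "yoke-as-metaphor", ("yoke",),
        "Treat 'yoke' as a Second Temple legal term before any spiritual reading."),
    DistortionPattern(
        "unrelated-pattern", ("covenant",),
        "An illustrative rule that should not fire for this query."),
]
sources = ["Take my yoke upon you and learn from me."]
prompt = build_prompt("What does 'yoke' mean in Matthew 11:29?", sources, patterns)
```

Only the rules relevant to the retrieved texts reach the prompt, and they arrive before the model writes anything, which is the whole difference from a post-hoc check.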

There's a corollary. Treating gaps and bias as a correction layer applied at generation time — rather than something to bake into the model's weights — means not having to train a dedicated model: no targeted retraining (fine-tuning), no GPUs to maintain. The domain knowledge lives in the data (curated corpus and distortion patterns), not frozen in the weights. And when a better base model ships, you just swap it: the correction layer still applies, without the endless retraining treadmill.

To make this systematic I built a test battery: over 6,500 trap questions, each designed to check whether a general-purpose model would be drawn toward one of the catalogued theological distortion patterns. Result: without the system, a general-purpose model gets about 23-28% of the targeted probes wrong — not random errors, but always the same kind of distortion, recurring in predictable contexts.

Two figures must be kept apart, because they measure opposite things:

~23-28%: how often the model falls for the trap. This is the model's error rate.
85-97%: how confident we are when we flag a distortion. This is the detector's precision: of 438 catalogued patterns, 157 exceed 85% precision and the most recurrent reach 95-97%. It says how reliable each individual flag is, not whether we've found all the distortions; that second question is a separate limit, one that moves with every article analyzed.
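The two figures are computed from different counts, which a short sketch makes concrete. The function names are mine, and the counts in the example are invented round numbers chosen to show the arithmetic, not measurements from the test battery.

```python
def error_rate(wrong_answers, total_probes):
    """Share of trap questions the model gets wrong."""
    return wrong_answers / total_probes


def precision(true_flags, false_flags):
    """Share of raised flags that point at a real distortion.

    Missed distortions do not appear anywhere in this formula,
    which is why precision says nothing about coverage.
    """
    flagged = true_flags + false_flags
    return true_flags / flagged if flagged else 0.0


# Illustrative counts only.
model_error = error_rate(wrong_answers=250, total_probes=1000)
flag_precision = precision(true_flags=19, false_flags=1)
```

A detector can be highly precise while the model it watches is still wrong a quarter of the time; the first number describes the flags, the second describes the model.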

Above everything sits a hard threshold: nothing is published below 90/100. 75% of articles clear the check on the first pass, with an average of 94.6/100; the rest go back for revision or are rejected. The number isn't arbitrary: it comes from a 19-criterion rubric by which an evaluator agent scores each section across four families — citations, content, theological accuracy, coverage.
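The publication gate can be sketched as follows. The 90/100 threshold and the four families come from the text above; the equal weighting of criteria is an assumption, and the example scores a handful of made-up criteria rather than the full 19.

```python
from statistics import mean

PUBLISH_THRESHOLD = 90  # nothing is published below 90/100


def overall_score(scores_by_family):
    """Average all criterion scores, weighting each criterion equally (assumption)."""
    all_scores = [s for scores in scores_by_family.values() for s in scores]
    return mean(all_scores)


def decide(scores_by_family):
    """Return the verdict together with the score that produced it."""
    score = overall_score(scores_by_family)
    verdict = "publish" if score >= PUBLISH_THRESHOLD else "revise_or_reject"
    return verdict, score


# Illustrative scores, two criteria per family instead of the real rubric.
strong = {
    "citations": [95, 92],
    "content": [90, 94],
    "theological accuracy": [96, 91],
    "coverage": [93, 93],
}
weak = dict(strong, citations=[70, 85])

strong_verdict, strong_score = decide(strong)
weak_verdict, weak_score = decide(weak)
```

In the second case a single weak family pulls an otherwise good article under the bar, which is the behavior a hard threshold is meant to enforce.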

By the numbers

17 hermeneutical methodologies (8 academic, 9 historical-genealogical)
19-criterion rubric: every text scored on citations, content, theological accuracy, coverage
438 catalogued distortion patterns · 157 at ≥85% precision · the recurrent ones at 95-97%
Over 6,500 trap questions · ~23-28% error rate for a general-purpose model
Publication threshold: nothing below 90/100 · 75% clears the first check · average 94.6
Real-time citation verification (Sefaria) · a single source of truth → parallel texts and articles

What it looks like in practice

Results live on two levels. In the parallel text each passage is shown in side-by-side columns: the sources in their original languages next to a validated translation. That's where you see where the distortion bites, line by line — the conventional rendering next to the one reconstructed from the historical-cultural context. In the thematic articles the same results are reorganized by topic. The parallel text is the source; the article is the popularization that follows from it.

You can explore it on TeoCentro.

What I'd do differently

Start with the bias scan. I spent weeks building a system that was confident before building one that was honest about its own limits. In hindsight, the layer that flags the gaps should have been the first thing, not the last.

The tool: precorrect

So I extracted the correction layer into a small open-source tool. It is the part that measures where a model drifts and injects corrections before generation, and it is domain- and framework-agnostic. It corrects only where the model is confidently wrong and skips where it's already right, which cuts the corrective interventions by about a third at the same quality. Almost everything else debiases after generation; this does it before. It's available as precorrect on GitHub (pip install + a 30-second example).
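The selection rule, correct only where the model is confidently wrong, can be illustrated without reference to the tool's real interface. The sketch below is not precorrect's API: every name is hypothetical, the 0.7 confidence cutoff is an assumption, and it presumes that a reference answer exists for each probe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    topic: str
    model_answer: str
    reference_answer: str  # taken from your own sources (assumption)
    confidence: float      # model confidence between 0 and 1 (assumption)


def needs_correction(result, min_confidence=0.7):
    """True only when the model is both wrong and confident about it."""
    wrong = result.model_answer != result.reference_answer
    return wrong and result.confidence >= min_confidence


def topics_to_correct(results, min_confidence=0.7):
    """Topics where a corrective rule would be injected before generation."""
    return sorted({r.topic for r in results if needs_correction(r, min_confidence)})


results = [
    ProbeResult("yoke", "spiritual metaphor", "legal term", 0.93),  # confidently wrong
    ProbeResult("shema", "yoke of heaven", "yoke of heaven", 0.88), # already right
    ProbeResult("sabbath", "reading a", "reading b", 0.40),         # wrong but unsure
]
selected = topics_to_correct(results)
```

Skipping the topics the model already gets right is what keeps the number of interventions down, since a corrective rule applied where nothing is wrong is pure overhead.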

It lands on a live nerve. Andrej Karpathy speaks openly of an evaluation crisis: his "LLM Council" — several frontier models grading each other's answers — fell right into the trap, reaching a confident consensus he himself disagreed with. His Verifiability thesis sums it up: trust an automated evaluation only where a ground-truth verifier exists. precorrect is built for the cases where that verifier is missing: it lets you build it yourself from your own sources — an answer key — so the model is measured against the truth, not against what's popular and pleasing.

And there's a measured result behind it: swapping only the external truth-standard flipped the outcome. That write-up — the experiment and the numbers — is the next article.

The series on truthful AI: 1. The problem · 2. The tool (this) · 3. The proof (coming).
