Chapter11

Chapter 11: Making Bad Ideas Inexpressible¶

Hilbert's Dream¶

At the turn of the 20th century, David Hilbert proposed a grand program for mathematics: to find a formal system in which every true statement could be proved and, crucially, no false or meaningless statement could even be constructed. He wanted a system where bad mathematics was inexpressible, not merely discouraged. Kurt Gödel famously proved that this was impossible for mathematics as a whole.

However, for a domain-constrained knowledge graph, we can actually achieve a version of Hilbert's dream. We are not trying to represent all of human thought; we are trying to represent a finite set of biomedical or legal claims. By using a typed schema, we can build a boundary that structurally refuses to hold certain classes of "bad ideas."

What Becomes Inexpressible¶

The typed schema operates at four distinct layers to make unwarranted or malformed assertions inexpressible:

The Type Layer: Edges where the subject or object violates the predicate's domain or range are structurally rejected. You cannot state that "Aspirin (DRUG) inhibits New York (LOCATION)" if the inhibits predicate only accepts GENE or PROCESS as its object. Predicates outside the finite vocabulary are simply not available for use.

The Identity Layer: Assertions involving unresolvable or "provisional" entities that fail to meet minimum evidence thresholds are sequestered. A claim about a thing that cannot be named and anchored to an authority is an incomplete idea; the system can choose to refuse these until identity is established.

The Provenance Layer: The schema can require that every edge carries a pointer to its warrant. An assertion without a source, or a claim whose extraction method is undeclared, is not a fact in the system's eyes—it is a malformed record. The "undifferentiated provenance bag" common in untyped systems is structurally replaced by mandatory, typed provenance records.

The Consistency Layer: Contradictory assertions—such as a functional predicate having two different values, or both a predicate and its negation_of pair coexisting—are not permitted to "just sit there." They must either be resolved or wrapped in a conflict record that acknowledges the dispute. The "bad idea" of unacknowledged contradiction becomes inexpressible.

The Functional Programming Analogy¶

This approach mirrors the "make illegal states unrepresentable" mantra of functional programming in languages like Haskell or Rust. In those languages, you don't write runtime checks for null values if your type system can guarantee a value is always present. You move the invariant into the type system.

A typed knowledge graph applies this principle to assertions. We move the "rules of evidence" into the graph's structure. If the schema compiles and the linter passes, we know that certain classes of errors—type mismatches, missing sources, unanchored identities—simply cannot exist in the data.

What This Requires of the Ontology¶

This level of enforcement is only as good as the ontology it enforces. To make bad ideas inexpressible, the ontology must be rich: - Functional predicates must be declared (e.g., has_official_symbol). - Negation pairs must be identified (e.g., activates vs. inhibits). - Provenance completeness rules must be defined (e.g., "every extraction must have a confidence score").

The Domain Spec becomes the constitution of the graph. If it is weak, the graph is noisy. If it is rigorous, the graph becomes a high-fidelity instrument for reasoning.

The Limits: Gödel's Revenge¶

We must be honest about the boundary: the typed graph enforces structural well-formedness, not semantic correctness. A well-typed, well-sourced edge can still be factually wrong. An LLM might extract "Aspirin treats Cancer" and correctly link it to RxNorm and MeSH. The linter will pass it because the types match and the provenance is present.

This is not a defect; it is a feature. It separates the structural integrity of the knowledge base from the truth value of the claims it contains. We can guarantee the former; for the latter, we provide the provenance so a human (or a more sophisticated agent) can decide.