
Chapter 3: The Epistemic Commons

Authorities as Infrastructure

The identity server does not invent canonical identifiers. It borrows them from communities that have been building shared identity infrastructure for decades.

MeSH -- Medical Subject Headings -- is maintained by the National Library of Medicine and covers diseases, drugs, biological processes, and anatomical structures. It has been the standard vocabulary for biomedical literature indexing since the early 1960s. Its hierarchical structure encodes relationships among concepts that would otherwise have to be extracted from text.

HGNC -- HUGO Gene Nomenclature Committee -- maintains official symbols and names for human genes. When a paper from 1987 uses a gene name that was superseded in 1995, HGNC records both names and the relationship between them. The identity server can resolve the old name to the current symbol without any domain-specific logic.
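The old-name-to-current-symbol resolution described above can be sketched as a plain table lookup. This is a minimal illustration, not the identity server's implementation: `GeneRecord`, `build_symbol_index`, and `resolve_gene_symbol` are hypothetical names, the table is a hand-written toy rather than an HGNC download, and case-folding the input is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """One authority record: the approved symbol plus the symbols it superseded."""
    approved_symbol: str
    previous_symbols: tuple = ()


# Hand-written toy table (assumption): a real service would load HGNC's own data.
TOY_GENE_RECORDS = [
    GeneRecord("KMT2A", previous_symbols=("MLL",)),
    GeneRecord("PRKN", previous_symbols=("PARK2",)),
    GeneRecord("TP53"),
]


def build_symbol_index(records):
    """Map every known symbol, current or superseded, to its approved symbol."""
    index = {}
    for record in records:
        # Approved symbols are assigned directly, so they always win.
        index[record.approved_symbol.upper()] = record.approved_symbol
        for old in record.previous_symbols:
            # setdefault: a superseded symbol never overwrites an existing entry.
            index.setdefault(old.upper(), record.approved_symbol)
    return index


def resolve_gene_symbol(name, index):
    """Return the approved symbol for a name, or None if the table has no match."""
    return index.get(name.strip().upper())


index = build_symbol_index(TOY_GENE_RECORDS)
print(resolve_gene_symbol("MLL", index))    # superseded symbol -> KMT2A
print(resolve_gene_symbol("TP53", index))   # already current -> TP53
print(resolve_gene_symbol("NOPE1", index))  # unknown -> None
```

The point of the sketch is that the resolver contains no gene-specific rules. All of the domain knowledge lives in the records the authority already maintains.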

RxNorm, maintained by the National Library of Medicine, provides normalized names for clinical drugs. UniProt maintains the authoritative database for protein sequences and functional information. ChEMBL covers bioactive molecules.

NCBI Taxonomy

The Linnaean hierarchy -- kingdom, phylum, class, order, family, genus, species -- is the picture most people carry from school biology. When a knowledge graph needs organisms (strains, species, higher taxa) to sit in a stable tree alongside its diseases and drugs, a separate class of authority applies. NCBI Taxonomy, maintained by the National Center for Biotechnology Information, is the taxonomy that backs GenBank, RefSeq, BLAST, and the organism lines in UniProt and related resources. In practice it is the shared hierarchy most biomedical pipelines assume when they say "this sequence is from Homo sapiens" or "this clade."

NCBI Taxonomy is not the same thing as MeSH. MeSH encodes clinical and literature concepts (including some organism terms for indexing); NCBI Taxonomy encodes the parent/child relationships used to name and classify organisms for sequence and database work. Other curated name lists exist for specialized domains (marine taxa, fungi, viruses under ICTV rules, and so on); a production domain service may consult more than one. This book treats NCBI Taxonomy as the canonical placeholder for "the official online organism tree" in a biomedical stack. A fuller treatment would spell out API usage, version stability, and when to fall back to embedding-based resolution for organisms without a clean database hit.
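To make "taxonomic parent/child relationships" concrete, here is a small sketch of a lineage walk over a taxid-to-parent table. The function names (`lineage`, `in_clade`) are hypothetical, and the table is a hand-entered slice that is deliberately cut off at Hominidae; a real pipeline would load the full NCBI Taxonomy data instead.

```python
# Toy slice of a taxonomy: taxid -> (scientific name, rank, parent taxid).
# Assumption: the tree is truncated at Hominidae; parent=None marks the cut.
TOY_TAXONOMY = {
    9606: ("Homo sapiens", "species", 9605),
    9605: ("Homo", "genus", 207598),
    9598: ("Pan troglodytes", "species", 9596),
    9596: ("Pan", "genus", 207598),
    207598: ("Homininae", "subfamily", 9604),
    9604: ("Hominidae", "family", None),
}


def lineage(taxid, tree=TOY_TAXONOMY):
    """Return [(taxid, name, rank), ...] from the given node up to the cut-off root."""
    path = []
    seen = set()
    current = taxid
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at taxid {current}")
        seen.add(current)
        name, rank, parent = tree[current]  # raises KeyError for an unknown taxid
        path.append((current, name, rank))
        current = parent
    return path


def in_clade(taxid, clade_taxid, tree=TOY_TAXONOMY):
    """True if clade_taxid is the node itself or one of its ancestors."""
    return any(node == clade_taxid for node, _name, _rank in lineage(taxid, tree))


print([name for _taxid, name, _rank in lineage(9606)])
print(in_clade(9598, 207598))  # chimpanzee is within Homininae
print(in_clade(9598, 9605))    # chimpanzee is not within the genus Homo
```

A question like "is this sequence from this clade?" then reduces to an ancestor check on identifiers, with no name matching involved.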

These authorities share a common property: they were built to solve the same problem the identity server solves, at the level of a single domain, by a community of experts who needed shared identity to communicate. The identity server aggregates them. It is a client of the epistemic commons, not a replacement for it.

What You Inherit When You Anchor

Anchoring an entity to an authority identifier does more than assign a unique key. It connects the entity to the authority's full record for that identifier: its definition, its synonyms, its taxonomic position, its cross-references to related identifiers in adjacent authorities.

A disease entity anchored to MeSH:D003480 (Cushing Syndrome) inherits the MeSH tree's knowledge that Cushing Syndrome is a subtype of Adrenocortical Hyperfunction, which is a subtype of Adrenal Gland Diseases, which is a subtype of Endocrine System Diseases. It inherits MeSH-recorded entry terms such as "Hypercortisolism". Through mappings maintained alongside MeSH, such as those in the UMLS Metathesaurus, it can also reach the corresponding ICD-10-CM codes.

None of this has to be extracted from the corpus. It is already encoded in the authority. Anchoring is the operation that makes it available to the graph.
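A minimal sketch of what anchoring makes available, assuming a hypothetical in-memory authority table. `AuthorityRecord`, `Entity`, `anchor_entity`, and `inherited_view` are illustrative names, not the identity server's API, and the two records are hand-entered toys with the intermediate tree levels left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AuthorityRecord:
    identifier: str
    label: str
    synonyms: tuple = ()
    parent: Optional[str] = None               # identifier of the broader concept
    xrefs: dict = field(default_factory=dict)  # other vocabulary -> codes


# Hand-entered toy records (assumption). Intermediate tree levels are omitted,
# so the parent link jumps straight to the top-level disease branch.
TOY_AUTHORITY = {
    "MeSH:D003480": AuthorityRecord(
        identifier="MeSH:D003480",
        label="Cushing Syndrome",
        synonyms=("Hypercortisolism",),
        parent="MeSH:D004700",
        xrefs={"ICD-10-CM": ("E24",)},
    ),
    "MeSH:D004700": AuthorityRecord(
        identifier="MeSH:D004700",
        label="Endocrine System Diseases",
    ),
}


@dataclass
class Entity:
    local_id: str
    mention: str
    anchor: Optional[str] = None  # None means provisional (unanchored)


def anchor_entity(entity, identifier, authority):
    """Attach an authority identifier, refusing identifiers the table lacks."""
    if identifier not in authority:
        raise KeyError(f"unknown authority identifier: {identifier}")
    entity.anchor = identifier
    return entity


def inherited_view(entity, authority):
    """Collect what an anchored entity gains from its authority record."""
    if entity.anchor is None:
        return None
    record = authority[entity.anchor]
    ancestors = []
    parent = record.parent
    while parent is not None:
        ancestors.append(authority[parent].label)
        parent = authority[parent].parent
    return {
        "label": record.label,
        "synonyms": list(record.synonyms),
        "ancestors": ancestors,
        "xrefs": dict(record.xrefs),
    }


entity = Entity(local_id="e1", mention="Cushing's syndrome")
anchor_entity(entity, "MeSH:D003480", TOY_AUTHORITY)
print(inherited_view(entity, TOY_AUTHORITY))
```

Nothing in the returned view came from the corpus mention itself. The entity carries only a key, and everything else is read from the authority record at query time.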

Cross-Domain Composition

The consequence of anchoring to shared authorities extends beyond a single graph. When two graphs -- one built from research papers, one built from clinical trial records -- both anchor their disease entities to MeSH and their drug entities to RxNorm, a query can traverse from a research finding to a clinical trial outcome. The shared identifiers are the bridges.

This is not a feature of BFS-QL, or of any query protocol. It is a consequence of the decision to anchor to shared authorities. The identity server makes that decision systematic and enforced rather than optional and inconsistent.
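The bridging effect can be shown with two toy edge lists and an ordinary join on identifiers. Everything here is an assumption made for illustration: the paper and trial identifiers, the predicate names, the `provisional:` prefix, and the `bridge` helper are all hypothetical, and no query protocol is involved.

```python
from collections import defaultdict

# Two independently built graphs as (subject, predicate, object) triples.
# All subject identifiers and provisional identifiers are made up.
RESEARCH_EDGES = [
    ("paper:example-1", "discusses", "MeSH:D003480"),
    ("paper:example-2", "discusses", "provisional:research-0017"),
]
TRIAL_EDGES = [
    ("trial:example-1", "studies_condition", "MeSH:D003480"),
    ("trial:example-2", "studies_condition", "provisional:trials-0042"),
]


def index_by_object(edges):
    """Group subjects by the entity identifier they point at."""
    index = defaultdict(set)
    for subject, _predicate, obj in edges:
        index[obj].add(subject)
    return index


def bridge(research_edges, trial_edges):
    """Pair papers with trials that point at the same entity identifier."""
    trials_by_entity = index_by_object(trial_edges)
    links = []
    for paper, _predicate, entity in research_edges:
        for trial in sorted(trials_by_entity.get(entity, ())):
            links.append((paper, entity, trial))
    return links


for link in bridge(RESEARCH_EDGES, TRIAL_EDGES):
    print(link)  # ('paper:example-1', 'MeSH:D003480', 'trial:example-1')
```

Only the pair that shares an authority identifier is linked. The two provisional entities may well describe the same disease, but with graph-local identifiers there is nothing for the join to match on.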

The practical implication for graph builders: every entity that could be anchored to an authority should be. Provisional entities that remain unanchored are islands -- they participate in their local graph but cannot bridge to other graphs. The authority lookup stage of the lookup chain is not an optimization. It is the operation that connects the graph to the epistemic commons.