
Chapter 5: The Domain Service and the Schema

What the Domain Provides

The domain service is where domain knowledge lives. It is a small HTTP service -- four endpoints, each doing one thing -- that the identity server calls when it needs to make a domain-specific decision.

The biomedical domain service for the medlit reference implementation calls the PubChem API for chemical entities, the MeSH API for disease and biological process entities, the HGNC REST API for gene entities, and the RxNorm API for drug entities. It implements synonym detection thresholds tuned for biomedical nomenclature. It selects survivors by preferring authority-anchored records over provisional ones. It computes confidence from a study type weight table aligned with evidence-based medicine principles.

A domain service for legal entities would call different authorities -- perhaps a court document database for case citations, a legislative database for statute references -- with different synonym criteria and different confidence weights (or none at all). The domain service for a materials science corpus would consult different authorities again.

The base server does not know or care about any of this. It knows the four endpoint contracts. The domain service fulfills them.

Evidence Quality Weighting

In evidence-based medicine, not all evidence is equal. A randomized controlled trial (RCT) is the strongest single-study design for supporting a clinical claim. A meta-analysis that synthesizes multiple RCTs can be stronger still, but its strength depends on the quality of the constituent trials. An observational study is weaker; a single case report is the weakest form of published evidence.

The domain service encodes this hierarchy in a weight table:

STUDY_WEIGHTS = {
    "meta_analysis": 0.95,
    "rct": 1.0,
    "cohort": 0.8,
    "case_control": 0.7,
    "observational": 0.6,
    "review": 0.5,
    "case_report": 0.4,
}

When the identity server asks the domain service to compute confidence for a list of provenance records, the domain service looks up the study type of each record, retrieves its weight, and aggregates. The aggregation formula is configurable -- a simple maximum, a weighted mean, or a formula that rewards replication across independent studies.
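The three aggregation options mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions, not the medlit implementation: the record shape (a dict with a `study_type` key), the `DEFAULT_WEIGHT` fallback, the function names, and the noisy-OR formula with its `discount` parameter are all hypothetical. The chapter does not say what the weighted mean is weighted by, so the sketch uses a plain mean of the study-type weights.

```python
from math import prod

# Weight table as given in this chapter.
STUDY_WEIGHTS = {
    "meta_analysis": 0.95,
    "rct": 1.0,
    "cohort": 0.8,
    "case_control": 0.7,
    "observational": 0.6,
    "review": 0.5,
    "case_report": 0.4,
}

DEFAULT_WEIGHT = 0.3  # hypothetical fallback for unrecognized study types


def record_weights(records):
    """Map each provenance record to its study-type weight."""
    return [STUDY_WEIGHTS.get(r["study_type"], DEFAULT_WEIGHT) for r in records]


def aggregate_max(records):
    """Confidence equals the strongest single piece of evidence."""
    weights = record_weights(records)
    return max(weights) if weights else 0.0


def aggregate_mean(records):
    """Plain mean of the weights (assumption: no per-record weighting)."""
    weights = record_weights(records)
    return sum(weights) / len(weights) if weights else 0.0


def aggregate_noisy_or(records, discount=0.5):
    """Reward replication: each extra record raises the score.

    Computes 1 - prod(1 - discount * w); returns 0.0 for no records.
    """
    weights = record_weights(records)
    return 1.0 - prod(1.0 - discount * w for w in weights)
```

With the noisy-OR variant, two independent cohort studies score higher than one, which is the replication behavior described above; the maximum and the mean do not have that property.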

The weight table is a model, not ground truth. A well-replicated observational finding across five independent cohorts may be more reliable than a single small RCT. The weights are a defensible starting point; the domain service makes them transparent and filterable rather than hiding them inside a black box.

The Schema as a Runtime Artifact

In traditional database design, the schema is a static artifact: a set of SQL DDL statements or a compiled Protobuf definition that remains fixed until the next deployment. In the typed graph architecture, we treat the schema as a dynamic runtime artifact served by the domain service.

When the base identity server initializes, it is semantically empty. It understands the mechanics of resolution and the state machine of entities, but it has no knowledge of the specific entity types or predicates that define a domain. Its first action is to query the domain service's GET /schema endpoint. The response is a serialized ontology: a complete declaration of the finite set of EntityType enums and PredicateSpec objects that govern the graph.

This late binding of the ontology is what enables the separation of concerns between the engine and the domain. Because the base server discovers its constraints at runtime, it can perform predicate validation, type checking, and conflict detection without being recompiled for every new project. If the medlit domain service adds a new predicate -- for instance, contraindicated_in(drug, disease) -- the identity server immediately inherits the knowledge of that predicate's domain and range constraints.

By elevating the schema to a runtime artifact, we move it from being passive documentation to an active, executable specification. This same artifact seeds the graph linter (Chapter 13) and the BFS-QL compiler (Chapter 10), ensuring that every component in the stack is synchronized against a single, authoritative definition of what a well-formed claim looks like. The schema is not just a description of the data; it is the machine-readable contract that makes the data trustworthy.
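To make the runtime-schema idea concrete, here is a minimal sketch of how a server might load a schema payload and check a claim against a predicate's domain and range. The JSON shape, the field names, the `treats` predicate, and the `load_schema` and `validate_claim` helpers are assumptions made for illustration; the chapter does not specify the wire format of GET /schema. Only `contraindicated_in(drug, disease)` comes from the text.

```python
import json
from dataclasses import dataclass

# Hypothetical GET /schema response body; the real wire format is not
# specified in this chapter.
SCHEMA_JSON = """
{
  "entity_types": ["drug", "disease", "gene"],
  "predicates": [
    {"name": "treats", "domain": "drug", "range": "disease"},
    {"name": "contraindicated_in", "domain": "drug", "range": "disease"}
  ]
}
"""


@dataclass(frozen=True)
class PredicateSpec:
    name: str
    domain: str  # required entity type of the subject
    range: str   # required entity type of the object


def load_schema(raw):
    """Parse the payload into a set of entity types and a predicate lookup."""
    data = json.loads(raw)
    entity_types = frozenset(data["entity_types"])
    predicates = {p["name"]: PredicateSpec(**p) for p in data["predicates"]}
    return entity_types, predicates


def validate_claim(predicates, predicate, subject_type, object_type):
    """Return a list of violations; an empty list means the claim is well-typed."""
    spec = predicates.get(predicate)
    if spec is None:
        return [f"unknown predicate: {predicate}"]
    errors = []
    if subject_type != spec.domain:
        errors.append(f"subject must be {spec.domain}, got {subject_type}")
    if object_type != spec.range:
        errors.append(f"object must be {spec.range}, got {object_type}")
    return errors


ENTITY_TYPES, PREDICATES = load_schema(SCHEMA_JSON)
```

Because the predicate table is built from data rather than compiled in, adding a predicate to the payload is enough for `validate_claim` to start enforcing it.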

Implementing the Domain Service

The medlit domain service is implemented in Python using FastAPI and Pydantic. FastAPI provides automatic OpenAPI documentation and request validation. Pydantic models define the request and response schemas for each endpoint.

The /resolve-authority endpoint accepts a mention string and entity type. It dispatches to the appropriate authority API based on entity type, normalizes the response to a canonical ID and authority name, and returns the result. On a cache miss, it calls the external API and caches the response for the duration of the run.
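The dispatch-and-cache behavior can be sketched without any network code. In the sketch below the lookup functions are stubs that stand in for the PubChem, MeSH, HGNC, and RxNorm calls, and the IDs they return are placeholders, not real identifiers. The entity type keys, the `AuthorityMatch` type, the cache key normalization, and the choice to return `None` for an unknown entity type are all assumptions.

```python
from typing import NamedTuple


class AuthorityMatch(NamedTuple):
    canonical_id: str
    authority: str


CALL_LOG = []  # records every simulated external call, to show cache hits


def make_stub(authority):
    """Build a stand-in for one authority API; returns placeholder IDs."""
    def lookup(mention):
        CALL_LOG.append((authority, mention))
        return AuthorityMatch(f"{authority}:placeholder-{mention.lower()}", authority)
    return lookup


# Entity type -> authority routing, following the mapping described in
# this chapter. The key spellings are hypothetical.
DISPATCH = {
    "chemical": make_stub("PubChem"),
    "disease": make_stub("MeSH"),
    "biological_process": make_stub("MeSH"),
    "gene": make_stub("HGNC"),
    "drug": make_stub("RxNorm"),
}

_cache = {}  # lives for the duration of the run


def resolve_authority(mention, entity_type):
    """Return an AuthorityMatch, or None if no authority handles the type."""
    key = (entity_type, mention.strip().lower())
    if key in _cache:
        return _cache[key]  # cache hit: no external call
    lookup = DISPATCH.get(entity_type)
    result = lookup(mention.strip()) if lookup else None
    _cache[key] = result
    return result
```

Calling `resolve_authority("BRCA1", "gene")` twice appends to `CALL_LOG` only once; the second call is served from `_cache`.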

The /select-survivor endpoint accepts two entity records and returns the preferred one. The medlit implementation prefers the record with an authority ID; if both have authority IDs from the same authority, it prefers the one with more supporting evidence; if evidence counts are equal, it prefers the more recently updated record.
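The preference order reads naturally as a chain of tie-breaks. The following sketch assumes a hypothetical `EntityRecord` shape with the four fields the rule needs. The text leaves two cases unspecified (both records anchored to different authorities, and neither record anchored); the sketch assumes those fall through to the same evidence-count and recency tie-breaks.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class EntityRecord:
    # Hypothetical record shape; field names are illustrative.
    entity_id: str
    authority: Optional[str]
    authority_id: Optional[str]
    evidence_count: int
    updated_at: datetime


def select_survivor(a, b):
    """Return whichever of the two records should survive a merge."""
    a_anchored = a.authority_id is not None
    b_anchored = b.authority_id is not None

    # 1. An authority-anchored record beats a provisional one.
    if a_anchored != b_anchored:
        return a if a_anchored else b

    # 2. Otherwise prefer more supporting evidence. (Assumption: this also
    #    covers the cases the chapter leaves unspecified.)
    if a.evidence_count != b.evidence_count:
        return a if a.evidence_count > b.evidence_count else b

    # 3. Finally prefer the more recently updated record; a wins exact ties.
    return a if a.updated_at >= b.updated_at else b
```

Note that a provisional record loses to an anchored one even when it has far more supporting evidence, since the anchoring check runs first.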

The /compute-confidence endpoint accepts a list of provenance records and returns a float. The medlit implementation looks up the study type of each record, applies the weight table, and returns a weighted mean capped at 0.99.

The /synonym-criteria endpoint returns a static configuration object defining the similarity thresholds for fuzzy and embedding-based synonym detection.
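As a rough illustration of what such a configuration object might contain and how a caller could apply it, consider the sketch below. The field names and threshold values are invented for illustration and are not the medlit settings. The fuzzy check uses `difflib.SequenceMatcher` from the standard library as a stand-in for whatever string-similarity measure the real service uses; embedding-based comparison is omitted.

```python
from dataclasses import asdict, dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class SynonymCriteria:
    fuzzy_threshold: float      # minimum string-similarity ratio, 0..1
    embedding_threshold: float  # minimum cosine similarity, 0..1


# Illustrative values only; not taken from the medlit implementation.
CRITERIA = SynonymCriteria(fuzzy_threshold=0.9, embedding_threshold=0.85)


def fuzzy_match(a, b, criteria=CRITERIA):
    """True if two mentions are similar enough to be synonym candidates."""
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= criteria.fuzzy_threshold


# What a /synonym-criteria response body could look like as a plain dict.
RESPONSE_BODY = asdict(CRITERIA)
```

Keeping the thresholds in a single static object means the base server never hard-codes a notion of "similar enough"; a domain with stricter nomenclature can simply return higher values.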