Appendix B: The Domain Spec Schema

The Domain Spec is the configuration file that governs the typed graph's behavior. It is served by the domain service and consumed by the identity server, the ingestion pipeline, and the graph linter.

EntityType Enum

Entity types must be a closed set. In the JSON serialization, this is represented as a list of strings.

{
  "entity_types": [
    "drug",
    "gene",
    "disease",
    "biological_process",
    "protein"
  ]
}
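
One way a consumer might enforce the closed set is to load the list into an immutable set and reject any value outside it. The following is a minimal sketch that assumes the payload shape shown above; `load_entity_types` and `check_entity_type` are hypothetical helper names, not part of the domain service.

```python
import json


def load_entity_types(payload: str) -> frozenset[str]:
    """Parse the `entity_types` list into an immutable set (hypothetical helper)."""
    types = json.loads(payload)["entity_types"]
    if len(types) != len(set(types)):
        raise ValueError("entity_types contains duplicates")
    return frozenset(types)


def check_entity_type(value: str, allowed: frozenset[str]) -> str:
    """Return `value` unchanged if it belongs to the closed set, else raise."""
    if value not in allowed:
        raise ValueError(f"unknown entity type: {value!r}")
    return value


# Same list as the JSON example above.
SPEC = '{"entity_types": ["drug", "gene", "disease", "biological_process", "protein"]}'
ALLOWED = load_entity_types(SPEC)
```

A frozenset is used here so that no consumer can widen the set after loading; adding a type would then require a change to the Domain Spec itself.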

PredicateSpec

PredicateSpec is the core of the schema: every predicate is defined by its constraints.

| Field | Type | Description |
|---|---|---|
| name | string | The unique identifier for the predicate (e.g., inhibits). |
| domain | list[string] | The allowed EntityTypes for the subject. |
| range | list[string] | The allowed EntityTypes for the object. |
| description | string | A human-readable definition for use in prompts. |
| is_functional | boolean | If true, a subject can have only one object for this predicate. |
| negation_of | string? | The name of the predicate that is the logical opposite. |

Annotated Example (medlit domain):

{
  "name": "treats",
  "domain": ["drug"],
  "range": ["disease"],
  "description": "A therapeutic relationship where the drug is used to manage the disease.",
  "is_functional": false,
  "negation_of": null
}
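
As an illustration of how a consumer might hold a PredicateSpec in memory, the sketch below mirrors the fields in the table using a standard-library dataclass. This is an assumption for the sake of a dependency-free example: the source does not show the actual model class, and `from_dict` is a hypothetical constructor that also checks domain and range against the closed set of entity types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PredicateSpec:
    name: str
    domain: tuple[str, ...]   # allowed EntityTypes for the subject
    range: tuple[str, ...]    # allowed EntityTypes for the object
    description: str
    is_functional: bool
    negation_of: Optional[str] = None

    @classmethod
    def from_dict(cls, raw: dict, entity_types: frozenset[str]) -> "PredicateSpec":
        """Build a spec from its JSON form, rejecting unknown entity types."""
        unknown = (set(raw["domain"]) | set(raw["range"])) - entity_types
        if unknown:
            raise ValueError(f"{raw['name']}: unknown entity types {sorted(unknown)}")
        return cls(
            name=raw["name"],
            domain=tuple(raw["domain"]),
            range=tuple(raw["range"]),
            description=raw["description"],
            is_functional=raw["is_functional"],
            negation_of=raw.get("negation_of"),
        )


ENTITY_TYPES = frozenset({"drug", "gene", "disease", "biological_process", "protein"})
TREATS = PredicateSpec.from_dict(
    {
        "name": "treats",
        "domain": ["drug"],
        "range": ["disease"],
        "description": "A therapeutic relationship where the drug is used to manage the disease.",
        "is_functional": False,
        "negation_of": None,
    },
    ENTITY_TYPES,
)
```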

JSON Serialization

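The Domain Spec is served at GET /schema. At startup, the identity server fetches this JSON and uses it to configure its validation logic. Because the schema is fetched rather than compiled in, it can be updated in the domain service without rebuilding or redeploying the identity server; a running identity server picks up the change the next time it fetches the schema.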
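
A client-side fetch could look roughly like the sketch below, which uses only the standard library. Only the `/schema` path and the `entity_types` key come from this appendix; the function names, the `base_url` parameter, the example host, and the top-level `predicates` key are assumptions made for illustration.

```python
import json
import urllib.request


def fetch_domain_spec(base_url: str, timeout: float = 5.0) -> dict:
    """GET {base_url}/schema and decode the JSON body."""
    url = f"{base_url.rstrip('/')}/schema"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def index_predicates(spec: dict) -> dict[str, dict]:
    """Index predicate definitions by name for lookup during validation.

    Assumes the specs live under a top-level "predicates" key (hypothetical).
    """
    return {p["name"]: p for p in spec.get("predicates", [])}


# Hypothetical usage at identity-server startup:
# spec = fetch_domain_spec("http://localhost:8000")
# predicates = index_predicates(spec)
```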

Deriving Lint Rules

kglint maps PredicateSpec fields to ViolationType checks at runtime. The final check, PROVENANCE_MISSING, is not derived from a PredicateSpec field:

  • domain/range → DOMAIN_RANGE_MISMATCH
  • is_functional → FUNCTIONAL_VIOLATION
  • negation_of → NEGATION_CONFLICT
  • Missing provenance → PROVENANCE_MISSING
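
The mapping above can be sketched as a single pass over the edges. This is not kglint's actual implementation: the `Edge` shape, the `lint` function, and the choice to treat `negation_of` as symmetric (a conflict is flagged regardless of which predicate of the pair declares it) are all assumptions made for illustration. The sketch also assumes every edge uses a predicate that is present in `specs`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:  # hypothetical edge shape
    subject: str
    subject_type: str
    predicate: str
    object: str
    object_type: str
    provenance: Optional[str] = None


def lint(edges, specs):
    """Return (violation_type, edge) pairs; specs maps name -> PredicateSpec dict."""
    # Build a symmetric lookup of opposite predicates from negation_of.
    opposites = {}
    for name, spec in specs.items():
        neg = spec.get("negation_of")
        if neg:
            opposites[name] = neg
            opposites.setdefault(neg, name)

    violations = []
    objects_seen = {}  # (subject, predicate) -> objects asserted so far
    asserted = set()   # (subject, predicate, object) triples seen so far
    for e in edges:
        spec = specs[e.predicate]
        if e.subject_type not in spec["domain"] or e.object_type not in spec["range"]:
            violations.append(("DOMAIN_RANGE_MISMATCH", e))
        if not e.provenance:
            violations.append(("PROVENANCE_MISSING", e))
        if spec["is_functional"]:
            objs = objects_seen.setdefault((e.subject, e.predicate), set())
            if objs and e.object not in objs:
                violations.append(("FUNCTIONAL_VIOLATION", e))
            objs.add(e.object)
        opposite = opposites.get(e.predicate)
        if opposite and (e.subject, opposite, e.object) in asserted:
            violations.append(("NEGATION_CONFLICT", e))
        asserted.add((e.subject, e.predicate, e.object))
    return violations
```

Because every check reads its constraint from the spec dictionary, adding a predicate to the Domain Spec extends the linter without any code change, which is the point of deriving rules rather than hand-writing them.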

Conflict Record Schema

When a violation is detected but the data is preserved (e.g., in a contradiction), a ConflictRecord is created.

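from typing import Literal, Optional

from pydantic import BaseModel


class ConflictRecord(BaseModel):
    """A preserved disagreement between two edges in the graph."""
    conflict_id: str
    conflict_type: Literal["FUNCTIONAL", "NEGATION_PAIR", "CONFIDENCE_DIVERGENCE"]
    edge_id_a: str  # the two edges that conflict
    edge_id_b: str
    severity: float  # 0.0 to 1.0
    resolved: bool = False
    resolution_note: Optional[str] = None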

These records are stored in a dedicated table and can be queried to find areas of the graph where the literature is in active disagreement.
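
A query over such a table might look like the sketch below. The storage engine is not specified in this appendix, so SQLite is used purely as a stand-in; the table name `conflict_records`, the column types, and the `open_conflicts` helper are hypothetical and simply mirror the ConflictRecord fields.

```python
import sqlite3

# Hypothetical table layout mirroring the ConflictRecord fields.
SCHEMA = """
CREATE TABLE conflict_records (
    conflict_id     TEXT PRIMARY KEY,
    conflict_type   TEXT NOT NULL,
    edge_id_a       TEXT NOT NULL,
    edge_id_b       TEXT NOT NULL,
    severity        REAL NOT NULL CHECK (severity BETWEEN 0.0 AND 1.0),
    resolved        INTEGER NOT NULL DEFAULT 0,
    resolution_note TEXT
)
"""


def open_conflicts(conn: sqlite3.Connection, min_severity: float = 0.0):
    """Return unresolved conflicts at or above min_severity, most severe first."""
    cur = conn.execute(
        "SELECT conflict_id, conflict_type, edge_id_a, edge_id_b, severity "
        "FROM conflict_records "
        "WHERE resolved = 0 AND severity >= ? "
        "ORDER BY severity DESC",
        (min_severity,),
    )
    return cur.fetchall()
```

Filtering on `resolved = 0` keeps the result focused on disagreements that are still open, and ordering by severity surfaces the sharpest contradictions first.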