
Appendix A: Identity Server Specification

Abstract Interface

from abc import ABC, abstractmethod
from typing import Optional, FrozenSet
from enum import Enum
from pydantic import BaseModel, Field

class EntityStatus(str, Enum):
    PROVISIONAL = "provisional"
    CANONICAL = "canonical"
    MERGED = "merged"

class EntityRecord(BaseModel, frozen=True):
    """An entity in the identity server."""
    entity_id: str = Field(description="Stable identifier for this entity")
    entity_type: str = Field(description="Entity type from the domain spec")
    surface_forms: FrozenSet[str] = Field(
        description="All known mention strings for this entity"
    )
    status: EntityStatus = Field(description="Current lifecycle status")
    authority: Optional[str] = Field(
        default=None,
        description="Name of the anchoring authority, if canonical",
    )
    authority_id: Optional[str] = Field(
        default=None,
        description="Canonical ID from the authority, if canonical",
    )
    confidence: float = Field(description="Aggregate confidence score")
    evidence_count: int = Field(
        description="Number of supporting provenance records"
    )

class ResolveResult(BaseModel, frozen=True):
    """Result of a resolve operation."""
    entity_id: str = Field(description="Canonical or provisional entity ID")
    status: EntityStatus = Field(description="Status of the returned entity")
    was_created: bool = Field(
        description="True if a new provisional entity was created"
    )

class MergeResult(BaseModel, frozen=True):
    """Result of a merge operation."""
    survivor_id: str = Field(description="Entity ID of the surviving record")
    absorbed_id: str = Field(description="Entity ID of the absorbed record")
    was_already_merged: bool = Field(
        description="True if this merge had already been performed"
    )

class IdentityServer(ABC):
    """
    Abstract base class for the identity server.

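    All operations must be idempotent: safe to call multiple times
    with the same arguments. Repeated calls have no further effect and
    return the same entity IDs. Only flags such as was_created and
    was_already_merged change, to report that the work was already done.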
    """

    @abstractmethod
    def resolve(
        self,
        mention: str,
        entity_type: str,
    ) -> ResolveResult:
        """
        Resolve a mention string to a canonical or provisional entity ID.

        Applies the lookup chain: exact match, fuzzy match, embedding
        similarity, authority lookup. Creates a provisional entity if
        no match is found.

        Args:
            mention: The surface form to resolve.
            entity_type: The type of entity (e.g., "drug", "gene", "disease").

        Returns:
            ResolveResult with the entity ID and status.
        """

    @abstractmethod
    def promote(
        self,
        entity_id: str,
    ) -> Optional[EntityRecord]:
        """
        Attempt to promote a provisional entity to canonical status.

        Calls the domain service to look up the entity's most common
        surface form against the appropriate authority. Returns the
        updated EntityRecord if promotion succeeded, None otherwise.

        Args:
            entity_id: The provisional entity ID to promote.

        Returns:
            Updated EntityRecord with canonical status, or None if
            promotion failed.
        """

    @abstractmethod
    def find_synonyms(
        self,
        entity_id: str,
    ) -> frozenset[str]:
        """
        Return all known surface forms for a canonical entity.

        Args:
            entity_id: A canonical entity ID.

        Returns:
            Frozenset of all surface forms associated with this entity.
        """

    @abstractmethod
    def merge(
        self,
        entity_id_a: str,
        entity_id_b: str,
    ) -> MergeResult:
        """
        Merge two entities determined to be the same.

        Calls the domain service to select the survivor. Updates all
        relationships referencing the non-survivor to reference the
        survivor. Records the merge in the merge log.

        Args:
            entity_id_a: First entity ID.
            entity_id_b: Second entity ID.

        Returns:
            MergeResult indicating which entity survived and whether
            the merge had already been performed.
        """

    @abstractmethod
    def on_entity_added(
        self,
        record: EntityRecord,
    ) -> None:
        """
        Hook called after any entity is added or updated.

        Used for downstream notifications, cache invalidation, and
        logging. Implementations should be fast and non-blocking;
        expensive downstream operations should be queued.

        Args:
            record: The EntityRecord that was added or updated.
        """

Domain Plugin HTTP Contract

The domain service exposes five HTTP endpoints, which the identity server calls.

POST /resolve-authority

Request:

{
  "mention": "desmopressin",
  "entity_type": "drug"
}

Response (match found):

{
  "canonical_id": "RxNorm:3251",
  "authority": "RxNorm",
  "confidence": 1.0
}

Response (no match):

{
  "canonical_id": null,
  "authority": null,
  "confidence": null
}
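On the identity-server side, a client could build the request with the standard library and map the all-null response to "no match". This is a sketch only. The function names are hypothetical, the base URL is whatever the deployment configures, and the request is built but never sent.

```python
import json
import urllib.request
from typing import NamedTuple, Optional


class AuthorityMatch(NamedTuple):
    canonical_id: str
    authority: str
    confidence: float


def build_resolve_authority_request(
    base_url: str, mention: str, entity_type: str
) -> urllib.request.Request:
    """Build (but do not send) a POST /resolve-authority request."""
    body = json.dumps({"mention": mention, "entity_type": entity_type}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/resolve-authority",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_resolve_authority_response(body: str) -> Optional[AuthorityMatch]:
    """Return None for the no-match response, whose fields are all null."""
    data = json.loads(body)
    if data.get("canonical_id") is None:
        return None
    return AuthorityMatch(data["canonical_id"], data["authority"], float(data["confidence"]))
```

Returning None rather than an AuthorityMatch full of None values mirrors promote's contract, which also signals failure with None.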

POST /select-survivor

Request:

{
  "entity_a": { ... EntityRecord ... },
  "entity_b": { ... EntityRecord ... }
}

Response:

{
  "survivor_id": "RxNorm:3251"
}
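The contract leaves the survivor policy to the domain service. The policy below is one plausible choice, assumed purely for illustration:

1. Prefer a canonical entity over a provisional one.
2. Otherwise prefer the entity with more evidence.
3. Break any remaining tie on entity_id.

The tie-break makes the answer independent of argument order, which helps merge stay idempotent. The dictionaries use EntityRecord's field names.

```python
def select_survivor(entity_a: dict, entity_b: dict) -> dict:
    """Hypothetical policy: canonical first, then more evidence, then smaller entity_id."""
    ranked = sorted(
        [entity_a, entity_b],
        key=lambda e: (
            e["status"] != "canonical",  # False sorts first, so canonical wins
            -e["evidence_count"],        # more supporting provenance wins
            e["entity_id"],              # deterministic tie-break
        ),
    )
    return {"survivor_id": ranked[0]["entity_id"]}
```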

POST /compute-confidence

Request:

{
  "provenance_records": [
    {
      "paper_id": "PMC1234567",
      "section_type": "results",
      "paragraph_idx": 3,
      "extraction_method": "claude-sonnet-4-6/v2",
      "confidence": 0.92,
      "study_type": "rct"
    }
  ]
}

Response:

{
  "confidence": 0.87
}
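The aggregation formula is not specified here, so the following is only an example of what a domain service might do. It combines per-record confidences with a noisy-OR, weighting each record by a study-type factor. Both the noisy-OR choice and the weights are assumptions.

```python
# Hypothetical per-study-type weights; a real domain service defines its own.
STUDY_TYPE_WEIGHTS = {"rct": 1.0, "cohort": 0.8, "case_report": 0.5}
DEFAULT_WEIGHT = 0.6


def compute_confidence(provenance_records: list[dict]) -> float:
    """Noisy-OR: the claim holds unless every weighted supporting record is wrong."""
    p_all_wrong = 1.0
    for rec in provenance_records:
        weight = STUDY_TYPE_WEIGHTS.get(rec.get("study_type"), DEFAULT_WEIGHT)
        p_all_wrong *= 1.0 - weight * rec["confidence"]
    return 1.0 - p_all_wrong
```

Under this sketch, a single RCT record at 0.92 scores 0.92, not the 0.87 in the example response. That response reflects whatever formula the actual domain service uses. Under this formula, adding more independent records only ever raises the aggregate.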

GET /synonym-criteria

Response:

{
  "fuzzy_threshold": 0.85,
  "embedding_threshold": 0.92,
  "entity_type_overrides": {
    "gene": {
      "fuzzy_threshold": 0.95
    }
  }
}
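The override block suggests a simple lookup order: a per-type value, if present, replaces the global default for that entity type. A minimal sketch of that reading follows; effective_threshold is a hypothetical helper name.

```python
import json

CRITERIA = json.loads("""
{
  "fuzzy_threshold": 0.85,
  "embedding_threshold": 0.92,
  "entity_type_overrides": {"gene": {"fuzzy_threshold": 0.95}}
}
""")


def effective_threshold(criteria: dict, entity_type: str, kind: str) -> float:
    """Return the threshold for kind ("fuzzy" or "embedding"), honoring per-type overrides."""
    key = f"{kind}_threshold"
    overrides = criteria.get("entity_type_overrides", {}).get(entity_type, {})
    return overrides.get(key, criteria[key])
```

With the example criteria, genes need a 0.95 fuzzy score but still use the global 0.92 embedding threshold. Every other type uses the defaults.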

GET /schema

No request body. Returns the complete domain spec: the closed set of entity types and the full predicate vocabulary with domain, range, and constraint declarations. The identity server fetches this at startup and re-fetches it when the schema version changes.

Response:

{
  "version": "2.3.0",
  "entity_types": ["drug", "gene", "disease", "biological_process"],
  "predicates": [
    {
      "name": "treats",
      "domain": ["drug"],
      "range": ["disease"],
      "description": "Drug is used therapeutically to manage the disease.",
      "is_functional": false,
      "negation_of": null
    },
    {
      "name": "inhibits",
      "domain": ["drug", "gene"],
      "range": ["gene", "biological_process"],
      "description": "Subject suppresses the activity of the object.",
      "is_functional": false,
      "negation_of": "activates"
    }
  ]
}
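One use of the fetched schema is to reject relationships whose subject or object type falls outside a predicate's declared domain or range. Below is a sketch of such a check against a trimmed copy of the example schema. The function name validate_triple is hypothetical, and it ignores is_functional and negation_of.

```python
SCHEMA = {
    "version": "2.3.0",
    "entity_types": ["drug", "gene", "disease", "biological_process"],
    "predicates": [
        {"name": "treats", "domain": ["drug"], "range": ["disease"]},
        {"name": "inhibits", "domain": ["drug", "gene"],
         "range": ["gene", "biological_process"]},
    ],
}


def validate_triple(schema: dict, subject_type: str, predicate: str, object_type: str) -> list[str]:
    """Return a list of violations; an empty list means the triple is allowed."""
    errors = []
    known_types = set(schema["entity_types"])
    for t in (subject_type, object_type):
        if t not in known_types:
            errors.append(f"unknown entity type: {t}")
    spec = next((p for p in schema["predicates"] if p["name"] == predicate), None)
    if spec is None:
        errors.append(f"unknown predicate: {predicate}")
        return errors
    if subject_type not in spec["domain"]:
        errors.append(f"{predicate}: subject type {subject_type} not in domain {spec['domain']}")
    if object_type not in spec["range"]:
        errors.append(f"{predicate}: object type {object_type} not in range {spec['range']}")
    return errors
```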

Entity Status Rules

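| Current Status | Operation | Condition | New Status |
|---|---|---|---|
| provisional | promote | authority match found | canonical |
| provisional | promote | no authority match | provisional (unchanged) |
| provisional | merge | entity is absorbed (not the survivor) | merged |
| canonical | merge | entity is absorbed (not the survivor) | merged (rare; requires manual override) |
| merged | any | (none) | error (operate on survivor) |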
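The status rules can be encoded as a small transition function, shown below as a sketch. next_status and its keyword flags are hypothetical. Promoting an already canonical entity returns it unchanged, following the idempotency contract for promote rather than a row in the status table.

```python
def next_status(
    current: str,
    operation: str,
    *,
    authority_match: bool = False,
    manual_override: bool = False,
) -> str:
    """Return the new status of the entity an operation acts on (the absorbed one for merge)."""
    if current == "merged":
        raise ValueError("entity is merged; operate on the survivor instead")
    if operation == "promote":
        if current == "canonical":
            return "canonical"  # already canonical: idempotent no-op
        return "canonical" if authority_match else "provisional"
    if operation == "merge":
        if current == "canonical" and not manual_override:
            raise ValueError("absorbing a canonical entity requires a manual override")
        return "merged"
    raise ValueError(f"unknown operation: {operation}")
```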

Invariants:

  • A merged entity's entity_id is never returned by resolve
  • All relationships referencing a merged entity transparently resolve to the survivor
  • Merge is always between two non-merged entities
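The second invariant implies a redirect from each absorbed ID to its survivor. Merges only join non-merged entities, but a survivor can itself be absorbed later (A into B, then B into C). A lookup therefore has to follow the chain to its end.

The sketch below assumes the merge log is available as an absorbed-to-survivor dictionary. current_entity_id is a hypothetical name.

```python
def current_entity_id(merged_into: dict[str, str], entity_id: str) -> str:
    """Follow absorbed -> survivor links until reaching a non-merged entity."""
    seen: set[str] = set()
    while entity_id in merged_into:
        if entity_id in seen:
            raise RuntimeError(f"cycle in merge log at {entity_id}")
        seen.add(entity_id)
        entity_id = merged_into[entity_id]
    return entity_id
```

Relationships can keep storing the original IDs, and readers call this on access. Alternatively, a background job can rewrite stored references to the final survivor, which is what merge's "updates all relationships" step describes.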

Idempotency Contract

| Operation | Idempotency Mechanism |
|---|---|
| resolve | Upsert on (mention, entity_type); return the existing ID if already resolved |
| promote | Check canonical status before attempting; return the existing record if already canonical |
| find_synonyms | Read-only; always idempotent |
| merge | Check the merge log before executing; return the existing MergeResult (with was_already_merged set to True) if already merged |
| on_entity_added | The implementation's responsibility; the hook itself must be idempotent |
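A merge log that makes merge idempotent is sketched below. MergeLog and its choose_survivor callback are hypothetical, and a dataclass stands in for the pydantic MergeResult. The log is keyed by the unordered pair of IDs, so merge(a, b) and merge(b, a) count as the same merge.

The log lookup runs before the non-merged check. This matters because, after the first merge, one of the two IDs is already merged. If the check ran first, a repeated call would be rejected instead of returning the earlier result.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MergeResult:
    survivor_id: str
    absorbed_id: str
    was_already_merged: bool


class MergeLog:
    """Hypothetical merge log keyed by the unordered pair of entity IDs."""

    def __init__(self, choose_survivor: Callable[[str, str], str]) -> None:
        self._choose = choose_survivor
        self._log: dict[frozenset[str], MergeResult] = {}
        self.merged_into: dict[str, str] = {}  # absorbed_id -> survivor_id

    def merge(self, a: str, b: str) -> MergeResult:
        if a == b:
            raise ValueError("cannot merge an entity with itself")
        key = frozenset((a, b))
        # Idempotency: replay the recorded outcome for a merge already performed.
        if key in self._log:
            prior = self._log[key]
            return MergeResult(prior.survivor_id, prior.absorbed_id, True)
        # Invariant: merge is always between two non-merged entities.
        if a in self.merged_into or b in self.merged_into:
            raise ValueError("both entities must be non-merged; operate on the survivor")
        survivor = self._choose(a, b)
        absorbed = b if survivor == a else a
        result = MergeResult(survivor, absorbed, False)
        self._log[key] = result
        self.merged_into[absorbed] = survivor
        return result
```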