Confidence aggregation

The medlit domain service computes a composite confidence score from a list of provenance records. It takes the strongest study-type-weighted record as the base and adds a capped replication bonus:

from pydantic import BaseModel
from typing import Literal

StudyType = Literal[
    "meta_analysis", "rct", "cohort",
    "case_control", "observational", "review", "case_report"
]

STUDY_WEIGHTS: dict[StudyType, float] = {
    "meta_analysis": 0.95,
    "rct": 1.0,
    "cohort": 0.8,
    "case_control": 0.7,
    "observational": 0.6,
    "review": 0.5,
    "case_report": 0.4,
}

REPLICATION_BONUS_PER_PAPER = 0.02
MAX_REPLICATION_BONUS = 0.15

class ProvenanceRecord(BaseModel):
    paper_id: str
    section_type: str
    paragraph_idx: int
    extraction_method: str
    confidence: float
    study_type: StudyType

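def compute_confidence(records: list[ProvenanceRecord]) -> float:
    """Return the composite confidence for one claim, at most 0.99."""
    # No supporting records means no confidence.
    if not records:
        return 0.0
    # Base: the strongest single record after study-type weighting.
    base = max(
        r.confidence * STUDY_WEIGHTS[r.study_type]
        for r in records
    )
    # Bonus: a fixed increment per additional record, capped.
    replication_bonus = min(
        (len(records) - 1) * REPLICATION_BONUS_PER_PAPER,
        MAX_REPLICATION_BONUS,
    )
    # Clamp so the score never reaches 1.0.
    return min(base + replication_bonus, 0.99)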

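The base confidence is the maximum weighted confidence across all supporting records: the strongest single piece of evidence sets the floor. The replication bonus rewards claims that appear in multiple independent papers, adding 0.02 for each record beyond the first. It is capped at 0.15 so that a large number of weak case reports cannot inflate a claim beyond what the evidence supports. Because the bonus is computed from len(records), it assumes each record comes from a different paper; several records extracted from the same paper would also earn the bonus. The final score is clamped to 0.99.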
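
To make the arithmetic concrete, here is a minimal sketch that runs the same formula on three made-up evidence sets. The constants and the body of compute_confidence are copied from the code above. Record is a hypothetical stand-in for ProvenanceRecord (a plain dataclass, so the snippet runs without pydantic), and the paper IDs and confidence values are invented for illustration only.

```python
from dataclasses import dataclass

# Constants copied from the section above.
STUDY_WEIGHTS = {
    "meta_analysis": 0.95, "rct": 1.0, "cohort": 0.8, "case_control": 0.7,
    "observational": 0.6, "review": 0.5, "case_report": 0.4,
}
REPLICATION_BONUS_PER_PAPER = 0.02
MAX_REPLICATION_BONUS = 0.15


@dataclass
class Record:
    """Hypothetical stand-in for ProvenanceRecord; only the fields the formula reads."""
    paper_id: str
    confidence: float
    study_type: str


def compute_confidence(records):
    # Same logic as the function above.
    if not records:
        return 0.0
    base = max(r.confidence * STUDY_WEIGHTS[r.study_type] for r in records)
    bonus = min(
        (len(records) - 1) * REPLICATION_BONUS_PER_PAPER,
        MAX_REPLICATION_BONUS,
    )
    return min(base + bonus, 0.99)


# Case 1: one strong RCT plus two weaker papers.
mixed = [
    Record("paper-a", 0.90, "rct"),          # 0.90 * 1.0 = 0.90 (becomes the base)
    Record("paper-b", 0.80, "cohort"),       # 0.80 * 0.8 = 0.64
    Record("paper-c", 0.95, "case_report"),  # 0.95 * 0.4 = 0.38
]
mixed_score = compute_confidence(mixed)  # 0.90 + 2 * 0.02, about 0.94

# Case 2: twenty case reports. Uncapped, the bonus would be 19 * 0.02 = 0.38.
weak = [Record(f"paper-{i}", 0.50, "case_report") for i in range(20)]
weak_score = compute_confidence(weak)  # 0.20 + 0.15 (capped), about 0.35

# Case 3: ten strong RCTs. 0.95 + 0.15 exceeds the ceiling, so it clamps.
strong = [Record(f"trial-{i}", 0.95, "rct") for i in range(10)]
strong_score = compute_confidence(strong)  # 0.99

for name, score in [("mixed", mixed_score), ("weak", weak_score), ("strong", strong_score)]:
    print(f"{name}: {score:.2f}")
```

In the first case the RCT alone sets the base and the other two records only contribute the bonus. In the second, twenty weak records still land well below a single strong RCT, which is the effect the cap is meant to have. The third shows the 0.99 ceiling taking over once base plus bonus would exceed it.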