Adapting to Your Domain

Placeholder — content to be migrated and expanded from ../adapting-to-your-domain.md.

Step-by-step guide to implementing the framework for a new domain.

Step 1: Define your schema

Implement DomainSchema with your entity types, relationship types, and document type. Start minimal: add types as you discover them during extraction experiments rather than trying to enumerate them all upfront.

See Schema Design Guide.
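As a rough illustration of "start minimal", the sketch below models a schema with two entity types and one relationship. `MinimalSchema`, its fields, and `add_relationship` are hypothetical stand-ins invented for this example; the framework's actual DomainSchema interface may look quite different, so treat this as a shape to aim for rather than the real API.

```python
from dataclasses import dataclass, field


@dataclass
class MinimalSchema:
    """Hypothetical stand-in for a DomainSchema implementation."""

    document_type: str
    entity_types: set[str] = field(default_factory=set)
    # Each relationship type maps to the (subject, object) entity types it allows.
    relationship_types: dict[str, tuple[str, str]] = field(default_factory=dict)

    def add_relationship(self, name: str, subject: str, obj: str) -> None:
        """Register a relationship, rejecting entity types not yet declared."""
        for entity_type in (subject, obj):
            if entity_type not in self.entity_types:
                raise ValueError(f"unknown entity type: {entity_type}")
        self.relationship_types[name] = (subject, obj)


# Start with two entity types and one relationship; grow the schema later.
schema = MinimalSchema(document_type="journal_article", entity_types={"drug", "disease"})
schema.add_relationship("treats", "drug", "disease")
```

Rejecting undeclared entity types early is one way to keep schema growth deliberate: a new type has to be added explicitly before any relationship can use it.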

Step 2: Write extraction prompts

Write entity and relationship extraction prompts tailored to your domain's vocabulary and document structure. Test them manually on a sample of documents before wiring them into the pipeline.

See Prompt Design for Extraction.
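For manual testing, it can help to build prompts from a template so the same wording is reused once the prompt is wired into the pipeline. The following is a minimal sketch; the template text, the `build_entity_prompt` helper, and the requested output format are all assumptions made for illustration, not the framework's own prompts.

```python
ENTITY_PROMPT_TEMPLATE = """\
You are extracting entities from a {document_type}.
Return a JSON list of objects with keys "text" and "type".
Allowed types: {entity_types}.

Passage:
{passage}
"""


def build_entity_prompt(document_type: str, entity_types: list[str], passage: str) -> str:
    """Fill the template; entity types are sorted so prompts are reproducible."""
    return ENTITY_PROMPT_TEMPLATE.format(
        document_type=document_type,
        entity_types=", ".join(sorted(entity_types)),
        passage=passage.strip(),
    )


prompt = build_entity_prompt(
    "journal article",
    ["drug", "disease"],
    "Metformin is a first-line treatment for type 2 diabetes.",
)
print(prompt)  # send this to your model by hand and inspect what comes back
```

Running a handful of sample passages through a prompt like this, and reading the raw model output, tends to surface vocabulary and formatting problems before they are buried inside pipeline logs.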

Step 3: Implement pipeline components

Implement the parser, extractor, and resolver interfaces for your document format and authority sources. Use the Sherlock or medlit examples as a reference.
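To make the division of labour concrete, here is a toy sketch of the three roles and how they might compose. The `Parser`, `Extractor`, and `Resolver` protocols, their method names, and the `EX:` identifiers are hypothetical; the framework's real interfaces are defined in its own code and in the reference examples.

```python
from typing import Optional, Protocol


class Parser(Protocol):
    def parse(self, raw: str) -> list[str]: ...


class Extractor(Protocol):
    def extract(self, chunk: str) -> list[str]: ...


class Resolver(Protocol):
    def resolve(self, mention: str) -> Optional[str]: ...


class ParagraphParser:
    """Splits a raw document into chunks on blank lines."""

    def parse(self, raw: str) -> list[str]:
        return [p.strip() for p in raw.split("\n\n") if p.strip()]


class KeywordExtractor:
    """Toy extractor: reports words that appear in a fixed keyword set."""

    def __init__(self, keywords: set[str]) -> None:
        self.keywords = {k.lower() for k in keywords}

    def extract(self, chunk: str) -> list[str]:
        words = [w.strip(".,;:()").lower() for w in chunk.split()]
        return [w for w in words if w in self.keywords]


class DictResolver:
    """Toy resolver: looks mentions up in an in-memory table (made-up IDs)."""

    def __init__(self, table: dict[str, str]) -> None:
        self.table = table

    def resolve(self, mention: str) -> Optional[str]:
        return self.table.get(mention.lower())


def run_pipeline(
    raw: str, parser: Parser, extractor: Extractor, resolver: Resolver
) -> list[tuple[str, Optional[str]]]:
    """Parse, extract, and resolve; unresolved mentions carry None."""
    results = []
    for chunk in parser.parse(raw):
        for mention in extractor.extract(chunk):
            results.append((mention, resolver.resolve(mention)))
    return results


results = run_pipeline(
    "Aspirin reduces fever.\n\nIbuprofen is similar.",
    ParagraphParser(),
    KeywordExtractor({"aspirin", "ibuprofen"}),
    DictResolver({"aspirin": "EX:0001"}),
)
print(results)
```

Keeping each role behind a small interface means a domain-specific parser or an authority-backed resolver can be swapped in without touching the other two components.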

Step 4: Seed the synonym cache

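If your domain has a known controlled vocabulary (MeSH headings, drug name lists, etc.), seed the synonym cache before running ingestion. Mentions that match a seeded synonym can then resolve to a canonical ID on the first run, which typically improves resolution accuracy before the cache has had a chance to fill on its own.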
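One plausible way to seed a cache is to flatten the vocabulary into a lookup from normalized synonym to canonical ID. The sketch below assumes the cache behaves like a plain dictionary; `normalize`, `seed_synonym_cache`, and the `EX:` identifiers are hypothetical, and the framework's actual cache format and loading mechanism may differ.

```python
def normalize(term: str) -> str:
    """Case-fold and collapse whitespace so lookups ignore trivial variation."""
    return " ".join(term.casefold().split())


def seed_synonym_cache(vocabulary: dict[str, list[str]]) -> dict[str, str]:
    """Map every normalized synonym to its canonical ID.

    Raises ValueError if one synonym is claimed by two different IDs,
    since silently picking one would hide a vocabulary problem.
    """
    cache: dict[str, str] = {}
    for canonical_id, synonyms in vocabulary.items():
        for synonym in synonyms:
            existing = cache.setdefault(normalize(synonym), canonical_id)
            if existing != canonical_id:
                raise ValueError(
                    f"{synonym!r} maps to both {existing} and {canonical_id}"
                )
    return cache


# Made-up identifiers; a real vocabulary would come from your authority source.
vocabulary = {
    "EX:0001": ["Aspirin", "acetylsalicylic acid", "ASA"],
    "EX:0002": ["Paracetamol", "acetaminophen"],
}
cache = seed_synonym_cache(vocabulary)
print(cache[normalize("Acetylsalicylic  Acid")])
```

Failing loudly on ambiguous synonyms is a design choice; some domains may prefer to record the ambiguity and leave the decision to the resolver.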

Step 5: Run and validate

Run pass 1 on a small document set. Inspect:

Entity extraction rate (mentions per chunk).
Resolution rate (what fraction resolved to canonical IDs).
Provisional entity count (high count → authority lookup gaps or prompt issues).
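The three pass 1 checks above can be computed from a flat list of mentions. This is a sketch under stated assumptions: the `Mention` record and `pass1_metrics` function are hypothetical, an unresolved mention is represented by `canonical_id=None`, and provisional entities are approximated as distinct unresolved mention texts. The framework's own output format may expose these numbers differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mention:
    chunk_id: int
    text: str
    canonical_id: Optional[str]  # None means the mention stayed provisional


def pass1_metrics(mentions: list[Mention], chunk_count: int) -> dict[str, float]:
    """Summarize extraction and resolution quality for one pass 1 run."""
    total = len(mentions)
    resolved = sum(1 for m in mentions if m.canonical_id is not None)
    # Approximation: one provisional entity per distinct unresolved text.
    provisional = {m.text.casefold() for m in mentions if m.canonical_id is None}
    return {
        "mentions_per_chunk": total / chunk_count if chunk_count else 0.0,
        "resolution_rate": resolved / total if total else 0.0,
        "provisional_count": len(provisional),
    }


mentions = [
    Mention(0, "aspirin", "EX:0001"),
    Mention(0, "fever", "EX:0101"),
    Mention(1, "ASA", "EX:0001"),
    Mention(1, "compound X", None),
]
metrics = pass1_metrics(mentions, chunk_count=2)
print(metrics)
```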

Then run pass 2 and inspect relationship counts and types.
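For the pass 2 inspection, a tally of relationship types is often enough to spot a predicate that is missing or suspiciously dominant. The sketch assumes relationships are available as (subject, predicate, object) triples; that representation and the identifiers shown are illustrative assumptions.

```python
from collections import Counter

# Made-up triples standing in for pass 2 output.
relationships = [
    ("EX:0001", "treats", "EX:0101"),
    ("EX:0002", "treats", "EX:0101"),
    ("EX:0001", "interacts_with", "EX:0002"),
]

# Count how often each relationship type was extracted.
counts = Counter(predicate for _, predicate, _ in relationships)
for predicate, n in counts.most_common():
    print(f"{predicate}: {n}")
```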

Step 6: Iterate

Adjust prompts, add entity types, or extend the synonym cache based on inspection. Re-run on the validation set. Repeat until quality is acceptable.
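To make "acceptable" less subjective, it may help to record metrics for each run and compare them against explicit targets. The helpers and thresholds below are hypothetical; pick targets that reflect your own domain and validation set.

```python
def compare_runs(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Return after-minus-before for every metric present in both runs."""
    return {name: after[name] - before[name] for name in before if name in after}


def is_acceptable(
    metrics: dict[str, float],
    min_resolution_rate: float,
    max_provisional: int,
) -> bool:
    """Check one run against caller-supplied quality targets."""
    return (
        metrics["resolution_rate"] >= min_resolution_rate
        and metrics["provisional_count"] <= max_provisional
    )


# Illustrative numbers only, not measurements from a real run.
baseline = {"resolution_rate": 0.5, "provisional_count": 40}
tuned = {"resolution_rate": 0.75, "provisional_count": 12}

delta = compare_runs(baseline, tuned)
print(delta)
print(is_acceptable(tuned, min_resolution_rate=0.7, max_provisional=15))
```

Comparing each run against the previous one on the same validation set makes it easier to tell whether a prompt change or cache extension actually helped.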