Adapting to Your Domain
Placeholder — content to be migrated and expanded from
../adapting-to-your-domain.md.
Step-by-step guide to implementing the framework for a new domain.
Step 1: Define your schema
Implement DomainSchema with your entity types, relationship types, and document type.
Start minimal — add types as you discover them during extraction experiments, not upfront.
See Schema Design Guide.
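As a rough sketch of what "start minimal" can look like, the example below uses a plain dataclass as a stand-in. The class `MinimalSchema`, its field names, and the example type names are assumptions for illustration; this is not the framework's actual `DomainSchema` interface, which the Schema Design Guide describes.

```python
from dataclasses import dataclass, field


@dataclass
class MinimalSchema:
    """Hypothetical stand-in for a domain schema; not the framework's DomainSchema."""

    document_type: str
    entity_types: set[str] = field(default_factory=set)
    relationship_types: set[str] = field(default_factory=set)

    def add_entity_type(self, name: str) -> None:
        # Types are added as extraction experiments surface them.
        self.entity_types.add(name)

    def allows_relationship(self, name: str) -> bool:
        return name in self.relationship_types


# Start with only the types you are sure about.
schema = MinimalSchema(
    document_type="journal_article",
    entity_types={"disease", "drug"},
    relationship_types={"treats"},
)

# Later, an extraction experiment shows that genes are mentioned often.
schema.add_entity_type("gene")
```

Keeping the type sets small at first makes it easier to see which extracted mentions fall outside the schema and therefore which types are worth adding.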
Step 2: Write extraction prompts
Write entity and relationship extraction prompts tailored to your domain's vocabulary and document structure. Test them manually on a sample of documents before wiring them into the pipeline.
See Prompt Design for Extraction.
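One way to test a prompt by hand is to build it from a template, paste it into a model yourself, and then check that the pasted response parses cleanly. The sketch below assumes a JSON response format; the template wording and the helper names `build_entity_prompt` and `parse_entities` are hypothetical and are not part of the framework.

```python
import json

# Illustrative template; adapt the wording to your domain's vocabulary.
ENTITY_PROMPT = """\
You are extracting entities from a {document_type}.
Return a JSON list of objects with keys "text" and "type".
Allowed types: {entity_types}.

Text:
{chunk}
"""


def build_entity_prompt(chunk: str, entity_types: list[str], document_type: str) -> str:
    """Fill the template for one chunk of text."""
    return ENTITY_PROMPT.format(
        document_type=document_type,
        entity_types=", ".join(sorted(entity_types)),
        chunk=chunk.strip(),
    )


def parse_entities(raw_response: str, entity_types: list[str]) -> list[dict]:
    """Parse a model response and keep only well-formed entities of allowed types."""
    try:
        items = json.loads(raw_response)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    allowed = set(entity_types)
    return [
        item
        for item in items
        if isinstance(item, dict) and item.get("type") in allowed and item.get("text")
    ]
```

Running `parse_entities` over a handful of pasted responses shows quickly whether the model respects the allowed types and the output format before any pipeline code depends on it.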
Step 3: Implement pipeline components
Implement the parser, extractor, and resolver interfaces for your document format and authority sources. Use the Sherlock or medlit examples as reference.
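To show how the three components fit together, here is a toy sketch using `typing.Protocol`. The method names (`parse`, `extract`, `resolve`), the `Mention` type, and the toy classes are assumptions made for this example; the framework's real interfaces may differ, so treat the Sherlock and medlit examples as the authoritative reference.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Mention:
    text: str
    entity_type: str


class Parser(Protocol):
    def parse(self, raw: str) -> list[str]:
        """Split a raw document into text chunks."""
        ...


class Extractor(Protocol):
    def extract(self, chunk: str) -> list[Mention]:
        """Return the entity mentions found in one chunk."""
        ...


class Resolver(Protocol):
    def resolve(self, mention: Mention) -> Optional[str]:
        """Return a canonical ID, or None if no authority match was found."""
        ...


class ParagraphParser:
    def parse(self, raw: str) -> list[str]:
        return [p.strip() for p in raw.split("\n\n") if p.strip()]


class KeywordExtractor:
    """Toy extractor that matches a fixed keyword list; a real one would call a model."""

    def __init__(self, keywords: dict[str, str]) -> None:
        self.keywords = keywords  # lowercase keyword -> entity type

    def extract(self, chunk: str) -> list[Mention]:
        lowered = chunk.lower()
        return [Mention(k, t) for k, t in self.keywords.items() if k in lowered]


class TableResolver:
    """Toy resolver backed by a dict; a real one would query an authority source."""

    def __init__(self, table: dict[str, str]) -> None:
        self.table = table

    def resolve(self, mention: Mention) -> Optional[str]:
        return self.table.get(mention.text.lower())


def run(raw: str, parser: Parser, extractor: Extractor, resolver: Resolver):
    """Parse, extract, and resolve; unresolved mentions carry None as their ID."""
    results = []
    for chunk in parser.parse(raw):
        for mention in extractor.extract(chunk):
            results.append((mention, resolver.resolve(mention)))
    return results
```

Because `run` depends only on the protocols, each component can be swapped for a domain-specific implementation without touching the others.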
Step 4: Seed the synonym cache
If your domain has a known controlled vocabulary (MeSH headings, drug name lists, etc.), seed the synonym cache before running ingestion. Seeding matters most on the first run, when the cache would otherwise be empty, and it typically improves resolution accuracy for that run.
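A minimal sketch of seeding, assuming the cache can be modelled as a dictionary from normalized synonym to canonical ID. The framework's actual cache API is not shown here; `normalize`, `seed_synonym_cache`, the CSV layout, and the `EX:` identifiers are all placeholders invented for the example.

```python
import csv
import io


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so lookups ignore case and spacing."""
    return " ".join(term.lower().split())


def seed_synonym_cache(rows, cache=None):
    """Add (synonym, canonical_id) pairs to a dict-based cache."""
    cache = {} if cache is None else cache
    for synonym, canonical_id in rows:
        # setdefault keeps the first mapping if a synonym appears twice.
        cache.setdefault(normalize(synonym), canonical_id)
    return cache


# Illustrative vocabulary export; the IDs are placeholders, not real identifiers.
VOCAB_CSV = """\
synonym,canonical_id
Acetylsalicylic acid,EX:0001
aspirin,EX:0001
Paracetamol,EX:0002
acetaminophen,EX:0002
"""

reader = csv.DictReader(io.StringIO(VOCAB_CSV))
cache = seed_synonym_cache((row["synonym"], row["canonical_id"]) for row in reader)
```

Normalizing both at seed time and at lookup time is the design choice that matters here: without it, trivial differences in case or spacing would cause cache misses.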
Step 5: Run and validate
Run pass 1 on a small document set. Inspect:
- Entity extraction rate (mentions per chunk).
- Resolution rate (what fraction resolved to canonical IDs).
- Provisional entity count (a high count suggests gaps in authority lookup or problems with the extraction prompt).
Then run pass 2 and inspect relationship counts and types.
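The inspection metrics above can be computed with a few lines of code. This sketch assumes pass 1 output is available as a list of chunks, each a list of mention dicts with `canonical_id` and `provisional` keys, and that pass 2 output is a list of `(subject_id, relationship_type, object_id)` tuples. Those shapes and the function names are assumptions for illustration, not the framework's actual output format.

```python
from collections import Counter


def pass1_metrics(chunks):
    """Summarize pass 1 output.

    chunks: list of lists of mentions; each mention is a dict with
    'canonical_id' (str or None) and 'provisional' (bool).
    """
    mentions = [m for chunk in chunks for m in chunk]
    total = len(mentions)
    resolved = sum(1 for m in mentions if m["canonical_id"] is not None)
    return {
        "mentions_per_chunk": total / len(chunks) if chunks else 0.0,
        "resolution_rate": resolved / total if total else 0.0,
        "provisional_count": sum(1 for m in mentions if m["provisional"]),
    }


def pass2_metrics(relationships):
    """Count relationships by type.

    relationships: list of (subject_id, relationship_type, object_id) tuples.
    """
    return Counter(rel_type for _, rel_type, _ in relationships)
```

Recording these numbers for each run on the same small document set gives a baseline to compare against after prompt or schema changes.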
Step 6: Iterate
Adjust prompts, add entity types, or extend the synonym cache based on inspection. Re-run on the validation set. Repeat until quality is acceptable.