Error Handling
Placeholder — this page needs to be written.
Ingestion pipelines fail in predictable ways: LLMs return malformed JSON, authority lookups time out, documents are corrupt. This page covers how the framework handles partial failures without losing work.
Failure modes
- Parse failure — the document cannot be parsed (wrong format, encoding error).
- Extraction failure — the LLM returns output that does not validate against the expected schema.
- Resolution failure — the authority lookup times out or returns no result.
- Embedding failure — the embedding model is unavailable.
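One way to keep these four modes distinguishable in code is a small exception hierarchy. The sketch below is illustrative only: the class names, the `stage` attribute, and the `transient` flag are assumptions made for this example, not the framework's documented API.

```python
class IngestionError(Exception):
    """Hypothetical base class for pipeline failures."""

    stage = "unknown"   # pipeline stage where the failure occurred
    transient = False   # assumption: True when a retry may plausibly succeed


class ParseError(IngestionError):
    """The document cannot be parsed (wrong format, encoding error)."""

    stage = "parse"


class ExtractionError(IngestionError):
    """The LLM output does not validate against the expected schema."""

    stage = "extraction"


class ResolutionError(IngestionError):
    """The authority lookup timed out or returned no result."""

    stage = "resolution"
    transient = True  # assumption: a timeout may clear; "no result" may not


class EmbeddingError(IngestionError):
    """The embedding model is unavailable."""

    stage = "embedding"
    transient = True  # assumption: the model may come back
```

Catching `IngestionError` at the document level would then let a caller read `exc.stage` without inspecting the message text.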
Design principles
- Partial progress is preserved. If pass 1 completes but pass 2 fails halfway through, the completed work is not discarded. The pipeline can resume from the last checkpoint.
- Failures are recorded, not swallowed. Each failure is written to an error log with the document ID, stage, and exception. Silent failures are not acceptable.
- Retries are bounded. Transient failures, such as network timeouts, are retried with exponential backoff. Persistent failures are surfaced once a configurable retry limit is reached.
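The last two principles can be sketched with the standard library alone. Everything named here is hypothetical: `log_failure`, `call_with_retries`, the JSON Lines log format, the default delays, and the choice of `TimeoutError` and `ConnectionError` as the transient set are assumptions for illustration, not the framework's actual interface.

```python
import json
import time


def log_failure(log_path, document_id, stage, exc):
    """Append one failure record (document ID, stage, exception) as a JSON line."""
    record = {
        "document_id": document_id,
        "stage": stage,
        "exception": repr(exc),
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def call_with_retries(
    fn,
    *,
    max_retries=3,
    base_delay=0.5,
    transient=(TimeoutError, ConnectionError),
    sleep=time.sleep,
):
    """Call fn(); retry transient errors with exponential backoff.

    The delay doubles on each attempt: base_delay, 2 * base_delay, ...
    Once max_retries retries are used up, the last exception propagates.
    Exceptions outside `transient` propagate immediately.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except transient:
            if attempt == max_retries:
                raise  # retry limit reached: surface the failure
            sleep(base_delay * 2 ** attempt)
```

A caller would typically wrap `call_with_retries` in a `try`/`except` and pass whatever escapes to `log_failure`, so that a failure exhausting its retries is recorded rather than swallowed. The `sleep` parameter exists only so tests can avoid real waiting.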
Extraction validation and fallback
When the LLM returns output that fails Pydantic validation:
- Log the raw output and the validation error.
- Optionally retry with a repair prompt.
- If repair fails, mark the chunk as extraction_failed and continue.
Chunks marked extraction_failed are included in the run report so they can be reviewed and reprocessed.
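The fallback steps above might look roughly like the following. This is a sketch under stated assumptions: `ChunkResult`, `validate_with_repair`, and the `validate` and `repair` callables are hypothetical names, and the validator is passed in as a plain function so the example does not depend on any particular schema library.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Optional

logger = logging.getLogger("ingest.extraction")  # hypothetical logger name


@dataclass
class ChunkResult:
    chunk_id: str
    status: str                  # "ok" or "extraction_failed"
    data: Any = None
    error: Optional[str] = None


def validate_with_repair(
    chunk_id: str,
    raw: str,
    validate: Callable[[str], Any],
    repair: Optional[Callable[[str, str], str]] = None,
) -> ChunkResult:
    """Validate raw LLM output; optionally try one repair; never raise."""
    try:
        return ChunkResult(chunk_id, "ok", data=validate(raw))
    except ValueError as exc:
        error = str(exc)
        # Step 1: log the raw output and the validation error.
        logger.warning("chunk %s failed validation: %s; raw=%r", chunk_id, error, raw)

    if repair is not None:
        try:
            # Step 2: ask for a repaired output (e.g. via a repair prompt).
            repaired = repair(raw, error)
            return ChunkResult(chunk_id, "ok", data=validate(repaired))
        except ValueError as exc:
            error = str(exc)
            logger.warning("chunk %s repair failed: %s", chunk_id, error)

    # Step 3: mark the chunk and let the caller continue with the next one.
    return ChunkResult(chunk_id, "extraction_failed", error=error)
```

With Pydantic v2, `validate` could plausibly be a model's `model_validate_json` method, since its `ValidationError` is a `ValueError` subclass; that pairing is an assumption about how the framework is wired, not something this page specifies. Returning a result object instead of raising keeps one bad chunk from aborting the rest of the document.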
Resumable runs
The pipeline writes a checkpoint file after each document completes. On restart, already-processed documents are skipped. This makes large ingestion runs safe to interrupt.
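A minimal sketch of this checkpointing behaviour follows. It assumes the checkpoint is a JSON list of completed document IDs; the file format and the `load_checkpoint`, `save_checkpoint`, and `run` names are hypothetical and chosen only for this example.

```python
import json
from pathlib import Path


def load_checkpoint(path):
    """Return the set of document IDs already processed (empty if no file)."""
    p = Path(path)
    if not p.exists():
        return set()
    return set(json.loads(p.read_text(encoding="utf-8")))


def save_checkpoint(path, done):
    """Write the checkpoint via a temp file so an interrupt cannot truncate it."""
    p = Path(path)
    tmp = p.with_name(p.name + ".tmp")
    tmp.write_text(json.dumps(sorted(done)), encoding="utf-8")
    tmp.replace(p)  # rename over the old checkpoint


def run(documents, process, checkpoint_path):
    """Process (doc_id, doc) pairs, skipping IDs found in the checkpoint."""
    done = load_checkpoint(checkpoint_path)
    for doc_id, doc in documents:
        if doc_id in done:
            continue  # finished in an earlier run
        process(doc_id, doc)
        done.add(doc_id)
        save_checkpoint(checkpoint_path, done)  # checkpoint after each document
    return done
```

If `process` raises or the run is killed partway through, the checkpoint still lists every document that finished, so calling `run` again with the same path picks up at the first unfinished document.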