Error Handling¶

Placeholder — this page needs to be written.

Ingestion pipelines fail in predictable ways: LLMs return malformed JSON, authority lookups time out, documents are corrupt. This page covers how the framework handles partial failures without losing work.

Failure modes¶

Parse failure — the document cannot be parsed (wrong format, encoding error).
Extraction failure — the LLM returns output that does not validate against the expected schema.
Resolution failure — the authority lookup times out or returns no result.
Embedding failure — the embedding model is unavailable.

Design principles¶

Partial progress is preserved. If pass 1 completes but pass 2 fails halfway through, the completed work is not discarded. The pipeline can resume from the last checkpoint.
Failures are recorded, not swallowed. Each failure is written to an error log with the document ID, stage, and exception. Silent failures are not acceptable.
Retries are bounded. Transient failures (network timeouts) are retried with exponential backoff. Persistent failures are surfaced after a configurable retry limit.

Extraction validation and fallback¶

When the LLM returns output that fails Pydantic validation:

Log the raw output and the validation error.
Optionally retry with a repair prompt.
If repair fails, mark the chunk as extraction_failed and continue.

Chunks marked extraction_failed are included in the run report so they can be reviewed and reprocessed.

Resumable runs¶

The pipeline writes a checkpoint file after each document completes. On restart, already- processed documents are skipped. This makes large ingestion runs safe to interrupt.

Error Handling¶

Failure modes¶

Design principles¶

Extraction validation and fallback¶

Resumable runs¶

See also¶