Corpus ingestion pipeline
status: designed, unverified
Document intake with source registration, text extraction, chunking, and embedding. Each ingested document receives a content address, a source provenance record, and an export-control classification tag. No document enters the corpus without registration.
spec: 04-proposal-alignment §4 corpus pipeline
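A minimal sketch of the intake step described above, under stated assumptions: the content address is taken to be a SHA-256 digest of the raw bytes, chunking is fixed-width by character, and embedding is left out. The names `IngestedDocument`, `chunk_text`, and `ingest`, and the tag value `"EAR99"`, are illustrative and do not come from the spec.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestedDocument:
    content_address: str      # SHA-256 hex digest of the raw bytes (assumed scheme)
    source_id: str            # provenance: the registered source that supplied it
    export_control: str       # classification tag (value is illustrative)
    chunks: tuple[str, ...]


def chunk_text(text: str, size: int = 200) -> tuple[str, ...]:
    """Fixed-width character chunking; a stand-in for the real chunker."""
    return tuple(text[i:i + size] for i in range(0, len(text), size))


def ingest(raw: bytes, source_id: str, export_control: str,
           registered_sources: set[str]) -> IngestedDocument:
    # Registration is checked first: nothing is hashed or chunked
    # for a source the registry does not know.
    if source_id not in registered_sources:
        raise PermissionError(f"source {source_id!r} is not registered")
    address = hashlib.sha256(raw).hexdigest()
    text = raw.decode("utf-8")
    return IngestedDocument(address, source_id, export_control, chunk_text(text))
```

Calling `ingest(b"hello world", "src-1", "EAR99", {"src-1"})` yields a single-chunk document, while the same call with an unregistered `source_id` raises `PermissionError` before any processing happens.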
Source registry
status: designed, unverified
Authoritative index of all retrievable sources: document ID, origin, date, classification, export-control posture, and retrieval authorization. The registry is the gate: unregistered sources are invisible to the retrieval surface.
design doc: source registry schema
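The gate behaviour can be sketched as a filter over candidate document IDs. This is an illustration only: the field names follow the list above, but the types, the `SourceRegistry` class, and its `visible` method are assumptions, not the schema from the design doc.

```python
from __future__ import annotations

import datetime
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    document_id: str
    origin: str
    date: datetime.date
    classification: str
    export_control: str
    retrieval_authorized: bool


class SourceRegistry:
    """Hypothetical in-memory registry keyed by document ID."""

    def __init__(self) -> None:
        self._records: dict[str, SourceRecord] = {}

    def register(self, record: SourceRecord) -> None:
        self._records[record.document_id] = record

    def visible(self, candidate_ids: list[str]) -> list[str]:
        """Keep only IDs that are registered and authorized for retrieval."""
        result = []
        for doc_id in candidate_ids:
            record = self._records.get(doc_id)
            if record is not None and record.retrieval_authorized:
                result.append(doc_id)
        return result
```

An ID that was never registered and an ID that is registered but not authorized are treated the same way here: neither reaches the retrieval surface.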
Retrieval engine
status: designed, unverified
Hybrid retrieval combining keyword (FTS5) and semantic (vector) search. Retrieval is bounded by document-count caps and source-registry authorization. Every retrieval produces a manifest: which sources were searched, which were returned, and which were excluded and why.
design doc: retrieval surface spec
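One way the cap, the authorization check, and the manifest could fit together is sketched below. The inputs are two already-ranked lists of document IDs standing in for the FTS5 and vector results. The merge uses reciprocal rank fusion with a constant of 60, which is an assumption: the description does not say how the two result sets are combined. `RetrievalManifest` and `retrieve` are hypothetical names, and `searched` is simplified to mean every candidate either search surfaced.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RetrievalManifest:
    searched: list[str]                                     # every candidate considered
    returned: list[str] = field(default_factory=list)
    excluded: dict[str, str] = field(default_factory=dict)  # doc ID -> reason


def retrieve(keyword_hits: list[str], vector_hits: list[str],
             authorized: set[str], max_docs: int) -> RetrievalManifest:
    # Reciprocal rank fusion: each list contributes 1 / (60 + rank).
    # This fusion rule is an assumption, not taken from the spec.
    scores: dict[str, float] = {}
    for hits in (keyword_hits, vector_hits):
        for rank, doc_id in enumerate(hits, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (60 + rank)
    # Highest fused score first; ties broken by ID for determinism.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))

    manifest = RetrievalManifest(searched=ranked)
    for doc_id in ranked:
        if doc_id not in authorized:
            manifest.excluded[doc_id] = "not authorized by source registry"
        elif len(manifest.returned) >= max_docs:
            manifest.excluded[doc_id] = "document-count cap reached"
        else:
            manifest.returned.append(doc_id)
    return manifest
```

Every candidate ends up in exactly one of `returned` or `excluded`, so the manifest accounts for each document the search touched.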
Cited composer
status: designed, unverified
Structured response generation with mandatory source citations. The composer is constrained: no claim without a supporting passage, no synthesis across incompatible sources, no filling gaps with parametric knowledge. Output is a cited answer, not a free-form generation.
design doc: composer abstention spec
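The "no claim without a supporting passage" rule can be illustrated with a small sketch. It assumes each draft claim arrives already paired with its supporting passage, or with `None` when no passage was found; unsupported claims are dropped, and the composer abstains when nothing survives. All type and function names are hypothetical, and the incompatible-source check is not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source_id: str
    text: str


@dataclass(frozen=True)
class CitedSentence:
    text: str
    source_id: str


@dataclass(frozen=True)
class ComposedAnswer:
    sentences: tuple[CitedSentence, ...]
    abstained: bool   # True when no claim had a supporting passage


def compose(draft: list[tuple[str, Passage | None]]) -> ComposedAnswer:
    # Keep only claims that come with a passage; never fill the gap.
    cited = tuple(
        CitedSentence(claim, passage.source_id)
        for claim, passage in draft
        if passage is not None
    )
    return ComposedAnswer(sentences=cited, abstained=not cited)
```

The design choice shown is that abstention is an explicit output state rather than an empty string, so downstream stages can distinguish "nothing supported" from a failure.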
Output verifier
status: designed, unverified
Post-composition verification pass: Does every claim have a cited source? Does every citation resolve to a passage in the retrieval manifest? Are confidence signals consistent with the underlying evidence? Failed verification returns the output for recomposition.
design doc: verifier gate spec
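The first two checks lend themselves to a short sketch. It assumes each claim is a pair of claim text and cited passage ID (or `None`), and that the manifest is reduced to a set of passage IDs. The confidence-consistency check is omitted because the description does not say how confidence is represented. `verify` is a hypothetical name.

```python
from __future__ import annotations


def verify(claims: list[tuple[str, str | None]],
           manifest_passage_ids: set[str]) -> list[str]:
    """Return failure reasons; an empty list means the output passes."""
    failures: list[str] = []
    for text, passage_id in claims:
        if passage_id is None:
            # Check 1: every claim must carry a citation.
            failures.append(f"uncited claim: {text!r}")
        elif passage_id not in manifest_passage_ids:
            # Check 2: every citation must resolve to the manifest.
            failures.append(f"citation {passage_id!r} not in retrieval manifest")
    return failures
```

A non-empty result would be the signal to send the output back for recomposition.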
Gold-set evaluator
status: planned
Offline evaluation against curated gold-answer sets. Measures retrieval precision/recall, answer groundedness, citation accuracy, and abstention correctness. Evaluation runs are reproducible: the same gold set, model, and config produce the same scores.
design doc: evaluation framework spec
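Of the metrics listed, retrieval precision and recall have standard definitions and can be shown directly. The sketch below assumes one query at a time, with the gold set given as a set of relevant document IDs; how the evaluator aggregates across queries is not stated and is not modelled here.

```python
from __future__ import annotations


def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant (0.0 when undefined)."""
    unique_retrieved = set(retrieved)
    hits = len(unique_retrieved & relevant)
    precision = hits / len(unique_retrieved) if unique_retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

With four documents retrieved, two of them among three relevant ones, precision is 2/4 and recall is 2/3. The function is deterministic, which is the property the reproducibility requirement depends on at this level.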
Receipt ledger
status: designed, unverified
Append-only, content-addressed record of every query run: query hash, retrieval manifest, composer version, verifier verdict, output hash, and timestamp. The ledger is the audit surface: it proves what happened, not what the system claims happened.
design doc: receipt ledger spec
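A sketch of how an append-only, content-addressed ledger might look in memory. The receipt fields follow the list above. Two things are assumptions rather than part of the description: addresses are SHA-256 digests of the canonical JSON form of each receipt, and each receipt stores the previous address (`prev`) so that later tampering is detectable. `ReceiptLedger` and its methods are hypothetical names.

```python
from __future__ import annotations

import hashlib
import json
import time


def _sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def _address(receipt: dict) -> str:
    # Canonical JSON (sorted keys) so equal receipts hash identically.
    return _sha256(json.dumps(receipt, sort_keys=True))


class ReceiptLedger:
    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, query: str, manifest: dict, composer_version: str,
               verdict: str, output: str, timestamp: float | None = None) -> str:
        receipt = {
            "query_hash": _sha256(query),
            "retrieval_manifest": manifest,
            "composer_version": composer_version,
            "verifier_verdict": verdict,
            "output_hash": _sha256(output),
            "timestamp": time.time() if timestamp is None else timestamp,
            # Link to the previous receipt (assumed, not in the spec).
            "prev": self._entries[-1]["address"] if self._entries else None,
        }
        address = _address(receipt)
        self._entries.append({"address": address, "receipt": receipt})
        return address

    def verify_chain(self) -> bool:
        """Recompute every address and check each link to its predecessor."""
        prev = None
        for entry in self._entries:
            receipt = entry["receipt"]
            if receipt["prev"] != prev or _address(receipt) != entry["address"]:
                return False
            prev = entry["address"]
        return True
```

Because each address depends on the receipt contents and on the previous address, editing an earlier receipt makes `verify_chain` return `False`.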