RCA Insurance Library
Synthetic commercial P&C submission packs
Pre-labelled broker submissions for QA, evaluation and training of document extraction pipelines. Built and shipped by Root Cause Analytics.
Per-row labels, not just per-document
Other synthetic libraries give you one box per document. The RCA Insurance Library labels every field on the page, plus every individual claim row on a loss run and every individual location row on a statement of values.
That means a row-level extractor gets row-level supervision. A reviewer can click any row in the ground truth and highlight the exact pixels on the rendered PDF. A vendor evaluation scores every extractor on the same row-level target.
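As a rough illustration of what a row-level target makes possible, the sketch below scores an extractor field by field within each claim row. The field names (`claim_number`, `loss_date`, `paid`) and the exact-match rule are assumptions for illustration, not the library's documented schema or scoring method.

```python
from typing import Dict, List

Row = Dict[str, str]


def row_level_accuracy(truth_rows: List[Row], predicted_rows: List[Row]) -> float:
    """Fraction of ground-truth fields matched exactly, compared row by row."""
    total = 0
    correct = 0
    for index, truth in enumerate(truth_rows):
        # A missing predicted row counts every field in that row as wrong.
        predicted = predicted_rows[index] if index < len(predicted_rows) else {}
        for field, expected in truth.items():
            total += 1
            if predicted.get(field, "").strip() == expected.strip():
                correct += 1
    return correct / total if total else 0.0


# Hypothetical loss-run rows: the extractor misreads one paid amount.
truth = [
    {"claim_number": "CLM-0001", "loss_date": "2022-03-14", "paid": "12500.00"},
    {"claim_number": "CLM-0002", "loss_date": "2023-07-02", "paid": "4300.00"},
]
predicted = [
    {"claim_number": "CLM-0001", "loss_date": "2022-03-14", "paid": "12500.00"},
    {"claim_number": "CLM-0002", "loss_date": "2023-07-02", "paid": "4800.00"},
]
score = row_level_accuracy(truth, predicted)
print(f"{score:.3f}")  # 5 of 6 fields match: 0.833
```

A document-level label could only say the loss run was found. A row-level score shows which claim row, and which field in it, went wrong.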


Same shape on statements of values
Per-location rows from location_rows_json get the same treatment. Each address, occupancy, building value, contents value, stock value and BI value lands as its own bbox keyed by row index. A statement of values with five sites ships roughly 41 labelled-field bboxes.
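A minimal sketch of how per-location bboxes keyed by row index might be consumed. Only the name `location_rows_json` comes from the text above. The JSON layout, the snake_case field names and the `[x0, y0, x1, y1]` bbox format are assumptions and may differ from the shipped schema.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical field names, derived from the six per-row fields listed above.
ROW_FIELDS = [
    "address", "occupancy", "building_value",
    "contents_value", "stock_value", "bi_value",
]

BBox = List[float]  # assumed [x0, y0, x1, y1] in page coordinates


def index_row_bboxes(location_rows_json: str) -> Dict[Tuple[int, str], BBox]:
    """Map (row_index, field_name) to that cell's bounding box."""
    rows = json.loads(location_rows_json)
    keyed: Dict[Tuple[int, str], BBox] = {}
    for row_index, row in enumerate(rows):
        for field in ROW_FIELDS:
            cell = row.get(field)
            if cell is not None and "bbox" in cell:
                keyed[(row_index, field)] = cell["bbox"]
    return keyed


# Five synthetic sites, each with six labelled cells stacked 20 units apart.
sample = json.dumps([
    {
        field: {
            "value": f"{field}-{i}",
            "bbox": [50.0, 100.0 + 20 * i, 200.0, 115.0 + 20 * i],
        }
        for field in ROW_FIELDS
    }
    for i in range(5)
])
keyed = index_row_bboxes(sample)
print(len(keyed))  # 5 rows x 6 fields = 30 row-level bboxes
```

Keying on `(row_index, field)` lets a reviewer tool jump from any ground-truth cell to its region on the rendered page.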


Pricing
Every tier ships same-day. The generator is a deterministic Python pipeline that produces a full library in minutes.
| Tier | Size | Best for | Price |
|---|---|---|---|
| Free sample | 2 submission packs | First look. Review the schema, bboxes and disclaimer in real documents. | Free |
| QA Sprint Pack | 10 submission packs + engineered red flag summary + 30-minute handover call | Pipeline QA against a controlled, varied input set. Vendor evaluation. | AUD $2,500 |
| Production library | 100+ submission packs | Production regression suite. Internal QA at scale. | Contact for quote |
| Training library | 1,000+ submission packs with train / val / test splits | ML model fine-tuning at scale. Layout-model training. | Contact for quote |
What is in the library
Each submission pack is a complete broker submission as you would receive it in a real underwriting inbox: cover note, attachments, supporting forms. Pack composition varies by submission type (new business, renewal with claims, FNOL).
| Document | Contents |
|---|---|
| Broker submission email | Cover note, named attachments, broker signature block |
| Loss run report | Last 5 years of claims, per-claim rows, displayed totals, status |
| Statement of values | Per-location rows, building, contents and BI values, displayed totals |
| Policy schedule | Insurer schedule with limits, deductibles, endorsements |
| Certificate of currency | Broker-issued confirmation of cover |
| Insurance application | New business questionnaire |
| FNOL form | First notice of loss form |
| Claim report | Incumbent renewal claim narrative |
Engineered red flags
A subset of packs is deliberately broken: cross-document inconsistencies we have seen in real submissions, engineered in at known positions so your extraction or validation pipeline has a controlled target to flag.
| Red flag | What it looks like |
|---|---|
| Loss run total mismatch | Displayed total disagrees with the sum of the claim rows |
| Statement of values total mismatch | Displayed total disagrees with the sum of the location rows |
| Missing attachment | The broker email lists a document that is not in the pack |
| ABN formatting inconsistency | Same ABN formatted differently across documents in the same pack |
| Policy number mismatch | Certificate of currency disagrees with the policy schedule |
| Location address mismatch | Statement of values address disagrees with the policy schedule |
| Claim after policy end | A loss date is outside the policy period |
| Currency mismatch | A non-AUD currency on a single location row inside an otherwise AUD submission |
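Several of these red flags reduce to small, mechanical checks. The sketch below shows three of them as a validation pipeline might implement them: a total mismatch, an ABN formatting inconsistency and a loss date outside the policy period. The function names, the string-typed amounts and the inclusive policy end date are assumptions for illustration, not part of the library.

```python
import re
from datetime import date
from decimal import Decimal
from typing import Iterable, List


def total_mismatch(row_amounts: Iterable[str], displayed_total: str) -> bool:
    """True when the displayed total differs from the sum of the rows."""
    # Decimal avoids float rounding noise on currency values.
    row_sum = sum((Decimal(a) for a in row_amounts), Decimal("0"))
    return row_sum != Decimal(displayed_total)


def abn_format_inconsistent(abns: List[str]) -> bool:
    """True when the same ABN digits appear with different formatting."""
    digits = {re.sub(r"\D", "", abn) for abn in abns}
    return len(digits) == 1 and len(set(abns)) > 1


def loss_outside_period(loss_date: date, start: date, end: date) -> bool:
    """True when the loss date falls outside the period (end date assumed inclusive)."""
    return not (start <= loss_date <= end)


# Synthetic values only; none refer to a real claim or business.
print(total_mismatch(["12500.00", "4300.00"], "17800.00"))         # True
print(abn_format_inconsistent(["12 345 678 901", "12345678901"]))  # True
print(loss_outside_period(date(2024, 8, 1), date(2023, 7, 1), date(2024, 6, 30)))  # True
```

Because the flags are engineered in at known positions, the output of checks like these can be compared directly with the shipped inventory.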
The red flag inventory ships as red_flags_summary.csv with each pack. The CSV includes a where_to_review column pointing to the two documents to compare. This file is the most useful artefact for QA workflows.
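One way a QA workflow might consume the inventory is to group flags by the document pair they point at. In the sketch below only the `where_to_review` column name comes from the description above. The `pack_id` and `red_flag` columns and the "a vs b" value format are hypothetical. An in-memory string stands in for the file so the example is self-contained.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Hypothetical stand-in for red_flags_summary.csv; real columns may differ.
SAMPLE_CSV = """\
pack_id,red_flag,where_to_review
pack_001,Policy number mismatch,certificate_of_currency vs policy_schedule
pack_001,Location address mismatch,statement_of_values vs policy_schedule
pack_002,Policy number mismatch,certificate_of_currency vs policy_schedule
"""


def group_by_review_target(csv_text: str) -> Dict[str, List[dict]]:
    """Group inventory rows by the document pair named in where_to_review."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["where_to_review"]].append(row)
    return dict(grouped)


grouped = group_by_review_target(SAMPLE_CSV)
for target, rows in grouped.items():
    print(target, len(rows))
```

For a file on disk, `open("red_flags_summary.csv", newline="")` can be passed to `csv.DictReader` in place of the `io.StringIO` wrapper.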
Diversity controls
Each PDF is rendered with a deterministically chosen style profile modelled on a real underwriting-inbox archetype.
Each document type has three named template families that vary header / footer / section ordering without changing field labels or ground truth values. The chosen profile and family are recorded per row in the ground truth.
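A common way to make such a choice deterministic is to derive it from a stable hash of the document's identity. The sketch below illustrates that general technique and is not a description of the RCA generator. The profile names, family names and key parts are placeholders.

```python
import hashlib
from typing import Sequence

# Placeholder names; the real profile and family names are not listed here.
STYLE_PROFILES = ["profile_a", "profile_b", "profile_c"]
TEMPLATE_FAMILIES = ["family_1", "family_2", "family_3"]


def pick(options: Sequence[str], *key_parts: str) -> str:
    """Choose one option from a stable hash of the key parts."""
    # hashlib is used because Python's built-in hash() for strings is
    # randomised per process by default, so it is not repeatable across runs.
    digest = hashlib.sha256("|".join(key_parts).encode("utf-8")).digest()
    return options[int.from_bytes(digest[:8], "big") % len(options)]


profile = pick(STYLE_PROFILES, "pack_001", "loss_run", "style")
family = pick(TEMPLATE_FAMILIES, "pack_001", "loss_run", "family")
print(profile, family)
```

The same pack and document type always map to the same profile and family. That is what allows the choice to be recorded in the ground truth and reproduced on regeneration.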
Synthetic safety
Every PDF carries a visible synthetic disclaimer on every page. All broker names, insurer names, insured business names, ABNs, addresses, phone numbers, policy numbers, claim numbers and dollar values are computer-generated and do not refer to any real organisation, broker, insurer or claim.
Not for underwriting, claims handling, accounting, or regulatory use.
Try the free 2-pack preview
Two complete submission packs with ground truth, bboxes and scanned variants. The pack ships with a five-minute review path.