Pre-labelled synthetic document libraries
Real-looking PDFs at scale. Ground truth, bounding boxes and scanned variants shipped alongside every document. Built by Root Cause Analytics in Sydney.
Real pages from the libraries
Browse representative documents from the RCA Insurance and Medical libraries. Same generator stack, different document types. Every page ships with ground truth, bounding boxes, a scanned variant, and a visible synthetic disclaimer.










The four product lines
Same generator stack. Different domain, different field schema, different style profile.
RCA Insurance Library
Complete commercial P&C submission packs. Broker emails, loss runs, statements of values, policy schedules, certificates of currency, applications, FNOL forms, claim reports.
Per-row labels on loss run and SOV. Engineered cross-document red flags.
Open the pageRCA Medical Library
Synthetic Australian medical records. 40+ document types: discharge, ED, referral, imaging, pathology and 35+ specialist types.
NSW conventions: postcodes, Medicare format, provider postnominals, AU clinician names.
Open the pageRCA Benchmark Packs
Smaller paid packs sized for procurement evaluation, pipeline QA, and pre-rollout review.
Insurance QA Sprint Pack ships at AUD $2,500 with same-day delivery.
Open the pageRCA Custom Libraries
Your document types, your field schema, your style profiles. Built deterministically and shipped with ground truth and bboxes.
Built deterministically from your schema. Same delivery shape as the standard libraries.
Open the pagePricing
Same four tiers per library: free sample, paid Sprint or Pilot pack, production library, training library. Every tier ships same-day. The generator is a deterministic Python pipeline that produces a complete library in minutes, not weeks.
RCA Insurance Library
| Tier | Size | Best for | Price |
|---|---|---|---|
| Free sample | 2 submission packs | First look. Review the schema and disclaimer. | Free |
| QA Sprint Pack | 10 submission packs + red flag summary + 30-min handover | Pipeline QA. Vendor evaluation. | AUD $2,500 |
| Production library | 100+ submission packs | Production regression suite. Internal QA at scale. | Contact for quote |
| Training library | 1,000+ submission packs with train / val / test splits | ML model fine-tuning at scale. | Contact for quote |
RCA Medical Library
| Tier | Size | Best for | Price |
|---|---|---|---|
| Free sample | 25 to 35 documents | First look. Review the schema, AU conventions and disclaimer. | Free |
| Pilot pack | 100 to 200 documents scoped to your specialty | Internal pilot. Specialty-focused review. | Contact for quote |
| Production library | 500 to 1,000 documents across 40+ types | Production regression suite. Internal QA at scale. | Contact for quote |
| Training library | 5,000+ documents with train / val / test splits | ML model fine-tuning at scale. | Contact for quote |
Every order ships same-day. The generator produces ground truth (CSV + JSONL), bounding box records, scanned variants, manifest, and train / val / test splits where applicable, all in one pass. Custom document types or schemas are quoted via RCA Custom Libraries.
What ships with every order
Every PDF lands with its labels. Every library lands with its manifest.
- Clean PDF in pdfs/
- Scanned variant in pdfs_scanned/ (rotation, noise, JPEG)
- Ground truth row in CSV and JSONL
- Bounding box record in bboxes.jsonl
- Visible synthetic disclaimer on every page
- manifest.json with document-type distribution and metadata
- splits.json with train / val / test allocation
- README.md with schema and regeneration commands
- validation_summary.md confirming integrity checks
- license_summary.md confirming the synthetic-only restriction
Try a free preview pack
Two-pack insurance preview, or a 25 to 35 document medical review pack.
Same-day delivery. Direct from Sydney, Australia.