Synthetic. Not real patient data.

RCA Medical Library

Synthetic Australian medical training documents

40+ document types across hospital, ED, GP clinic, pathology, imaging and specialist correspondence. Ground truth, bounding boxes and scanned variants ship with every document.

40+ document types

Three groups. The five document types covered by RCA Extract are marked "RCA Extract".

Hospital and ED

  • Discharge summary (RCA Extract)
  • ED assessment (RCA Extract)
  • Admission checklist
  • ICU daily plan
  • Anaesthetic record
  • Fluid order
  • Progress note
  • Patient safety checklist
  • Transfusion compatibility report
  • Haemodialysis flow sheet
  • Infusion pump checklist
  • Medication administration record

GP and primary care

  • Referral letter (RCA Extract)
  • Medical certificate
  • Prescription
  • Mental health care plan
  • Mental health assessment
  • Advance care directive
  • Home care plan
  • Treatment plan
  • External correspondence

Pathology, imaging, specialist

  • Pathology request
  • Pathology report (RCA Extract)
  • Imaging request
  • Imaging report (RCA Extract)
  • Bone density report
  • ECG (12-lead and rhythm)
  • Echo report
  • Vascular ultrasound report
  • Pacemaker report
  • Ophthalmology assessment
  • Audiology assessment
  • Speech pathology assessment
  • Physiotherapy assessment
  • Endoscopy report
  • HADS questionnaire

The full list, with document_type weights, is documented in the library's manifest.json.
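
As a rough illustration of how document_type weights might be used, the sketch below draws a weighted sample of document types. The manifest layout (a `document_types` mapping from type name to weight), the weight values and the `sample_document_types` helper are all assumptions made for this example; the shipped manifest.json defines the actual keys.

```python
import json
import random

# Hypothetical manifest fragment: the real manifest.json may use
# different keys and different weight values.
MANIFEST_TEXT = """
{
  "document_types": {
    "discharge_summary": 3.0,
    "ed_assessment": 2.0,
    "referral_letter": 1.0
  }
}
"""

def sample_document_types(manifest: dict, k: int, seed: int = 0) -> list[str]:
    """Draw k document types with probability proportional to their weights."""
    weights = manifest["document_types"]
    rng = random.Random(seed)  # seeded so the draw is reproducible
    return rng.choices(list(weights), weights=list(weights.values()), k=k)

manifest = json.loads(MANIFEST_TEXT)
picked = sample_document_types(manifest, k=10)
print(picked)
```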

Real sample, not a mock-up

What you actually get

Below is a real discharge summary from the RCA Medical Library: NSW hospital header, AU patient name conventions, Medicare format, NSW Local Health District, AU consultant postnominals, and a medications table. The same page is shown twice: rendered clean, and with every labelled field outlined.

discharge_summary (clean PDF): Clean synthetic discharge summary from a fictional NSW hospital. Patient demographics with AU name and NSW address, Medicare number in the displayed AU format, ward, consultant, principal diagnosis, additional diagnoses list, hospital course narrative, and medications table. Visible synthetic disclaimer at the bottom.

discharge_summary (labelled fields overlay, 24 bboxes): The same discharge summary with every labelled field outlined: patient name, date of birth, MRN, Medicare number, NOK, allergies, admission and discharge dates, ward, consultant, principal diagnosis. Each box maps directly to a column in ground_truth.csv.

  • Field outlines: Each red rectangle maps to one column in ground_truth.csv. Mean: 15 fields per document across the library.
  • AU conventions: NSW hospital name, NSW Local Health District, AU patient and consultant naming, displayed Medicare format, TRN-PROV-XXXXX provider numbers.
  • Ships alongside: CSV + JSONL ground truth, bboxes.jsonl, manifest, plus a scanned variant of every PDF.
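
Because each box maps to a ground_truth.csv column, bounding boxes and ground-truth values can be joined per document. The sketch below shows one way that join might look. The record layout is an assumption for illustration: the `doc_id`, `field`, `page` and `bbox` keys, the CSV column names and the sample values are hypothetical, and the shipped files may be organised differently.

```python
import csv
import io
import json

# Hypothetical file contents: column names, keys and values are illustrative only.
GROUND_TRUTH_CSV = """doc_id,patient_name,ward
doc_0001,Example Patient,Ward 4B
"""

BBOXES_JSONL = """{"doc_id": "doc_0001", "field": "patient_name", "page": 1, "bbox": [72, 120, 260, 134]}
{"doc_id": "doc_0001", "field": "ward", "page": 1, "bbox": [72, 180, 150, 194]}
"""

def load_labelled_fields(csv_text: str, jsonl_text: str) -> list[dict]:
    """Attach the ground-truth value to each bounding box record."""
    truth = {row["doc_id"]: row for row in csv.DictReader(io.StringIO(csv_text))}
    fields = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        box = json.loads(line)
        # Assumption: each box's "field" names a ground_truth.csv column.
        value = truth[box["doc_id"]][box["field"]]
        fields.append({**box, "value": value})
    return fields

labelled = load_labelled_fields(GROUND_TRUTH_CSV, BBOXES_JSONL)
for item in labelled:
    print(item["field"], item["bbox"], item["value"])
```
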
Request the free review pack

25 to 35 representative documents. Same-day delivery on request. PDFs, ground truth, bboxes and scanned variants.

AU-specific realism

  • Patient names use AU-common first names and surnames drawn from broad surname pools (not a single ethnicity).
  • Addresses use NSW postcodes that match the stated suburb. The postcode-to-suburb mapping is sourced from public ABS data; the addresses themselves are computer-generated, and no real residential address is referenced.
  • Medicare numbers follow the displayed AU format (10 digits plus IRN) but are computer-generated and do not validate against the real Medicare system.
  • Provider numbers use the TRN-PROV-XXXXX format with the synthetic TRN prefix. The TRN prefix is deliberate so any pipeline that ingests these documents can filter out synthetic provider numbers.
  • Clinician postnominals use AU specialty fellowships: FRACGP, FRACP, FRCPA, FRANZCR, FACEM, FRACS.
  • Hospitals carry NSW Local Health District labels (synthetic, not real LHD names).
  • Phone numbers use AU area codes.
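
The Medicare and provider number conventions above lend themselves to simple shape checks. The following is a minimal sketch. It assumes XXXXX means five uppercase letters or digits, and that a displayed Medicare string contains exactly 11 digits (10 plus a one-digit IRN) once separators are stripped. Neither detail is specified on this page, and the helper names are hypothetical. No check-digit or registry validation is attempted.

```python
import re
from typing import Optional

# Assumption: XXXXX is five uppercase letters or digits; the exact
# character set is not defined in the source.
SYNTHETIC_PROVIDER = re.compile(r"TRN-PROV-[A-Z0-9]{5}")

def is_synthetic_provider(value: str) -> bool:
    """True if the whole string matches the TRN-PROV-XXXXX pattern."""
    return SYNTHETIC_PROVIDER.fullmatch(value.strip()) is not None

def split_medicare(displayed: str) -> Optional[tuple[str, str]]:
    """Split a displayed Medicare string into (10-digit number, IRN).

    Only the shape is checked: separators are dropped and exactly
    11 digits must remain. Returns None otherwise.
    """
    digits = re.sub(r"\D", "", displayed)
    if len(digits) != 11:
        return None
    return digits[:10], digits[10]

print(is_synthetic_provider("TRN-PROV-A1B2C"))
print(split_medicare("1234 56789 0 / 1"))
```

A pipeline that must never mix synthetic and production data could run `is_synthetic_provider` at ingestion and route matching documents to a test-only store.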

These conventions are a common source of extraction failures for models trained primarily on US documents. Models that handle US date formats, US ZIP codes and DEA numbers will frequently fail on AU postcodes, Medicare numbers and provider numbers without retraining.
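
Dates are one concrete case of this mismatch. The sketch below assumes dates are printed day-first as DD/MM/YYYY, the usual AU convention; the exact date format used in the library is not stated here. It shows the same string yielding two different dates depending on the parser's assumption.

```python
from datetime import date, datetime

def parse_au_date(text: str) -> date:
    """Parse a day-first DD/MM/YYYY date."""
    return datetime.strptime(text, "%d/%m/%Y").date()

def parse_us_date(text: str) -> date:
    """Parse a month-first MM/DD/YYYY date."""
    return datetime.strptime(text, "%m/%d/%Y").date()

raw = "03/04/2024"
print(parse_au_date(raw))  # 2024-04-03
print(parse_us_date(raw))  # 2024-03-04

# A day above 12 cannot be read month-first, so the US-style parser raises.
try:
    parse_us_date("25/12/2024")
except ValueError as exc:
    print("month-first parse failed:", exc)
```

The silent case (both parsers succeed but disagree) is the harder one to catch, which is why field-level ground truth matters for evaluation.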

65+ curated clinical case archetypes

The Medical Library is built from hand-authored case archetypes. Each case has internally consistent demographics, presenting complaint, labs, treatments, follow-up plans and discharge instructions. A single case can be rendered as several different document types within the same library so the documents in a pack hang together as a plausible patient journey.

Adding a new case is roughly 50 lines of Python. We accept paid feature requests for new case archetypes. Common requests: paediatric ED, renal failure with dialysis, post-op infection, mental health crisis presentation.
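
To give a feel for what a case archetype could look like in Python, here is a purely hypothetical sketch. The `CaseArchetype` class, its field names, the `render_fields` helper and the sample values are invented for illustration and are not the library's actual code. The point is only that one case record can feed several document types while keeping shared details consistent.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class CaseArchetype:
    """Hypothetical container for one hand-authored clinical case."""
    case_id: str
    age: int
    sex: str
    presenting_complaint: str
    labs: dict[str, str] = field(default_factory=dict)
    treatments: tuple[str, ...] = ()
    follow_up: str = ""
    discharge_instructions: str = ""

def render_fields(case: CaseArchetype, document_type: str) -> dict[str, str]:
    """Pick the case fields a given document type would display."""
    shared = {
        "case_id": case.case_id,
        "presenting_complaint": case.presenting_complaint,
    }
    if document_type == "ed_assessment":
        return {**shared, "treatments": "; ".join(case.treatments)}
    if document_type == "discharge_summary":
        return {
            **shared,
            "follow_up": case.follow_up,
            "discharge_instructions": case.discharge_instructions,
        }
    raise ValueError(f"unsupported document type: {document_type}")

# Illustrative values only, not taken from the library.
case = CaseArchetype(
    case_id="case_example_01",
    age=58,
    sex="F",
    presenting_complaint="chest pain",
    treatments=("aspirin", "ECG monitoring"),
    follow_up="GP review in one week",
    discharge_instructions="Return to ED if pain recurs",
)
ed = render_fields(case, "ed_assessment")
ds = render_fields(case, "discharge_summary")
print(ed)
print(ds)
```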

Diversity controls

Eight style profiles ship today:

  • nsw_hospital_letter
  • gp_clinic_letter
  • pathology_lis_report
  • imaging_ris_report
  • ed_system_printout
  • discharge_summary_emr
  • specialist_clinic_letter
  • faxed_external_correspondence

Each document type has three named template families that vary header / footer / section ordering without changing field labels or ground truth values. Visible synthetic disclaimer placement varies per document: footer line, top banner, boxed notice, or pale strip.

Pricing

Tier             | Scale                                   | Price                        | Delivery
-----------------|-----------------------------------------|------------------------------|--------------------
Free review pack | 25 to 35 documents                      | Free for qualified prospects | Same day on request
QA library       | 200 documents                           | On request                   | Scoped per order
Training library | 500, 5,000+ documents                   | On request                   | Scoped per order
Pilot Pack       | 100 to 200 docs scoped to your use case | On request                   | Scoped per order
Custom variants  | New document types, new case archetypes | On request                   | Scoped per order

Synthetic safety

Every PDF carries a visible synthetic disclaimer on every page. Patient names, dates of birth, Medicare numbers, MRNs, addresses, phone numbers, clinician names, provider numbers and hospital names are computer-generated and do not refer to any real person or organisation.

Not for clinical care, coding, billing, or regulatory use.

Get the free review pack

25 to 35 representative medical documents. Five-minute review path. Free for qualified prospects.