Sydney, NSW, Australia

About Root Cause Analytics

Root Cause Analytics builds document extraction products and pre-labelled synthetic document libraries for teams working with healthcare, insurance and other privacy-sensitive documents. RCA Extract (formerly MEDISCAN) is our self-hosted extraction container. The RCA libraries are the test data the product is built against, sold separately.

Making document data work

Healthcare and insurance organisations process millions of paper and digital documents each year: discharge summaries, referral letters, pathology reports, broker submissions and policy schedules. The information locked inside these documents has enormous operational value, yet most of it remains inaccessible because extracting it manually is too slow, too expensive and too error-prone.

We built RCA Extract to change that for healthcare PDFs, and we built the RCA Medical and Insurance libraries to make sure RCA Extract (and other extraction pipelines) have somewhere safe to be tested.

RCA Extract runs as a self-hosted Docker container inside the customer's own environment. The libraries ship as direct downloads with ground truth, bounding boxes and scanned variants for every document.

  • RCA Extract (extraction product): Self-hosted Docker container. Runs in your cloud or on-prem. Zero data egress.
  • Medical library (40+ types): Discharge, ED, referral, imaging, pathology, plus 35+ specialist types.
  • Insurance library (per-row bboxes): Per-claim and per-location row bbox structure for granular extraction QA.
  • Sydney-based (AU conventions): NSW postcodes, Medicare format, AU provider postnominals built in.
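
For readers wondering how row-level bounding boxes support extraction QA, one common approach is to compare each predicted row box with its ground-truth box using intersection over union (IoU). The sketch below is illustrative only: the `(x0, y0, x1, y1)` box layout, the function names and the 0.5 threshold are assumptions, not the format the libraries ship in.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    # Corners of the overlapping rectangle (may be empty).
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def rows_matching(truth_rows, predicted_rows, threshold=0.5):
    """Pair rows by position and report which ones clear the IoU threshold.

    The threshold is a hypothetical default chosen for illustration.
    """
    return [iou(t, p) >= threshold for t, p in zip(truth_rows, predicted_rows)]


# Hypothetical two-row table: the first row is located correctly,
# the second predicted box lands on the wrong line.
truth = [(0, 0, 100, 20), (0, 20, 100, 40)]
predicted = [(0, 0, 100, 20), (0, 60, 100, 80)]
row_results = rows_matching(truth, predicted)
```

Checking rows one at a time, rather than scoring the whole table as a single region, lets a QA report point at the specific claim or location row that was misplaced.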

What we stand for

Our values guide every product decision, from architecture choices to pricing models.

Patient-First Design

Every feature is designed with the downstream impact on patient care in mind. Better data quality leads to better clinical decisions.

Security by Architecture

We chose a zero data movement architecture not as a feature but as a foundational design principle. Patient data stays where it belongs.

Evidence Over Marketing

We publish per-document-type evaluation alongside benchmark releases rather than headline accuracy numbers. Numbers without methodology do not help buyers.
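
To make the distinction concrete, here is a minimal sketch of per-document-type scoring. It assumes a hypothetical list of `(doc_type, expected_fields, extracted_fields)` records and exact-match field comparison; it is an illustration, not our evaluation harness.

```python
from collections import defaultdict


def per_type_accuracy(records):
    """Return {doc_type: fraction of expected fields extracted exactly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for doc_type, expected, extracted in records:
        for field, value in expected.items():
            total[doc_type] += 1
            if extracted.get(field) == value:
                correct[doc_type] += 1
    return {doc_type: correct[doc_type] / total[doc_type] for doc_type in total}


# Hypothetical records: document types and field names are made up.
records = [
    (
        "discharge_summary",
        {"patient_name": "A. Sample", "admit_date": "2024-01-02"},
        {"patient_name": "A. Sample", "admit_date": "2024-01-03"},
    ),
    (
        "pathology_report",
        {"patient_name": "B. Sample"},
        {"patient_name": "B. Sample"},
    ),
]
scores = per_type_accuracy(records)
```

A single pooled figure over these records would hide the fact that one document type is scoring well and the other is not, which is the reason for reporting by type.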

Simplicity at Scale

Enterprise data challenges should not require enterprise-scale implementation projects. A self-hosted container that runs in your environment reflects that belief.

Founder-led

Root Cause Analytics is a specialist document AI and healthcare data company based in Sydney, Australia.

Jack Webb

Founder & Lead Data Engineer

Builds Root Cause Analytics from Sydney. Background in healthcare data engineering. Direct contact below.

jack.webb@rootcauseanalytics.com.au

Technical capabilities

The product line combines healthcare-specific OCR and NLP, deployed as a self-hosted container, alongside synthetic training document libraries used internally for validation and sold externally for QA.

  • Healthcare-specific OCR fine-tuned on clinical documents
  • Self-hosted container for zero data egress
  • FHIR-aligned output schemas for interoperability
  • Synthetic training document libraries shipped with ground truth and bounding boxes
  • Deterministic generators, reproducible by seed
  • AU-specific document conventions: NSW postcodes, Medicare format, provider postnominals
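
As an illustration of what "reproducible by seed" means in practice, the sketch below derives every random choice from a single seeded generator, so the same seed always yields the same record. The field names, document types and value ranges are hypothetical and do not describe the actual RCA generators; the postcode range 2000 to 2599 is used only as a subset of NSW postcodes.

```python
import random

# Hypothetical document types, for illustration only.
DOC_TYPES = ["discharge_summary", "referral_letter", "pathology_report"]


def generate_record(seed: int) -> dict:
    """Build one synthetic record whose contents depend only on the seed."""
    # A local Random instance keeps the output independent of global RNG state.
    rng = random.Random(seed)
    x, y = rng.randint(0, 400), rng.randint(0, 700)
    return {
        "doc_type": rng.choice(DOC_TYPES),
        "postcode": str(rng.randint(2000, 2599)),
        # Bounding box as [x0, y0, x1, y1]; width and height are always positive.
        "bbox": [x, y, x + rng.randint(50, 150), y + rng.randint(10, 30)],
    }


first = generate_record(42)
second = generate_record(42)
same = first == second  # True: identical seed, identical record
```

Because the ground truth is produced by the same seeded process as the document, a reported extraction failure can be regenerated from its seed alone.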

Security & Synthetic Safety

  • RCA Extract runs in your own environment: self-hosted container, zero data egress.
  • No third-party data processors: no external API calls during extraction.
  • Customer-managed access controls: uses your RBAC and audit logs.
  • Library outputs are synthetic only: visible disclaimer on every page.

Get in touch

Request a free preview pack from one of the libraries, talk to us about deploying RCA Extract in your environment, or reach out about a custom library scoped to your document types.