RCA Extract
Self-hosted document extraction for healthcare PDFs. Ships as a Docker container that runs in your cloud or on-prem. POST a PDF, get back structured fields plus bounding boxes. Zero data egress.
How it deploys
A single container image. Your infrastructure. Your data stays where it is.
Pull the container
Pull the RCA Extract image from a private registry. Each release is signed and versioned. Runs on any Docker-compatible host, on AMD64 or ARM64.
Run with a license key
Start the container with docker run, passing your license key as an environment variable and mounting a volume for input PDFs. Health checks are served at /healthz. The container exposes a REST API on a port you choose.
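As a rough illustration of the run step, the Python sketch below builds a docker run command and polls /healthz until the container answers. The image reference, the environment variable name RCA_LICENSE_KEY, the internal port 8080 and the mount point /data/input are all placeholders invented for this example, not documented values; substitute the ones supplied with your release.

```python
import time
import urllib.error
import urllib.request

# Every constant here is a hypothetical placeholder, not a documented value.
IMAGE = "registry.example.com/rca-extract:1.0.0"
LICENSE_ENV = "RCA_LICENSE_KEY"
CONTAINER_PORT = 8080
INPUT_MOUNT = "/data/input"


def docker_run_args(host_port, pdf_dir):
    """Build the argv for `docker run`; hand it to subprocess.run to launch."""
    return [
        "docker", "run", "--detach",
        # Name only: Docker copies the value from the caller's environment,
        # so the key never appears on the command line.
        "--env", LICENSE_ENV,
        "--volume", f"{pdf_dir}:{INPUT_MOUNT}:ro",
        "--publish", f"{host_port}:{CONTAINER_PORT}",
        IMAGE,
    ]


def probe_healthz(base_url):
    """Return True if GET /healthz answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(base_url, attempts=30, delay=1.0, probe=probe_healthz):
    """Poll the health endpoint until it succeeds or attempts run out."""
    for _ in range(attempts):
        if probe(base_url):
            return True
        time.sleep(delay)
    return False
```

A typical call sequence would be subprocess.run(docker_run_args(9000, "/srv/pdfs"), check=True) followed by wait_until_healthy("http://localhost:9000"). The probe parameter exists only so the polling loop can be exercised without a running container.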
POST a PDF, get JSON
Send the PDF as the body of a POST to /extract. The JSON response contains the structured fields plus bounding boxes for the labeled fields, in the same schema as the RCA Medical Library.
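A minimal client for this call might look like the following, using only the Python standard library. The /extract path comes from the description above; the application/pdf content type, the base URL and the assumption that no extra authentication header is required are guesses made for the sketch.

```python
import json
import urllib.request


def build_extract_request(base_url, pdf_bytes):
    """Create a POST /extract request carrying the raw PDF as the body."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/extract",
        data=pdf_bytes,
        method="POST",
        headers={"Content-Type": "application/pdf"},  # assumed content type
    )


def extract(base_url, pdf_path, timeout=60):
    """Send one PDF to the container and return the decoded JSON response."""
    with open(pdf_path, "rb") as fh:
        request = build_extract_request(base_url, fh.read())
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

For example, extract("http://localhost:9000", "discharge_summary.pdf") would return the parsed response as a Python dictionary, assuming the container is listening on port 9000.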
Land where you want
Write results to your warehouse, your EMR, your S3 bucket, your Postgres. The container does extraction. You own the data path.
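Because the data path is yours, the landing step is ordinary application code. The sketch below stores each extraction result as JSON text keyed by source file name. It uses SQLite from the Python standard library purely as a stand-in for whatever store you run; the table name, column names and keying scheme are assumptions for illustration, not part of the product.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open the stand-in store and create the results table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extractions ("
        " source_file TEXT PRIMARY KEY,"
        " payload TEXT NOT NULL)"
    )
    return conn


def save_result(conn, source_file, result):
    """Insert or overwrite one extraction result, stored as JSON text."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO extractions (source_file, payload)"
            " VALUES (?, ?)",
            (source_file, json.dumps(result, sort_keys=True)),
        )


def load_result(conn, source_file):
    """Return the stored result for a file, or None if it is absent."""
    row = conn.execute(
        "SELECT payload FROM extractions WHERE source_file = ?",
        (source_file,),
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Keeping the payload as opaque JSON avoids committing to a column layout before you have inspected the response schema for your document types. A Postgres or warehouse loader would follow the same shape with a different driver.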
Supported document types
The current production set covers high-volume Australian healthcare document types. Each is evaluated end-to-end against the matching family in the RCA Medical Library, so extraction quality can be scored against known ground truth. Additional document types ship on request.
Discharge summary
Demographics, registrar, consultant, principal diagnosis, principal ICD, dates, length of stay, medications, follow-up.
ED assessment
Triage category, presenting complaint, disposition, timings.
Referral letter
Referrer, recipient specialty, presenting problem, requested action.
Imaging report
Accession number, modality, body region, findings, impression.
Pathology report
Lab reference, specimen, test panel, result fields, abnormal flags.
Why a synthetic-first vendor
RCA Extract is built and tested against the same synthetic medical documents we sell as the RCA Medical Library. That gives us:
- A controlled test set across 40+ document types where ground truth is known by construction.
- A scanned-variant test set for photocopy and JPEG-noise robustness.
- Versioned releases. Each release of RCA Extract is pinned to a generator seed and library version.
- Transparent evaluation. If you want to verify our extraction quality before committing, we can ship you the same documents and you score against the same ground truth.
We will not quote blanket accuracy numbers until our benchmark methodology and results are published. If you need a benchmark for a specific document type, contact us.
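Scoring extraction output against known ground truth can be as simple as a field-by-field comparison. The sketch below is one possible scheme, not our benchmark methodology: it treats both sides as flat dictionaries and counts a field as correct on an exact match after trimming whitespace and ignoring case. The field names in the usage example are illustrative.

```python
def score_fields(extracted, truth):
    """Compare extracted values to ground truth, one field at a time.

    A field is correct when both values match exactly after trimming
    whitespace and case-folding. Fields missing from `extracted` count
    as wrong. Returns per-field results and overall accuracy.
    """
    def normalise(value):
        return None if value is None else str(value).strip().casefold()

    per_field = {
        name: normalise(extracted.get(name)) == normalise(expected)
        for name, expected in truth.items()
    }
    correct = sum(per_field.values())
    accuracy = correct / len(per_field) if per_field else 0.0
    return {"per_field": per_field, "accuracy": accuracy}


# Illustrative usage with made-up field names and values.
report = score_fields(
    {"modality": "CT ", "accession_number": "A123"},
    {"modality": "ct", "accession_number": "A124", "body_region": "Chest"},
)
print(report["per_field"])
```

Exact matching is deliberately strict. Dates, free-text findings and bounding boxes would each need their own comparison rule, for example date parsing, token overlap or intersection over union.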
What you get with a self-hosted deployment
RCA Extract runs as a Docker container inside your own cloud account, your own Kubernetes cluster, or on-prem on any Docker-compatible host. Your infrastructure, your compute bill, your data residency story.
- Runs entirely inside your cloud or on-prem. Patient data never leaves your environment.
- Customer-managed compute. You pay your cloud bill, not ours.
- No external API calls during extraction. No third-party data processors involved.
- Inherits your existing RBAC, audit and network policies.
- Encryption at rest and in transit, provided by your infrastructure.
Other deployment shapes (managed API for teams without their own infrastructure, air-gapped builds for high-security environments) are available on request as part of an enterprise plan. Contact us to scope.
Pricing
Priced per seat or per month, not per page. You control the compute. Contact us with your scope: volume, document types, deployment shape and SLA. A pilot pack from the RCA Medical Library is the recommended first step.
Talk to us about deployment
Self-hosted container in your cloud, on-prem, or air-gapped. Managed API available on enterprise plans.