Self-hosted Docker container

RCA Extract

Self-hosted document extraction for healthcare PDFs. Ships as a Docker container that runs in your cloud or on-prem. POST a PDF, get back structured fields plus bounding boxes. Zero data egress.

Self-hosted Docker container
Zero data egress
Australian healthcare conventions
REST API: PDF in, JSON out

How it deploys

A single container image. Your infrastructure. Your data stays where it is.

Pull the container

Pull the RCA Extract image from a private registry. Images are signed and versioned per release, and run on any Docker-compatible host (AMD64 and ARM64).

Run with a license key

Start the container with docker run, passing your license key as an environment variable and mounting a volume for input PDFs. Health checks are served at /healthz. The container exposes a REST API on a port you choose.
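A minimal readiness check might look like the sketch below. It polls /healthz until the container answers. The host, the port (8080) and the assumption that a healthy container returns HTTP 200 are illustrative only and are not taken from the RCA Extract documentation.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def healthz_url(host: str, port: int) -> str:
    """Build the health-check URL for a container published on host:port."""
    return f"http://{host}:{port}/healthz"


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of a GET on url, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (for example 503).
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout: container not reachable yet.
        return 0


def wait_until_healthy(
    url: str,
    attempts: int = 30,
    delay: float = 1.0,
    probe: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it returns 200 (assumed to mean healthy) or attempts run out."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, `wait_until_healthy(healthz_url("localhost", 8080))` would block for up to about 30 seconds before your pipeline starts sending PDFs. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running container.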

POST a PDF, get JSON

POST /extract with a PDF body. Get back the structured fields plus bounding boxes for the labeled fields. Same schema as the RCA Medical Library.
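As a rough sketch, a client call could look like the following. It sends the raw PDF bytes as the request body to /extract and decodes the JSON response. The base URL, the `application/pdf` content type and the absence of an authentication header are assumptions made for illustration. Check the shipped API reference for the actual contract.

```python
import json
import urllib.request


def build_extract_request(base_url: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build a POST /extract request carrying the PDF as the raw body."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/extract",
        data=pdf_bytes,
        method="POST",
        # Assumed content type; the source only says "a PDF body".
        headers={"Content-Type": "application/pdf"},
    )


def extract(base_url: str, pdf_path: str, timeout: float = 60.0) -> dict:
    """Send one PDF to the container and return the decoded JSON result."""
    with open(pdf_path, "rb") as fh:
        request = build_extract_request(base_url, fh.read())
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

A call such as `extract("http://localhost:8080", "discharge_summary.pdf")` would then return the structured fields and bounding boxes as a Python dictionary. Both the URL and the file name here are placeholders.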

Land where you want

Write results to your warehouse, your EMR, your S3 bucket, your Postgres. The container does extraction. You own the data path.
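One way to land results, sketched under heavy assumptions: flatten each extracted field into a row and write it to a table. The response shape used here (a top-level `fields` list whose items carry `name`, `value` and `bbox`) is hypothetical, as is the table layout. SQLite from the standard library stands in for Postgres or a warehouse so the example stays self-contained.

```python
import json
import sqlite3


def flatten_fields(document_id: str, result: dict) -> list[tuple]:
    """Turn one extraction result into (document_id, name, value, bbox) rows.

    Assumes a hypothetical shape: {"fields": [{"name", "value", "bbox"}, ...]}.
    """
    rows = []
    for field in result.get("fields", []):
        rows.append((
            document_id,
            field["name"],
            json.dumps(field.get("value")),
            json.dumps(field.get("bbox")),
        ))
    return rows


def store(conn: sqlite3.Connection, rows: list[tuple]) -> int:
    """Insert rows into an illustrative table and return how many were written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extracted_field "
        "(document_id TEXT, name TEXT, value_json TEXT, bbox_json TEXT)"
    )
    conn.executemany("INSERT INTO extracted_field VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Storing values and bounding boxes as JSON text keeps the sketch independent of the real schema. A production loader would map them to typed columns once the schema is known.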

Supported document types

The current production set covers high-volume Australian healthcare document types. Each is evaluated end-to-end against the matching family in the RCA Medical Library, so extraction quality can be scored against known ground truth. Additional document types ship on request.

Discharge summary

Demographics, registrar, consultant, principal diagnosis, principal ICD, dates, length of stay, medications, follow-up.

ED assessment

Triage category, presenting complaint, disposition, timings.

Referral letter

Referrer, recipient specialty, presenting problem, requested action.

Imaging report

Accession number, modality, body region, findings, impression.

Pathology report

Lab reference, specimen, test panel, result fields, abnormal flags.

Why a synthetic-first vendor

RCA Extract is built and tested against the same synthetic medical documents we sell as the RCA Medical Library. That gives us:

  • A controlled test set across 40+ document types where ground truth is known by construction.
  • A scanned-variant test set for photocopy and JPEG-noise robustness.
  • Versioned releases. Each release of RCA Extract is pinned to a generator seed and library version.
  • Transparent evaluation. If you want to verify our extraction quality before committing, we can ship you the same documents and you score against the same ground truth.
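Scoring against known ground truth can be as simple as a field-level exact-match rate. The sketch below is one possible scorer, not our benchmark methodology. The field names and the normalisation rule (trim, collapse whitespace, ignore case) are assumptions chosen for illustration.

```python
from typing import Optional


def normalise(value: object) -> Optional[str]:
    """Trim, collapse whitespace and casefold; keep None as None."""
    if value is None:
        return None
    return " ".join(str(value).split()).casefold()


def field_accuracy(predicted: dict, truth: dict) -> float:
    """Fraction of ground-truth fields whose predicted value matches exactly
    after normalisation. Fields missing from predicted count as wrong."""
    if not truth:
        raise ValueError("ground truth must contain at least one field")
    correct = 0
    for name, expected in truth.items():
        if name in predicted and normalise(predicted[name]) == normalise(expected):
            correct += 1
    return correct / len(truth)
```

For instance, with ground truth `{"triage_category": "3", "disposition": "Admitted"}` and a prediction that only returns `triage_category`, the score is 0.5. Stricter or looser matching (dates, codes, free text) would need per-field rules.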

We will not publish blanket accuracy numbers until we have published our benchmark methodology and results. If you need a benchmark for a specific document type, contact us.

What you get with a self-hosted deployment

RCA Extract runs as a Docker container inside your own cloud account, your own Kubernetes cluster, or on-prem on any Docker-compatible host. Your infrastructure, your compute bill, your data residency story.

  • Runs entirely inside your cloud or on-prem. Patient data never leaves your environment.
  • Customer-managed compute. You pay your cloud bill, not ours.
  • No external API calls during extraction. No third-party data processors involved.
  • Inherits your existing RBAC, audit and network policies.
  • Encryption at rest and in transit, provided by your infrastructure.

Other deployment shapes (managed API for teams without their own infrastructure, air-gapped builds for high-security environments) are available on request as part of an enterprise plan. Contact us to scope.

Runs in your cloud or on-prem
No third-party processors
FHIR-aligned output
Zero data egress

Pricing

Per-seat or per-month, not per-page. You control the compute. Contact us with your scope: volume, document types, deployment shape and SLA. A pilot pack from the RCA Medical Library is the recommended first step.

Talk to us about deployment

Self-hosted container in your cloud, on-prem, or air-gapped. Managed API available on enterprise plans.