Architecture · v2

HIPAA-aware medical coding

The portfolio demo is hardcoded so it never breaks and never touches PHI. Below is the production path it stands in for — the choices a real deployment would make to handle protected health information legally and safely. HIPAA-aware, not HIPAA-certified: there is no such certification body, and any real claim of compliance comes from signed BAAs and SOC 2 / HITRUST audits, not a marketing label.

Rule 0: PHI never leaves a BAA-covered endpoint.

No pasting notes into public ChatGPT, Gemini, Grok, or Claude.ai. If a tool that touches the request body doesn't have a signed BAA, the only thing that can reach it is data run through the Safe Harbor de-identification step first.

Data flow

EHR egress. Encounter notes leave the EHR over a TLS 1.2+ link — typically FHIR DocumentReference with the note as a Binary attachment, or HL7 v2 MDM_T02 for older systems.
Ingest gateway. Receives the note inside the customer-tenanted VPC, validates payload, stamps a correlation ID, writes an immutable audit record, and stores the raw note encrypted at rest (AES-256, customer-managed KMS key).
PHI minimization. Strip the 18 Safe Harbor identifiers (§164.514(b)(2)) before the prompt: names, geographic subdivisions smaller than state, dates more precise than year (and ages over 89), phone, fax, email, SSN, MRN, health plan beneficiary number, account number, certificate and license numbers, vehicle identifiers, device identifiers, URLs, IPs, biometric IDs, full-face photos, and any other unique identifying number or code. The model sees the clinical narrative plus age band, sex, and visit type, not the patient identity.
Model serving (BAA gate). Calls go to a foundation model under a signed BAA: Gemini via Google Vertex AI (HIPAA-eligible under a Google Cloud BAA), Claude via Amazon Bedrock, GPT-class via Azure OpenAI Service, or a self-hosted open model (Llama-3, Mistral, MedPaLM-style fine-tune) inside the VPC. The public Google AI Studio / Anthropic / OpenAI APIs without a BAA are not a permissible path. The demo on this site uses Google AI Studio because its inputs are synthetic; the only changes needed to run in production are the SDK endpoint and a signed BAA.
Code lookup & rule engine. The model proposes candidate codes; a deterministic post-processor validates each against current CPT / ICD-10-CM / HCPCS catalogs, runs NCCI edits (PTP and MUE tables), and applies the relevant coverage policies (Medicare LCDs and payer-specific rules). Rule violations become flags shown to the human reviewer.
Human-in-the-loop review. Every AI-suggested code is validated by a certified coder before claim submission: AI suggests, a human decides. The reviewer sees the supporting evidence span and chooses accept, reject, or edit, the same shape as the demo's decision panel and the worksheet pattern in kataloghub-app/app/api/corrections/worksheet/route.ts. Rejected suggestions are reported back to the vendor for accuracy tuning, without the underlying note.
Push back to EHR. Final code set posts back via FHIR Claim / ChargeItem or HL7 v2 DFT_P03, the encounter status updates, and a billing job picks it up. Every read, write, decision, and outbound call is in the audit log.
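The rule-engine step above can be sketched as a pure function over lookup tables. This is a minimal illustration, not the production post-processor: the type names, the `validateCandidates` function, and the `X…` codes and limits are all hypothetical placeholders, and real NCCI PTP edits also carry modifier indicators that this sketch ignores.

```typescript
// Placeholder codes and limits below are invented for illustration only.
type Candidate = { code: string; units: number };
type FlagRule = "UNKNOWN_CODE" | "MUE_EXCEEDED" | "PTP_CONFLICT";
type Flag = { code: string; rule: FlagRule; message: string };

interface RuleTables {
  validCodes: Set<string>;           // current CPT / HCPCS catalog
  mueLimits: Map<string, number>;    // maximum units per code per day
  ptpPairs: Array<[string, string]>; // [column 1, column 2]: column 2 is bundled into column 1
}

function validateCandidates(candidates: Candidate[], tables: RuleTables): Flag[] {
  const flags: Flag[] = [];

  // 1. Drop anything that is not in the current catalog.
  const valid = candidates.filter((c) => {
    if (tables.validCodes.has(c.code)) return true;
    flags.push({ code: c.code, rule: "UNKNOWN_CODE", message: `${c.code} is not in the current catalog` });
    return false;
  });

  // 2. MUE: units billed must not exceed the per-code limit.
  for (const c of valid) {
    const limit = tables.mueLimits.get(c.code);
    if (limit !== undefined && c.units > limit) {
      flags.push({ code: c.code, rule: "MUE_EXCEEDED", message: `${c.code}: ${c.units} units exceeds limit of ${limit}` });
    }
  }

  // 3. PTP: flag the column 2 code when both codes of a pair are present.
  const present = new Set(valid.map((c) => c.code));
  for (const [col1, col2] of tables.ptpPairs) {
    if (present.has(col1) && present.has(col2)) {
      flags.push({ code: col2, rule: "PTP_CONFLICT", message: `${col2} is bundled into ${col1}` });
    }
  }
  return flags;
}

const demoTables: RuleTables = {
  validCodes: new Set(["X1001", "X1002", "X1003"]),
  mueLimits: new Map([["X1003", 2]]),
  ptpPairs: [["X1001", "X1002"]],
};

const demoFlags = validateCandidates(
  [
    { code: "X1001", units: 1 },
    { code: "X1002", units: 1 },
    { code: "X1003", units: 3 },
    { code: "X9999", units: 1 },
  ],
  demoTables,
);
```

Nothing is silently dropped: every violation comes back as a flag, so the reviewer sees why a candidate was questioned.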

The HIPAA controls that actually matter

Business Associate Agreement

Signed BAA with the covered entity before any PHI moves; cascading BAAs with the cloud provider and the model provider. A vendor that won't sign is disqualified — full stop.

Encryption in transit & at rest

TLS 1.2+ everywhere, AES-256 at rest with customer-managed KMS keys. Object-level keys for note storage, so the blast radius of a breach is one record, not a bucket.

Access controls & audit logs

RBAC with MFA, least-privilege IAM. Every PHI read/write/decision is logged with user, timestamp, and correlation ID, but log records reference a hashed correlation ID, never the note body. The logs themselves are write-once and append-only.
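One way to keep the note body out of the log is to build each record from an explicit allow-list of fields. The sketch below assumes hypothetical names (`AuditEvent`, `AppendOnlyAuditLog`, `demoHash`). The in-memory array only illustrates the append-only shape; real write-once guarantees would come from the storage layer. The hash is a non-cryptographic stand-in so the example runs without dependencies; a deployment would presumably use a keyed hash such as HMAC-SHA-256.

```typescript
type AuditAction = "read" | "write" | "decision" | "outbound_call";

interface AuditEvent {
  userId: string;
  action: AuditAction;
  correlationId: string;
  outcome: string;
  noteBody?: string; // may exist on the caller's side; must never be logged
}

interface AuditRecord {
  readonly timestamp: string; // ISO 8601
  readonly userId: string;
  readonly action: AuditAction;
  readonly correlationHash: string;
  readonly outcome: string;
}

type HashFn = (value: string) => string;

class AppendOnlyAuditLog {
  private readonly records: AuditRecord[] = [];
  private readonly hash: HashFn;

  constructor(hash: HashFn) {
    this.hash = hash;
  }

  // Copies only allow-listed fields, so noteBody cannot leak into a record.
  append(event: AuditEvent, now: Date = new Date()): AuditRecord {
    const record: AuditRecord = Object.freeze({
      timestamp: now.toISOString(),
      userId: event.userId,
      action: event.action,
      correlationHash: this.hash(event.correlationId),
      outcome: event.outcome,
    });
    this.records.push(record);
    return record;
  }

  // Returns a copy; there is deliberately no update or delete method.
  entries(): readonly AuditRecord[] {
    return this.records.slice();
  }
}

// Non-cryptographic placeholder (FNV-1a style) used only to make the sketch runnable.
const demoHash: HashFn = (value) => {
  let h = 0x811c9dc5;
  for (let i = 0; i < value.length; i++) {
    h ^= value.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
};

const auditLog = new AppendOnlyAuditLog(demoHash);
const logged = auditLog.append(
  { userId: "coder-17", action: "decision", correlationId: "enc-0001", outcome: "accepted", noteBody: "synthetic note text" },
  new Date("2024-01-01T00:00:00Z"),
);
```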

BAA-gated model serving

Inference goes to Vertex AI, Bedrock, or Azure OpenAI under BAA, or to a self-hosted model in a HIPAA-eligible VPC. The public Google AI Studio, Anthropic, and OpenAI APIs are off-limits for PHI.

Data minimization

Send only the clinical narrative + minimal demographics. Strip MRN, address, account numbers before the prompt. The model never needs the patient's identity to suggest a code.
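As a rough illustration of the stripping step, the sketch below scrubs only the identifiers that have a recognizable text pattern. It is not a complete Safe Harbor implementation: names, addresses, and free-form identifiers need structured-field removal or an entity recognizer. The rule list and the `scrubPatternIdentifiers` function are hypothetical.

```typescript
type ScrubRule = { label: string; pattern: RegExp; replacement: string };

// Order matters: more specific patterns run before looser digit patterns.
const SCRUB_RULES: ScrubRule[] = [
  { label: "URL", pattern: /\bhttps?:\/\/\S+/g, replacement: "[URL]" },
  { label: "EMAIL", pattern: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, replacement: "[EMAIL]" },
  { label: "SSN", pattern: /\b\d{3}-\d{2}-\d{4}\b/g, replacement: "[SSN]" },
  { label: "MRN", pattern: /\bMRN[:#]?\s*\d+\b/gi, replacement: "[MRN]" },
  { label: "PHONE", pattern: /\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b/g, replacement: "[PHONE]" },
  { label: "IP", pattern: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g, replacement: "[IP]" },
  // Dates keep the year only, matching the "no more precise than year" rule.
  { label: "DATE", pattern: /\b\d{1,2}\/\d{1,2}\/(\d{4})\b/g, replacement: "$1" },
  { label: "DATE_ISO", pattern: /\b(\d{4})-\d{2}-\d{2}\b/g, replacement: "$1" },
];

function scrubPatternIdentifiers(input: string): { text: string; hits: string[] } {
  let text = input;
  const hits: string[] = [];
  for (const rule of SCRUB_RULES) {
    const next = text.replace(rule.pattern, rule.replacement);
    if (next !== text) hits.push(rule.label); // record which rule fired, not what it matched
    text = next;
  }
  return { text, hits };
}

const scrubbed = scrubPatternIdentifiers(
  "Pt seen 03/14/2024, MRN: 123456. Call 555-123-4567 or jane@example.com. Presents with cough.",
);
// scrubbed.text is "Pt seen 2024, [MRN]. Call [PHONE] or [EMAIL]. Presents with cough."
```

The `hits` array records only rule labels, so the scrubber's own output can be logged without reintroducing the identifiers it removed.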

SOC 2 Type II + HITRUST

Self-attested HIPAA + third-party audited security controls. Penetration testing, employee training, breach notification procedures. The paperwork is the moat.

BAA: the clauses that actually matter

“HIPAA-eligible” is a vendor checkbox. The BAA is where the eligibility is enforced. Four clauses I look for before signing:

No training on PHI. The vendor cannot use customer prompts, completions, or any payload-derived data to train, fine-tune, or evaluate models — for this customer or any other.
No retention after task completion. PHI is held only as long as needed to return the response, then deleted. Any cache (prompt cache, KV cache, batch buffer) is scoped to the request and purged.
No commingling across tenants. Tenant data is logically isolated; no shared embeddings store, no shared evaluation set, no “learn from all customers” feature.
No PHI in vendor logs. Whatever the vendor logs for debugging, abuse detection, or analytics excludes request/response bodies — or hashes them. PHI in a log line is still a breach.

If a vendor won't put these in writing, the tool is for synthetic or de-identified data only. That covers prototyping; it does not cover production.
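The four clauses can also be recorded as a per-vendor checklist that decides which class of data a tool may receive. This is a sketch under my own naming (`BaaTerms`, `missingTerms`, `permittedData`); the legal review still happens on the contract itself.

```typescript
interface BaaTerms {
  signed: boolean;
  noTrainingOnPhi: boolean;
  noRetentionAfterTask: boolean;
  noTenantCommingling: boolean;
  noPhiInVendorLogs: boolean;
}

type DataClass = "synthetic" | "deidentified" | "phi";

// Lists every term that is absent, so the gap can be raised with the vendor.
function missingTerms(terms: BaaTerms): Array<keyof BaaTerms> {
  const keys: Array<keyof BaaTerms> = [
    "signed",
    "noTrainingOnPhi",
    "noRetentionAfterTask",
    "noTenantCommingling",
    "noPhiInVendorLogs",
  ];
  return keys.filter((k) => !terms[k]);
}

// PHI is permitted only when the BAA is signed and all four clauses are in writing.
function permittedData(terms: BaaTerms): DataClass[] {
  return missingTerms(terms).length === 0
    ? ["synthetic", "deidentified", "phi"]
    : ["synthetic", "deidentified"];
}

const fullTerms: BaaTerms = {
  signed: true,
  noTrainingOnPhi: true,
  noRetentionAfterTask: true,
  noTenantCommingling: true,
  noPhiInVendorLogs: true,
};
const weakTerms: BaaTerms = { ...fullTerms, noRetentionAfterTask: false };
```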

Ongoing compliance

Annual security risk assessment. §164.308(a)(1)(ii)(A) requires an accurate and thorough risk analysis but sets no fixed interval; yearly is the common baseline. Each SRA re-maps data flows, re-checks access controls, and re-verifies that every PHI hop still ends at a BAA-covered endpoint.
Re-assess when a vendor turns on an AI feature. A coding tool that adds an “AI assist” toggle six months after contract signing is a new data flow, even if the vendor calls it an “enhancement.” Trigger an out-of-cycle SRA and confirm the BAA still covers the new processing path.
BAA refresh on vendor change. Sub-processor list changes, new AI capabilities, new data residency: any of these means revisiting the agreement now, not waiting for the renewal date.
Reject-flag feedback loop. Coder rejections and edits feed back to the vendor as accuracy signal — without sending the underlying PHI. The signal is “code X was wrong in context Y,” not the note.
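A PHI-free version of the feedback signal might aggregate coder decisions into counts keyed by code and a fixed reason category, leaving the note text and evidence span behind. The types, the `summarizeCorrections` function, the reason categories, and the `X…` codes below are hypothetical. Using a closed set of reasons rather than free text is the design choice that keeps identifiers from slipping through.

```typescript
type ReasonCategory = "not_documented" | "wrong_specificity" | "bundled" | "other";

interface ReviewDecision {
  suggestedCode: string;
  action: "accept" | "reject" | "edit";
  finalCode?: string;
  reason?: ReasonCategory;
  evidenceSpan: string; // note excerpt shown to the coder; stays local
}

interface CorrectionSignal {
  suggestedCode: string;
  action: "reject" | "edit";
  finalCode: string | null;
  reason: ReasonCategory;
  visitType: string;
  count: number;
}

function summarizeCorrections(decisions: ReviewDecision[], visitType: string): CorrectionSignal[] {
  const byKey = new Map<string, CorrectionSignal>();
  for (const d of decisions) {
    if (d.action === "accept") continue; // only corrections are reported
    const action: "reject" | "edit" = d.action;
    const finalCode = action === "edit" ? (d.finalCode ?? null) : null;
    const reason: ReasonCategory = d.reason ?? "other";
    const key = [d.suggestedCode, action, finalCode ?? "", reason].join("|");
    const existing = byKey.get(key);
    if (existing) {
      existing.count += 1;
    } else {
      // evidenceSpan is intentionally never copied into the signal.
      byKey.set(key, { suggestedCode: d.suggestedCode, action, finalCode, reason, visitType, count: 1 });
    }
  }
  return Array.from(byKey.values());
}

const demoSignals = summarizeCorrections(
  [
    { suggestedCode: "X1001", action: "reject", reason: "bundled", evidenceSpan: "synthetic excerpt A" },
    { suggestedCode: "X1001", action: "reject", reason: "bundled", evidenceSpan: "synthetic excerpt B" },
    { suggestedCode: "X1002", action: "edit", finalCode: "X1003", reason: "wrong_specificity", evidenceSpan: "synthetic excerpt C" },
    { suggestedCode: "X1003", action: "accept", evidenceSpan: "synthetic excerpt D" },
  ],
  "outpatient-em",
);
```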

From this public demo to a private BAA testbed

The site you're reading is a public demo on synthetic data. If a prospect signs a BAA and wants to evaluate against real charts, the live data does not go through medi.usesmpt.com. The codebase deploys a second time into a private environment with different config. The deltas:

Concern	Public demo (this URL)	Private testbed (post-BAA)
Host	Hostinger shared VPS — no HIPAA BAA available	AWS / GCP / Azure region under a signed BAA
Model endpoint	Public Gemini API (not BAA-eligible)	Vertex AI, Bedrock, or Azure OpenAI under BAA
PHI guard	Safe Harbor detector blocks submission	Detector still runs, but as audit-only logging — PHI is permitted
Banner	“Synthetic data only — no PHI”	“Authorized BAA testing · Client: ⟨name⟩ · Engagement: ⟨id⟩”
Access	Open to the public internet	IP allow-list, basic auth or SSO, optionally mTLS
Audit log	Dev-only console output	Append-only log: timestamp, user, hashed correlation ID, outcome — never the note body
Data retention	No persistence — request-scoped	Configurable per engagement (default: no persistence; opt-in encrypted store for QA review)
URL	medi.usesmpt.com	medi-private.⟨client⟩.usesmpt.com or per-client subdomain

The codebase is the same in both deployments; the surrounding infrastructure is what changes. Standing up the private testbed is a config swap and a DNS record, not a rewrite, and is typically a one-day setup after BAA execution.
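The deltas in the table could be captured as two values of one config type, with a check that refuses unsafe combinations. The field names, the `privateTestbed` and `configProblems` functions, and the `acme` client are illustrative assumptions, not the app's actual config schema.

```typescript
type ModelEndpoint = "public-gemini-api" | "vertex-ai" | "bedrock" | "azure-openai";
type PhiGuardMode = "block" | "audit";

interface DeployConfig {
  host: string;
  modelEndpoint: ModelEndpoint;
  baaSigned: boolean;
  phiGuard: PhiGuardMode; // "block" rejects detected PHI; "audit" only logs the detection
  banner: string;
  publicAccess: boolean;
  persistNotes: boolean;
}

const publicDemo: DeployConfig = {
  host: "medi.usesmpt.com",
  modelEndpoint: "public-gemini-api",
  baaSigned: false,
  phiGuard: "block",
  banner: "Synthetic data only (no PHI)",
  publicAccess: true,
  persistNotes: false,
};

function privateTestbed(
  client: string,
  engagementId: string,
  modelEndpoint: Exclude<ModelEndpoint, "public-gemini-api">,
): DeployConfig {
  return {
    ...publicDemo,
    host: `medi-private.${client}.usesmpt.com`,
    modelEndpoint,
    baaSigned: true,
    phiGuard: "audit",
    banner: `Authorized BAA testing · Client: ${client} · Engagement: ${engagementId}`,
    publicAccess: false,
  };
}

// Returns a list of reasons the config must not be deployed; empty means OK.
function configProblems(c: DeployConfig): string[] {
  const problems: string[] = [];
  if (c.phiGuard === "audit") {
    if (!c.baaSigned) problems.push("PHI guard is audit-only but no BAA is signed");
    if (c.modelEndpoint === "public-gemini-api") problems.push("PHI permitted but model endpoint is not BAA-eligible");
    if (c.publicAccess) problems.push("PHI permitted but site is open to the public internet");
  }
  if (c.persistNotes && !c.baaSigned) problems.push("note persistence requires a signed BAA");
  return problems;
}

const acmeTestbed = privateTestbed("acme", "E-001", "vertex-ai");
const misconfigured: DeployConfig = { ...publicDemo, phiGuard: "audit" };
```

Running `configProblems` at startup and refusing to boot on a non-empty result would make the PHI guard hard to relax by accident.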

Honest limitations of this prototype

No live model. Suggestions are a hardcoded mapping over a handful of synthetic notes. The UX, decision shape, and rule-engine flag pattern are real; the inference is not.
No real EHR integration. "Send to EHR" is a state transition in React, not a FHIR / HL7 call. A production build would target Epic, Oracle Health (formerly Cerner), Athena, or eClinicalWorks, each its own integration project.
No live NCCI / LCD lookup. Flag messages are written into the sample data. A real system would query a current edits table updated quarterly.
Synthetic notes only. All sample notes were written for the demo in MTSamples style. No real PHI is, has been, or will be processed by this app.
Single specialty. Only outpatient E/M is modeled. Specialty-specific code sets (radiology, dermatology, anesthesia) each need their own evaluation.

What this demonstrates

The model can call APIs — that's the easy part. What separates a credible build from a demo screenshot is: a working evidence-grounded suggestion UI, an explicit decision shape (accept / reject / edit), a rule engine that surfaces NCCI and documentation flags rather than burying them, and an architecture that names where the PHI goes at every hop. That's the gap I'm optimizing the demo and this page to close.