An AI Claims Specialist for a process that's 40% broken.
A GenAI persona-design case from the EMA APM application. US healthcare claims processing is slow, error-prone, and built on a regulatory base that shifts under your feet. Here's how I'd design an AI specialist that cuts processing time in half — without skipping the human in the loop.
Three forces converging in one workflow.
US healthcare claims processing is exactly the kind of problem GenAI was built for — and exactly the kind of problem that punishes you for being naive about it. Three structural forces make it ripe right now:
- The work is dense, repetitive, and language-heavy. Reading medical records, mapping them to ICD-10 and CPT codes, filling forms, drafting appeals — it's structured-output generation at industrial scale.
- The cost of error is asymmetric. A wrong code doesn't just delay a claim; it triggers rework, appeals, and frustrated patients. Fixing rejections eats a large share of a specialist's day that should go to processing new claims.
- The rules shift faster than the people can keep up. CMS guidelines, payer-specific rules, state-level variations. Maintaining a current playbook is a full-time job in itself.
An AI specialist isn't replacing the humans here. It's giving them the time to be humans about the cases that actually need judgement.
The numbers tell the story.
The headline statistic is the one I keep coming back to: a 40% rework rate.
Forty percent rework means that roughly four in every ten claims a specialist touches come back. Rework cycles can take days; appeals can take weeks. Patients sit in financial limbo. Providers absorb the cash-flow hit. Payers spend cycles on disputes. Everyone loses, slowly.
The Claims Processing Specialist.
Before designing the AI, I had to understand the human it would work alongside. The target user is a Claims Processing Specialist at a mid-to-large healthcare provider or third-party administrator.
Maya · 34 · Claims Processor
Processes 60–80 claims a day. Spends ~30% of her time on rework. Frustrated by payer-specific rule changes that aren't documented anywhere central. Wants to focus on the genuinely complex cases.
David · 41 · Senior Specialist + Trainer
Trains new specialists and handles escalations. Sees the same coding errors every onboarding cycle. Wants a tool that codifies institutional knowledge instead of requiring 6 months of tribal learning.
What they share
Both want to spend less time on the mechanical work — coding, form-filling, status-checking — and more time on the parts that actually require their expertise: ambiguous diagnoses, payer disputes, complex multi-procedure claims. The AI specialist isn't competing with them. It's clearing their desk.
What the persona actually does.
The EMA AI Claims Processing Specialist is a persona-driven agent: the LLM isn't a generic chat surface but a configured role with a defined scope, tools, and escalation rules. It has five core capabilities:
- Patient data extraction (NLP). Pulls structured fields from uploaded medical records — patient demographics, diagnosis narratives, procedure notes, dates of service. Built to handle the messy, free-text reality of clinical documentation.
- Auto-coding (ICD-10 + CPT). Maps extracted diagnoses and procedures to the right codes. Surfaces low-confidence assignments for human review rather than silently guessing.
- Form population. Fills out claim forms (CMS-1500, UB-04, payer-specific variants) with the extracted and coded data. Flags missing required fields before submission, not after rejection.
- Real-time error check. Validates the claim against current payer rules and CMS guidelines before submission. The 40% rework rate exists largely because this check happens after the fact today.
- Insurer query responses. When a payer comes back with a request for additional information, the AI drafts a response from the underlying records; a specialist reviews and signs off.
The principle
The AI does the repetitive work. The specialist reviews and approves. Nothing leaves the building without a human sign-off, especially in a regulated, high-stakes domain like this.
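To make "configured role" concrete, here is a minimal sketch of how the persona and its sign-off rule might be expressed. Every name in it (`PersonaConfig`, the tool and escalation labels, `can_submit`) is hypothetical and only illustrates the shape of the idea, not an actual EMA interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PersonaConfig:
    """A configured role: what the agent covers, what it may call, when it hands off."""
    role: str
    scope: tuple
    tools: tuple
    escalation_rules: tuple
    requires_human_signoff: bool = True  # assumption: on by default, per the principle above


# Hypothetical configuration mirroring the five capabilities listed above.
CLAIMS_SPECIALIST = PersonaConfig(
    role="AI Claims Processing Specialist",
    scope=("claims intake", "coding", "form population", "validation", "payer queries"),
    tools=("extract_patient_data", "propose_codes", "populate_form",
           "validate_claim", "draft_payer_response"),
    escalation_rules=("low_confidence_code", "missing_required_field",
                      "unusual_payer_request"),
)


def can_submit(config: PersonaConfig, approved_by: Optional[str]) -> bool:
    """A claim may leave only with a named human approver when sign-off is required."""
    return (not config.requires_human_signoff) or bool(approved_by)
```

With this configuration, `can_submit(CLAIMS_SPECIALIST, None)` is false and `can_submit(CLAIMS_SPECIALIST, "maya")` is true, which is the whole point: the gate lives in the configuration, not in the prompt.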
From patient record to paid claim.
Step 1 — Intake
Specialist uploads patient records (PDF, scanned image, or pulled directly from EMR). AI parses, extracts, and structures the data.
Step 2 — Coding
AI proposes ICD-10 and CPT codes with confidence scores. Specialist sees a side-by-side view: extracted diagnosis narrative on the left, proposed code(s) on the right, with rationale. Low-confidence items are highlighted in orange — the human starts there.
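The "human starts there" ordering can be sketched as a simple triage over the proposals. The `CodeProposal` shape, the 0.85 threshold, and the confidence values are assumptions for illustration; a real cut-off would be tuned against the gold-standard set.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.85  # assumed cut-off, not a validated value


@dataclass
class CodeProposal:
    narrative: str     # extracted diagnosis or procedure text (left pane)
    code: str          # proposed code (right pane)
    system: str        # "ICD-10" or "CPT"
    confidence: float  # model confidence in [0, 1]
    rationale: str     # why this code was chosen


def review_queue(proposals, threshold=REVIEW_THRESHOLD):
    """Return (proposal, needs_attention) pairs, lowest confidence first."""
    ordered = sorted(proposals, key=lambda p: p.confidence)
    return [(p, p.confidence < threshold) for p in ordered]


# Illustrative proposals; the confidence numbers are made up.
proposals = [
    CodeProposal("Type 2 diabetes, no complications", "E11.9", "ICD-10", 0.97, "direct match"),
    CodeProposal("Elevated BP, history unclear", "I10", "ICD-10", 0.62, "partial match"),
    CodeProposal("Established patient office visit", "99213", "CPT", 0.91, "visit level inferred"),
]
queue = review_queue(proposals)
```

Here the hypertension proposal lands first in the queue and is the only one flagged, so it is the item that would be highlighted in orange.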
Step 3 — Form population
AI auto-fills the appropriate claim form. Missing required fields flagged at the top. Specialist reviews, adjusts, approves.
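A minimal sketch of the "flag missing fields before submission" step follows. The field names are illustrative stand-ins, not the actual CMS-1500 or UB-04 box layout, and `populate_form` is a hypothetical helper.

```python
# Illustrative required fields; a real form definition would come from the payer spec.
REQUIRED_FIELDS = (
    "patient_name",
    "date_of_birth",
    "date_of_service",
    "diagnosis_codes",
    "procedure_codes",
    "provider_npi",
)


def populate_form(extracted: dict):
    """Copy extracted values into the form and list required fields left empty."""
    form = {name: extracted.get(name) for name in REQUIRED_FIELDS}
    missing = [name for name, value in form.items() if not value]
    return form, missing


form, missing = populate_form({
    "patient_name": "Test Patient",
    "diagnosis_codes": ["E11.9"],
})
```

The `missing` list is what would be shown at the top of the form, in field order, before the specialist approves anything.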
Step 4 — Pre-submission validation
AI runs the populated claim against an internal rules engine — payer-specific requirements, CMS guidelines, common rejection patterns. Surfaces issues in plain English: "This payer requires modifier 59 for these CPT pairs." Specialist fixes, AI re-validates.
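One way to picture the rules engine is as a table of payer-specific checks that each return a plain-English message. The sketch below covers only the modifier 59 example; the payer name and the CPT pair are placeholders, not a real payer edit.

```python
from dataclasses import dataclass, field


@dataclass
class ClaimLine:
    cpt: str
    modifiers: list = field(default_factory=list)


# Hypothetical rule table: payer -> CPT pairs that need modifier 59 on one of the lines.
# "11111" and "22222" are placeholder codes, not real CPT codes.
MODIFIER_59_PAIRS = {
    "ExamplePayer": [("11111", "22222")],
}


def validate(payer: str, lines: list) -> list:
    """Return plain-English issues; an empty list means the check passed."""
    issues = []
    billed = {line.cpt for line in lines}
    for first, second in MODIFIER_59_PAIRS.get(payer, []):
        if first in billed and second in billed:
            has_59 = any("59" in line.modifiers
                         for line in lines if line.cpt in (first, second))
            if not has_59:
                issues.append(
                    f"{payer} requires modifier 59 when CPT {first} and {second} "
                    "are billed together."
                )
    return issues
```

The fix-and-revalidate loop is then just calling `validate` again after the specialist edits the claim lines, until it returns an empty list.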
Step 5 — Submission and tracking
Claim submitted. AI tracks status, drafts responses to payer queries as they come in, and escalates anything unusual back to the specialist.
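The tracking step boils down to a routing decision per status update. A rough sketch, assuming a small set of known statuses (the status strings here are invented for illustration) and treating anything unrecognised as "unusual":

```python
from enum import Enum


class Action(Enum):
    NONE = "no action"
    DRAFT_RESPONSE = "draft response for specialist review"
    ESCALATE = "escalate to specialist"


# Hypothetical status vocabulary; real payer feeds will differ.
KNOWN_STATUSES = {
    "submitted": Action.NONE,
    "accepted": Action.NONE,
    "paid": Action.NONE,
    "info_requested": Action.DRAFT_RESPONSE,
}


def next_action(status: str) -> Action:
    """Anything the agent does not recognise goes back to a human."""
    return KNOWN_STATUSES.get(status, Action.ESCALATE)
```

Defaulting to escalation is the design choice that matters: the agent only acts on statuses it has been explicitly told how to handle.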
Two prompts, one A/B test.
You can't go straight to production with something like this. The validation plan I'd run before any pilot:
Prompt 1 — Data extraction and coding
Feed the AI a synthetic-but-realistic patient visit record. Have it extract the structured fields and propose codes. Compare against a gold-standard set coded by a senior specialist.
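Comparing against the gold standard can be as simple as set overlap per record. This sketch assumes codes are compared as exact strings and reports precision, recall, and exact match; `score_codes` is a hypothetical helper name.

```python
def score_codes(proposed, gold):
    """Score one record's proposed codes against the senior specialist's codes."""
    proposed, gold = set(proposed), set(gold)
    true_positives = len(proposed & gold)
    precision = true_positives / len(proposed) if proposed else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return {
        "precision": precision,            # share of proposed codes that were right
        "recall": recall,                  # share of gold codes the AI found
        "exact_match": proposed == gold,   # whole record coded identically
    }


result = score_codes(["E11.9", "I10", "99213"], ["E11.9", "99213"])
```

In this example the AI found both gold codes (recall 1.0) but added one extra, so precision is two thirds and the record is not an exact match.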
Prompt 2 — Insurer query response
Give the AI a payer's request for additional information plus the underlying medical records. Have it draft the response. Have two specialists rate the drafts on (a) factual accuracy and (b) tone-appropriateness.
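With two raters per draft, the useful outputs are an average score per criterion and a flag where the raters disagree. The 1 to 5 scale and the one-point disagreement gap below are assumptions; the source only specifies the two criteria.

```python
from statistics import mean


def summarise_ratings(ratings, max_gap=1):
    """ratings: (draft_id, criterion, score_a, score_b) tuples on an assumed 1-5 scale."""
    summary = {}
    for draft_id, criterion, score_a, score_b in ratings:
        summary[(draft_id, criterion)] = {
            "mean": mean([score_a, score_b]),
            "disagree": abs(score_a - score_b) > max_gap,  # needs a third look
        }
    return summary


# Made-up scores for one draft.
summary = summarise_ratings([
    ("draft-1", "accuracy", 4, 5),
    ("draft-1", "tone", 5, 2),
])
```

Drafts where the specialists disagree are arguably the most informative ones to read by hand before changing the prompt.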
The pilot
Run an A/B test against manual processing: a randomised set of claims, half processed AI-assisted, half manually. Measure two things: time per claim and first-pass approval rate. If AI-assisted claims aren't both faster and at least as likely to be approved, the test fails. No partial credit.
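The pass/fail rule for the pilot can be written down directly. This sketch compares point estimates only; a real pilot would also want a sample size and a significance check, which are outside what it shows. The function name and input shapes are assumptions.

```python
from statistics import mean


def pilot_passes(ai_minutes, manual_minutes, ai_approved, manual_approved):
    """
    ai_minutes / manual_minutes: processing time per claim, in minutes.
    ai_approved / manual_approved: one bool per claim, True if approved first pass.
    Passes only if the AI arm is faster AND at least as likely to be approved.
    """
    faster = mean(ai_minutes) < mean(manual_minutes)
    ai_rate = sum(ai_approved) / len(ai_approved)
    manual_rate = sum(manual_approved) / len(manual_approved)
    return faster and ai_rate >= manual_rate
```

Because the two conditions are joined with `and`, a faster arm with a lower approval rate fails, which matches the "no partial credit" rule.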
The metrics that actually matter.
North star
Claims processed correctly per specialist per day. This is the metric that captures both speed and quality — and ties directly to provider cash flow and patient experience.
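As a sketch of how the north star could be computed, assume "processed correctly" means approved on first pass (an assumption; the definition would need agreeing with the team) and that each claim record carries a specialist, a day, and that flag.

```python
from collections import defaultdict


def correct_claims_per_specialist_day(claims):
    """Average number of first-pass-approved claims per (specialist, day)."""
    counts = defaultdict(int)
    for claim in claims:
        key = (claim["specialist"], claim["day"])
        counts[key] += 1 if claim["first_pass_approved"] else 0
    return sum(counts.values()) / len(counts) if counts else 0.0


# Made-up records for illustration.
north_star = correct_claims_per_specialist_day([
    {"specialist": "maya", "day": "2024-01-08", "first_pass_approved": True},
    {"specialist": "maya", "day": "2024-01-08", "first_pass_approved": True},
    {"specialist": "maya", "day": "2024-01-08", "first_pass_approved": False},
    {"specialist": "david", "day": "2024-01-08", "first_pass_approved": True},
])
```

In this toy data the two specialist-days contribute 2 and 1 correct claims, so the metric is 1.5.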
Supporting metrics
- First-pass approval rate. Target: lift from ~60% baseline to 80%+ within 6 months of deployment.
- Error rate per 100 claims. Tracked by error type so we can see where the AI is helping vs. where it needs more training data.
- Average processing time per claim. Target: 50%+ reduction.
- Specialist satisfaction. Quarterly survey. If the AI is correct but the specialists hate it, the AI is wrong.
- Cost per processed claim. Manual baseline vs. AI-assisted — the number that justifies the investment to leadership.
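Two of the supporting metrics above reduce to short formulas. The sketch below shows the error rate per 100 claims broken down by type, and the fractional time reduction against a baseline; the error-type labels and the numbers are invented for illustration.

```python
from collections import Counter


def errors_per_100(error_types, total_claims):
    """Errors per 100 claims, keyed by error type."""
    counts = Counter(error_types)
    return {error: 100 * n / total_claims for error, n in counts.items()}


def time_reduction(baseline_minutes, current_minutes):
    """Fractional reduction in average processing time (0.5 means 50% faster)."""
    return 1 - current_minutes / baseline_minutes


rates = errors_per_100(["wrong_code", "wrong_code", "missing_field"], total_claims=200)
reduction = time_reduction(baseline_minutes=12, current_minutes=6)
```

With these inputs, `rates` is 1.0 wrong-code errors and 0.5 missing-field errors per 100 claims, and `reduction` is 0.5, which would just meet the 50% target.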
What I'd give up to ship this right.
Three deliberate constraints I'd build in from day one:
- Human-in-the-loop for everything. No autonomous submission. The AI proposes; the specialist disposes. Yes, it caps the efficiency ceiling. In a regulated, high-stakes domain, that's the right ceiling.
- Auditable reasoning, not just outputs. Every AI-assigned code needs to come with a rationale — extracted text, rule applied, confidence score. If the system can't explain itself, a human can't responsibly approve it. This will slow throughput by maybe 10%. It will save you from the compliance incident that ends the product.
- Narrow domain at launch. Don't try to handle every specialty on day one. Pick one (e.g. outpatient internal medicine), nail it, then expand. Generalist AI is impressive in demos and unreliable in production.
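The auditable-reasoning constraint implies a record that travels with every assigned code. A minimal sketch, assuming the three rationale parts named above plus the approver; `CodeRationale`, `approvable`, and `audit_line` are hypothetical names.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CodeRationale:
    code: str
    extracted_text: str  # the source text the code was derived from
    rule_applied: str    # which coding or payer rule justified it
    confidence: float    # model confidence in [0, 1]


def approvable(rationale: CodeRationale) -> bool:
    """A code without a complete rationale should not reach the approval screen."""
    return (
        bool(rationale.extracted_text.strip())
        and bool(rationale.rule_applied.strip())
        and 0.0 <= rationale.confidence <= 1.0
    )


def audit_line(rationale: CodeRationale, approved_by: str) -> str:
    """Serialise the rationale plus the human approver as one JSON log line."""
    if not approvable(rationale):
        raise ValueError("incomplete rationale: cannot be approved")
    return json.dumps({**asdict(rationale), "approved_by": approved_by}, sort_keys=True)
```

Refusing to serialise an incomplete rationale is deliberate: if the system can't explain a code, there is nothing for the specialist to approve.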
The version of this product that wins is the one that's boring to demo and excellent to use. That's the bet I'd make.