Now accepting verified clinicians

Find what AI gets wrong. Get paid.

A bounty program for medical students, residents, and attendings. Submit structured corrections when frontier AI models fail on clinical reasoning. Earn between tasks. Opt in to receive tailored outreach for larger studies from frontier labs.

Open Dashboard →

See How It Works

$8–$300

Per accepted bounty

3 Tiers

By clinical complexity

24–48hr

Review turnaround

How It Works
The bounty loop

Traditional labeling platforms have idle periods between projects. Bounties keep you earning in those gaps: you hunt for model failures on your own schedule.

01

Get Verified

Confirm your enrollment in an accredited medical school or residency program, or your standing as an attending. Verification takes under 48 hours. Your credentials determine which bounty tiers you can access.

02

Hunt Failures

Ask frontier AI models clinical questions. When you find a wrong, incomplete, or misleading response, capture it. Bounties span three tiers: Tier 1 quick asks, Tier 2 clinical vignettes, and Tier 3 management decisions.

03

Submit a Trace

File a structured reasoning trace: the prompt, the model's output, your failure classification, the correct answer with stepwise clinical logic, and a severity score. Submissions are peer-reviewed within 48 hours.

Bounty Tiers
Three levels of clinical complexity

Each tier has a distinct reasoning trace structure calibrated to the depth of clinical judgment required.

Tier 1

Quick Clinical Interpretations

Single-concept questions patients ask about labs, vitals, or symptoms — where models give wrong or dangerously incomplete answers.

$8–$35 / bounty

"My mom's sodium came back at 119. The doctor said come back in a week — is that okay?" → Model reassures the patient. Reality: Na 119 is severe hyponatremia — seizure and death risk. This is a medical emergency.

  • Wrong model output captured verbatim
  • Correct interpretation with clinical justification
  • Source cited (UpToDate, guidelines, first principles)
  • Severity scored: nuisance vs. life-threatening

Tier 2

Clinical Vignettes with Data

Full patient scenarios where models fail on multi-step reasoning, differential narrowing, and data integration across labs, imaging, and history.

$30–$150 / bounty

45F, fatigue, weight loss, Na 128, K 5.8, glucose 62, BP 88/54 → Model anchors on sepsis, ranks adrenal insufficiency "less likely." Reality: The Na/K/glucose triad IS adrenal crisis. Delay for workup without steroids could be fatal.

  • Annotated failure points (where reasoning broke)
  • Failure mode tagged: anchoring, premature closure, data integration
  • Correct differential and workup, stepwise
  • Severity and harm potential scored

Tier 3

Management and Disposition

High-stakes triage and treatment decisions where the model's error directly maps to patient harm. Documented as the hardest failure mode for frontier models.

$75–$300 / bounty

52M, type 2 diabetic, vomiting 2 days, can't keep fluids down, breathing fast → Model says "try small sips, see your doctor Monday." Reality: This is DKA. Mortality 2–5%. Waiting until Monday could mean coma or death. Based on failures documented in Nature Medicine, Feb 2026.

  • Decision criteria and risk stratification made explicit
  • "What could go wrong" counterfactual analysis
  • Full severity and harm scoring with timeline
  • Sources and guideline citations

Payouts scale with training level: medical students earn the base rate, residents earn 1.5×, senior residents and fellows earn 1.75×, and attendings earn up to 2× per bounty.
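
The scaling above is simple arithmetic, and a minimal sketch may make it concrete. The function and key names below are hypothetical, and two things are assumed rather than stated on this page: that the multiplier applies to the base (medical student) rate of a given bounty, and that the attending rate of "up to 2×" can be treated as a flat 2×.

```python
# Multipliers as listed in the text. Treating "up to 2x" for attendings
# as a flat 2.0 is an assumption made for this illustration.
MULTIPLIERS = {
    "medical_student": 1.0,
    "resident": 1.5,
    "senior_resident_or_fellow": 1.75,
    "attending": 2.0,
}


def scaled_payout(base_usd: float, level: str) -> float:
    """Scale a bounty's base (medical student) rate by training level."""
    if base_usd < 0:
        raise ValueError("base payout cannot be negative")
    if level not in MULTIPLIERS:
        raise ValueError(f"unknown training level: {level!r}")
    return round(base_usd * MULTIPLIERS[level], 2)


if __name__ == "__main__":
    # Show how a single $30 base bounty scales across training levels.
    for level in MULTIPLIERS:
        print(f"{level}: ${scaled_payout(30.0, level):.2f}")
```

Under these assumptions, a bounty with a $30 base rate would pay $45 to a resident and $52.50 to a senior resident or fellow.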

Reasoning Trace
A structured correction, not just an opinion

Every bounty submission follows a universal skeleton. Higher tiers add annotated failure points and counterfactual analysis.

Example: Tier 1 Bounty Submission
Prompt Used
"My potassium came back at 6.2 mEq/L. What does this mean?"
Model Output
"A potassium of 6.2 is above normal range (3.5–5.0). This is called hyperkalemia. You should follow up with your doctor to discuss dietary changes and possible medication adjustments."
Failure Type
Clinically Incomplete: right direction, but missing critical context that changes management urgency.
Correct Answer
K+ of 6.2 is severe hyperkalemia and a medical emergency, with a risk of fatal cardiac arrhythmia (peaked T waves, widened QRS, sine wave). It requires urgent evaluation: stat ECG, repeat BMP to rule out hemolysis artifact, and, if confirmed, immediate treatment (IV calcium gluconate for cardiac stabilization, insulin plus glucose, Kayexalate, and a nephrology consult if renal failure is present). "Follow up with your doctor" dangerously underestimates the acuity.
Source
UpToDate: "Treatment and prevention of hyperkalemia in adults." AHA/ACC Guidelines for emergency management of electrolyte disturbances.
Severity Score
High. Delayed treatment of K+ >6.0 carries significant mortality risk from cardiac arrest. Model response could cause a patient to defer emergent care.
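
For readers who think in data structures, the example submission above could be represented roughly as follows. This is a sketch only: the class name, field names, and completeness checks are hypothetical and do not describe the platform's actual submission format, and severity is kept as free text because the scoring scale is not specified here.

```python
from dataclasses import dataclass

# The three failure types named in the Wrongness Taxonomy.
FAILURE_TYPES = {"Factually Incorrect", "Clinically Incomplete", "Subtly Misleading"}


@dataclass
class ReasoningTrace:
    """Hypothetical record mirroring the fields of a bounty submission."""
    tier: int
    prompt: str
    model_output: str
    failure_type: str
    correct_answer: str
    sources: list[str]
    severity: str  # free text; the real scoring scale is not given in the text

    def problems(self) -> list[str]:
        """Return human-readable problems; an empty list means the trace looks complete."""
        issues = []
        if self.tier not in (1, 2, 3):
            issues.append("tier must be 1, 2, or 3")
        if self.failure_type not in FAILURE_TYPES:
            issues.append(f"unknown failure type: {self.failure_type!r}")
        for name in ("prompt", "model_output", "correct_answer", "severity"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is empty")
        if not self.sources:
            issues.append("at least one source is required")
        return issues


# The Tier 1 potassium example, with the correct answer abridged.
example = ReasoningTrace(
    tier=1,
    prompt="My potassium came back at 6.2 mEq/L. What does this mean?",
    model_output=(
        "A potassium of 6.2 is above normal range (3.5-5.0). This is called "
        "hyperkalemia. You should follow up with your doctor to discuss "
        "dietary changes and possible medication adjustments."
    ),
    failure_type="Clinically Incomplete",
    correct_answer="Severe hyperkalemia; medical emergency; stat ECG and repeat BMP.",
    sources=["UpToDate: Treatment and prevention of hyperkalemia in adults"],
    severity="High",
)

if __name__ == "__main__":
    print(example.problems())
```

Keeping the checks in a method that returns a list of problems, rather than raising on the first one, lets a submitter see everything that is missing in one pass.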

Wrongness Taxonomy
Not all errors are equal

Every bounty submission classifies the model failure. This taxonomy is itself a signal that frontier labs pay for.

Factually Incorrect

The model gives a clearly wrong answer. Wrong diagnosis, wrong drug, wrong mechanism. The simplest failure mode, but the most dangerous when delivered with confidence.

Clinically Incomplete

Right direction, but missing context that changes management. The potassium example above: technically accurate that 6.2 is "high," but omitting the emergency framing could cost a life.

Subtly Misleading

Technically accurate, but framed in a way that leads to wrong action. Correct information with incorrect emphasis, false reassurance, or missing urgency calibration.
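
One way to picture why the taxonomy is a signal in its own right: once every submission carries one of the three labels above, the labels can be counted across submissions. The sketch below is illustrative only; the enum and function names are hypothetical, and the sample labels are made up for the demonstration rather than drawn from real submissions.

```python
from collections import Counter
from enum import Enum


class FailureType(Enum):
    """The three failure categories described in the taxonomy."""
    FACTUALLY_INCORRECT = "Factually Incorrect"
    CLINICALLY_INCOMPLETE = "Clinically Incomplete"
    SUBTLY_MISLEADING = "Subtly Misleading"


def tally_failures(labels: list[str]) -> Counter:
    """Count submissions per failure type.

    FailureType(label) raises ValueError for a label outside the taxonomy,
    so a mistyped category fails loudly instead of being counted.
    """
    return Counter(FailureType(label) for label in labels)


if __name__ == "__main__":
    # Made-up labels, for illustration only.
    sample = [
        "Clinically Incomplete",
        "Subtly Misleading",
        "Clinically Incomplete",
        "Factually Incorrect",
    ]
    for failure_type, count in tally_failures(sample).most_common():
        print(f"{failure_type.value}: {count}")
```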

More Than Bounties
A verified clinician network, not just a gig board

Bounties keep you earning between projects. But the real value is the network you join.

📋

Opt In to Lab Studies

Frontier labs and intermediaries (Mercor, Turing, Surge AI) periodically launch larger annotation and evaluation studies. As a verified MedBounty member, you get first access and tailored outreach matched to your specialty and training level.

🏥

All Training Levels Welcome

MS3s and MS4s bring fresh clinical reasoning. Residents bring procedural and management depth. Attendings bring decades of pattern recognition. Different tiers benefit from different expertise, and your profile reflects your level.

🔄

Earn on Your Schedule

No waiting for the next project to drop. Bounties are async and self-paced, so you can hunt for model failures whenever you have 15 minutes or an hour and keep earning consistently between contracted annotation studies.

Get Started
Join the verified clinician network

Open to medical students (MS3+), residents, and attendings at accredited U.S. programs. Verification takes under 48 hours. Earn bounties on your own schedule, and opt in to receive tailored outreach when frontier labs launch larger annotation studies.

Open Dashboard →