The complete human evaluation stack, now with MCP.
Verified participants, structured results, straight into your pipeline.
Verified humans, built into your evaluation pipeline.
No API code. No context switch. Your agent calls Prolific directly - studies launch, results return, and your pipeline keeps moving.
"I want to remove any barrier between my agent and the results. Prolific does that."
What your agent can do with one MCP install.
Start where your stack already lives.
Built for the teams at the frontier
The same participant network behind HUMAINE - a peer-reviewed evaluation benchmark - is now callable from inside your agent. Peer-reviewed methodology, accessible in one tool call.
Your training loop. Prolific's humans.
Common questions about the MCP server
Yes. Install it in Claude Code, Claude Desktop, or any MCP-compatible client. Your agent can launch studies, wait on completion, and receive structured responses — without writing API code. MCP tools are discovered automatically; no SDK or wrapper required.
Claude Code, Claude Desktop, Cursor, and any client that implements the Model Context Protocol spec. One command to install, no SDK required. Your agent decides when to call Prolific on its own — you don't need to specify the trigger.
Yes. Responses return as structured JSON with stable cohort hashes and filter provenance in every record. JSONL export is available for direct feed into RLHF, DPO, and reward-model training code. The format was designed for loops, not dashboards.
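As a rough illustration of feeding a JSONL export into preference training, the sketch below turns pairwise-comparison records into the prompt/chosen/rejected layout that many DPO training scripts expect. The field names (`prompt`, `response_a`, `response_b`, `preferred`, `cohort_hash`) are assumptions made for this example, not the documented export schema.

```python
import json

# Hypothetical export: one JSON object per line. Field names are assumed
# for illustration and may not match the real export.
SAMPLE_JSONL = """\
{"prompt": "Summarise the memo.", "response_a": "Short summary.", "response_b": "Rambling summary.", "preferred": "a", "cohort_hash": "abc123"}
{"prompt": "Translate 'hello'.", "response_a": "bonjour", "response_b": "hola", "preferred": "b", "cohort_hash": "abc123"}
{"prompt": "Skipped item.", "response_a": "x", "response_b": "y", "preferred": null, "cohort_hash": "abc123"}
"""


def to_dpo_pairs(lines):
    """Yield prompt/chosen/rejected dicts from pairwise-comparison records."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        preferred = record.get("preferred")
        if preferred not in ("a", "b"):
            # No usable preference (e.g. the participant skipped the item).
            continue
        rejected = "b" if preferred == "a" else "a"
        yield {
            "prompt": record["prompt"],
            "chosen": record[f"response_{preferred}"],
            "rejected": record[f"response_{rejected}"],
            # Keep the cohort hash so each pair stays traceable to its cohort.
            "cohort_hash": record["cohort_hash"],
        }


pairs = list(to_dpo_pairs(SAMPLE_JSONL.splitlines()))
for pair in pairs:
    print(json.dumps(pair))
```

Carrying the cohort hash through to each training pair keeps the provenance attached to the data, so a later run can filter or compare by cohort.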
Yes. A CI job can launch a study, wait on completion via the CLI or a webhook, aggregate responses against a threshold, and exit pass or fail. Reproducible cohort hashes make the same participant specification repeatable across model releases.
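The aggregation step of such a CI gate could look like the minimal sketch below. It assumes the responses have already been fetched and that each record carries a numeric `rating` and a `cohort_hash`. Both field names, the 4.0 threshold, and the minimum of three responses are illustrative assumptions, not Prolific defaults.

```python
from statistics import mean


def exit_code(responses, expected_cohort, threshold=4.0, min_responses=3):
    """Return 0 (pass) or 1 (fail) for a batch of rating responses."""
    # Only count responses from the cohort this release is compared against.
    ratings = [
        r["rating"] for r in responses if r["cohort_hash"] == expected_cohort
    ]
    # Too few matching responses: fail rather than pass on thin evidence.
    if len(ratings) < min_responses:
        return 1
    return 0 if mean(ratings) >= threshold else 1


# Hypothetical responses; the last one comes from a different cohort
# and is ignored by the gate.
responses = [
    {"rating": 5, "cohort_hash": "abc123"},
    {"rating": 4, "cohort_hash": "abc123"},
    {"rating": 4, "cohort_hash": "abc123"},
    {"rating": 1, "cohort_hash": "zzz999"},
]

code = exit_code(responses, expected_cohort="abc123")
print(code)
```

In a CI script, the returned value would typically be passed to `sys.exit` so the job's pass or fail status follows the human ratings.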
It's the norm, not an exception. Research on production agent teams shows that 75% of them evaluate without formal benchmarks, relying on A/B tests and direct human feedback instead. Prolific doesn't replace that instinct; it gives it infrastructure. You get the same human signal you're already relying on, with verified participants, reproducible cohorts, and structured outputs your pipeline can act on.
Read the Measuring Agents in Production paper




