The complete human evaluation stack, now with MCP.

Connect your agent to the full Prolific evaluation stack.
Verified participants, structured results, straight into your pipeline.

Install MCP Server

Read API docs

Why Prolific

Verified humans, built into your evaluation pipeline.

No API code. No context switch. Your agent calls Prolific directly: studies launch, results return, and your pipeline keeps moving.

Your agent knows who responded

Every response carries cohort provenance — demographic filters, study conditions, and participant hash — baked into the data record. When a result comes back through MCP, you know exactly who gave you which signal.

Results land in the format your pipeline expects

Responses return as structured JSON and export as JSONL with stable cohort hashes. Idempotent study creation makes retries from automated callers safe. No transformation step between the MCP response and your training run.
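To make the retry-safety point concrete, here is a minimal sketch of one common idempotency pattern. It assumes the caller derives a key from the study specification and that the server returns the existing study when it sees a repeated key. `idempotency_key`, `FakeStudyService`, and the spec fields are hypothetical stand-ins for illustration, not Prolific's API; the API docs describe the actual semantics.

```python
import hashlib
import json


def idempotency_key(study_spec: dict) -> str:
    """Derive a deterministic key from a study spec (hypothetical scheme)."""
    # Sorting keys makes the serialization independent of dict ordering.
    canonical = json.dumps(study_spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class FakeStudyService:
    """In-memory stand-in for a study-creation endpoint (not a real client)."""

    def __init__(self) -> None:
        self._studies_by_key: dict[str, str] = {}

    def create_study(self, study_spec: dict, key: str) -> str:
        # A repeated key returns the study created on the first call.
        if key not in self._studies_by_key:
            self._studies_by_key[key] = f"study-{len(self._studies_by_key) + 1}"
        return self._studies_by_key[key]


# Assumed spec fields, chosen only for the example.
spec = {"task": "pairwise_preference", "participants": 50}
service = FakeStudyService()
key = idempotency_key(spec)

first = service.create_study(spec, key)
retry = service.create_study(spec, key)  # e.g. the caller retried after a timeout
```

Because the key depends only on the spec's content, an automated caller that retries the same request ends up with the same study instead of launching a duplicate.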

The pipeline was almost complete

The missing step was always the human one: collecting preferences, validating edge cases, catching what benchmarks miss. Now it's in the pipeline.

What customers say

"I want to remove any barrier between my agent and the results. Prolific does that."

Emerging Products Director, Fortune 500 software company

What your agent can do with one MCP install.

Launch preference studies mid-run

Your agent calls Prolific without breaking the training loop. Pairwise preference and Likert rating tasks run while your pipeline continues. Results return as JSONL, ready for RLHF and DPO — no copy-paste, no context switch.
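As a rough illustration of how pairwise results in JSONL could feed a DPO-style dataset, the sketch below converts each record into a prompt/chosen/rejected triple. The field names (`prompt`, `response_a`, `response_b`, `preferred`) are assumptions made for the example, not Prolific's export schema.

```python
import json


def to_dpo_triples(jsonl_text: str) -> list[dict]:
    """Convert pairwise preference records (assumed schema) to DPO triples."""
    triples = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the export
        record = json.loads(line)
        preferred = record["preferred"]
        if preferred not in ("a", "b"):
            continue  # skip ties or missing judgments
        if preferred == "a":
            chosen, rejected = record["response_a"], record["response_b"]
        else:
            chosen, rejected = record["response_b"], record["response_a"]
        triples.append(
            {"prompt": record["prompt"], "chosen": chosen, "rejected": rejected}
        )
    return triples


# Hypothetical records, built with json.dumps so each line is valid JSON.
sample_records = [
    {"prompt": "Summarize X", "response_a": "Short", "response_b": "Long", "preferred": "a"},
    {"prompt": "Explain Y", "response_a": "Vague", "response_b": "Clear", "preferred": "b"},
    {"prompt": "Define Z", "response_a": "One", "response_b": "Two", "preferred": "tie"},
]
sample_jsonl = "\n".join(json.dumps(r) for r in sample_records)
triples = to_dpo_triples(sample_jsonl)
```

Ties are dropped here for simplicity; a real pipeline might keep them for reward-model calibration instead.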

Gate releases on human evaluation scores

Your agent recruits a participant cohort, waits for scores, and passes or fails the release — all in one pipeline step. No manual study setup. No dashboard check. Structured JSON comes back directly into the workflow.
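The pass-or-fail step can be sketched as a small aggregation over the returned scores. The threshold, minimum response count, and score scale below are assumptions for illustration; how scores reach the script (MCP tool result, CLI export, or webhook) is left out.

```python
import statistics


def passes_gate(scores: list[float], threshold: float, min_responses: int) -> bool:
    """Return True when enough responses arrived and their mean meets the bar."""
    if len(scores) < min_responses:
        return False  # too little human signal to make a release decision
    return statistics.mean(scores) >= threshold


def exit_code(scores: list[float], threshold: float = 4.0, min_responses: int = 5) -> int:
    """Map the gate decision to a process exit code (0 = pass, 1 = fail)."""
    return 0 if passes_gate(scores, threshold, min_responses) else 1


# Hypothetical 1-5 Likert ratings for two candidate releases.
release_ok = exit_code([4, 5, 3, 4, 5])   # mean 4.2, above the threshold
release_bad = exit_code([2, 3, 3, 2, 4])  # mean 2.8, below the threshold
# In a CI job, the script would end with sys.exit(exit_code(scores)).
```

Failing the gate when too few responses arrive is a deliberate choice: a release should not pass on the strength of one or two ratings.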

Escalate to humans without leaving the workflow

When your agent hits ambiguity or low-confidence output, it calls Prolific directly. 68% of production agents require human intervention within 10 steps — MCP means that handoff happens inside the pipeline, not outside it. The response feeds back into the agent's context as structured JSON.
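One way to picture the handoff is a confidence check that routes low-confidence output to a human and returns the answer as a structured record. In this sketch, `ask_human` is a hypothetical callback standing in for whatever tool call the agent makes; the 0.7 cutoff and the record fields are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentOutput:
    answer: str
    confidence: float  # assumed to be a score in [0, 1]


def resolve(
    output: AgentOutput,
    ask_human: Callable[[str], str],
    min_confidence: float = 0.7,
) -> dict:
    """Keep the agent's answer when confident, otherwise escalate to a human."""
    if output.confidence >= min_confidence:
        return {"answer": output.answer, "source": "agent"}
    # Low confidence: hand the draft to a human and use their response.
    return {"answer": ask_human(output.answer), "source": "human"}


def fake_reviewer(draft: str) -> str:
    """Stand-in for a human participant; a real call would block on a study."""
    return f"reviewed: {draft}"


confident = resolve(AgentOutput("Refund approved", 0.93), fake_reviewer)
uncertain = resolve(AgentOutput("Refund maybe?", 0.41), fake_reviewer)
```

The `source` field keeps the provenance of each answer, so downstream steps can tell agent output from human output.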

Run red-teaming as a pipeline step

Your agent recruits participants to probe for failures: bias, edge cases, and judgment calls that benchmarks miss. Runs are scriptable across model versions, against the HUMAINE evaluation framework. Evaluation that used to require a manual study brief is now triggered in one tool call.

Get started

Start where your stack already lives.

MCP server

Works with Claude, Cursor, or any MCP-compatible client. Tools are discovered automatically — no SDK, no wrapper functions. Your agent decides when to call Prolific.

Install Prolific MCP

REST API

Studies, cohorts, responses, webhooks, idempotency semantics. Full programmatic control for custom integrations and training pipelines.

docs.prolific.com/api

CLI

Launch studies, wait on completion, export responses — scriptable in any pipeline. Works from shell, CI runner, or agent orchestrator.

docs.prolific.com/cli

Built for the teams at the frontier

The same participant network behind HUMAINE, a peer-reviewed evaluation benchmark, is now callable from inside your agent. Peer-reviewed methodology, accessible in one tool call.

HUMAINE — Unpacking human preference for LLMs

Technical benchmarks often lack real-world relevance. HUMAINE addresses unrepresentative sampling, superficial assessment, and single-metric reductionism in LLM evaluation.

Read the paper

Ai2 reduced data collection from weeks to hours

Allen Institute for AI built state-of-the-art multimodal models faster without sacrificing quality — using Prolific's verified human network at scale.

Gemini 3 Pro: Frontier safety framework

The frontier safety framework report for Google’s latest model.

Your training loop. Prolific's humans.

One install. Your agent calls humans. Your pipeline keeps moving.

Install the MCP server

Talk to the team

FAQ

Common questions about the MCP server

Does Prolific have an MCP server?

Yes. Install it in Claude Code, Claude Desktop, or any MCP-compatible client. Your agent can launch studies, wait on completion, and receive structured responses — without writing API code. MCP tools are discovered automatically; no SDK or wrapper required.

What MCP clients does Prolific support?

Claude Code, Claude Desktop, Cursor, and any client that implements the Model Context Protocol spec. One command to install, no SDK required. Your agent decides when to call Prolific on its own — you don't need to specify the trigger.

Can the MCP server return data my training pipeline can use?

Yes. Responses return as structured JSON with stable cohort hashes and filter provenance in every record. JSONL export is available for direct feed into RLHF, DPO, and reward-model training code. The format was designed for loops, not dashboards.
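As an illustration of why a cohort hash in every record is useful, the sketch below groups exported JSONL records by cohort and averages their ratings. The field names `cohort_hash` and `rating` are assumed for the example and may not match the actual export schema.

```python
import json
from collections import defaultdict


def mean_rating_by_cohort(jsonl_text: str) -> dict[str, float]:
    """Average ratings per cohort hash from JSONL records (assumed schema)."""
    ratings: dict[str, list[float]] = defaultdict(list)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        ratings[record["cohort_hash"]].append(record["rating"])
    return {cohort: sum(values) / len(values) for cohort, values in ratings.items()}


# Hypothetical export: two cohorts, three responses.
export_records = [
    {"cohort_hash": "c1", "rating": 4},
    {"cohort_hash": "c1", "rating": 5},
    {"cohort_hash": "c2", "rating": 2},
]
export_jsonl = "\n".join(json.dumps(r) for r in export_records)
means = mean_rating_by_cohort(export_jsonl)
```

Since the hash travels with each record, the grouping needs no separate lookup table to recover which cohort produced which signal.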

Can I use Prolific as a release gate in CI?

Yes. A CI job can launch a study, wait on completion via the CLI or a webhook, aggregate responses against a threshold, and exit pass or fail. Reproducible cohort hashes make the same participant specification repeatable across model releases.

We don't use formal benchmarks. Is Prolific still relevant?

Yes. Working without formal benchmarks is the norm, not the exception. Research on production agent teams shows 75% evaluate without formal benchmarks, relying on A/B tests and direct human feedback instead. Prolific doesn't replace that instinct; it gives it infrastructure. You get the same human signal you're already relying on, with verified participants, reproducible cohorts, and structured outputs your pipeline can act on.

Read the Measuring Agents in Production paper