# Benchmarks - Grep


# #1 on Every Major Benchmark

Evaluated on DRACO, DeepSearchQA, and DeepResearch Bench: PhD-level research tasks graded by LLM judges (Gemini 2.5 Pro or Gemini 2.5 Flash).

| Benchmark | Grep score |
|---|---|
| DRACO | 78.6% |
| DeepSearchQA | 84.5% |
| DeepResearch Bench | 56.27 |

[Join the Waitlist](/waitlist) · [View on GitHub](https://github.com/Parcha-ai/benchmarks)

DRACO — Perplexity + Harvard

## DRACO Benchmark

100 open-ended research questions across 10 domains, judged by Gemini-2.5-Pro against 3,934 weighted rubric criteria. Grep leads all four evaluation axes and wins 9 of 10 domains.
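To make "weighted rubric criteria" concrete, here is a minimal sketch of how a rubric score of this kind can be computed. It assumes that the judge returns a pass/fail verdict per criterion, that every criterion carries a positive weight, and that the score is the weighted share of criteria satisfied. DRACO's actual aggregation (for example, penalty criteria or per-axis roll-ups) may differ. The `Criterion` class, `rubric_score` function, and the toy rubric are hypothetical illustrations, not DRACO data.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    description: str
    weight: float  # hypothetical: each rubric item carries a positive weight


def rubric_score(criteria: list[Criterion], verdicts: list[bool]) -> float:
    """Return the weighted share of rubric criteria a report satisfies, as a percentage."""
    if len(criteria) != len(verdicts):
        raise ValueError("need one judge verdict per criterion")
    total = sum(c.weight for c in criteria)
    if total <= 0:
        raise ValueError("total rubric weight must be positive")
    # Sum the weights of the criteria the judge marked as satisfied.
    earned = sum(c.weight for c, ok in zip(criteria, verdicts) if ok)
    return 100.0 * earned / total


# Toy rubric with made-up criteria and weights (not taken from DRACO).
rubric = [
    Criterion("States the correct figure", 3.0),
    Criterion("Cites a primary source", 2.0),
    Criterion("Discusses a counterargument", 1.0),
]
print(rubric_score(rubric, [True, True, False]))  # about 83.3: 5 of 6 weight units earned
```

Weighting lets a rubric count a core factual requirement more heavily than a stylistic one, which is why a report can miss several low-weight criteria and still score well.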

*Figure: Grep wins 9 of 10 domains. Systems compared: Grep, Perplexity DR (Opus 4.6), Claude Opus 4.6, Gemini Deep Research, and OpenAI Deep Research (o3).*

| Evaluation axis | Grep | Lead over Perplexity |
|---|---|---|
| Factual Accuracy | 75.4% | +7.5 pp |
| Breadth & Depth | 80.3% | +7.2 pp |
| Presentation | 93.3% | +3.0 pp |
| Citation | 79.1% | +14.5 pp |
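The margins above are in percentage points (pp), not relative percent. As a worked example using only the Factual Accuracy row, where the margin is stated to be against Perplexity, this sketch shows the implied Perplexity score and the same gap expressed as a relative increase. The variable names are illustrative.

```python
grep_factual = 75.4  # Grep, Factual Accuracy (%)
margin_pp = 7.5      # reported lead over Perplexity, in percentage points

# Implied Perplexity score: subtract the absolute gap in points.
perplexity_factual = grep_factual - margin_pp
# The same gap expressed relative to Perplexity's score.
relative_gain = margin_pp / perplexity_factual * 100

print(f"Perplexity ≈ {perplexity_factual:.1f}%, relative gain ≈ {relative_gain:.1f}%")
# prints "Perplexity ≈ 67.9%, relative gain ≈ 11.0%"
```

A 7.5 pp lead therefore corresponds to roughly an 11% relative improvement on this axis.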

DeepSearchQA — Google

*Figure: DeepSearchQA results for Grep, Perplexity Deep Research, Moonshot K2.5, Anthropic Opus 4.5, and Parallel Ultra2x.*

## DeepSearchQA

896 multi-step research questions across 17 subject domains, judged by Gemini 2.5 Flash. Grep achieves 84.5% FC overall, with perfect scores in Linguistics, Biology, and Arts & Entertainment.

14 of 17 categories exceed 80% FC

DeepResearch Bench — RACE Framework

## DeepResearch Bench

100 PhD-level research questions (50 Chinese, 50 English), judged by Gemini-2.5-Pro. A score above 50 means the system outperformed the human expert. Grep leads the field of 34 systems.
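The "above 50" threshold follows from RACE's relative scoring. As we understand the DeepResearch Bench paper, a system's report and a reference report are both scored by the judge, and the final score is the target's share of the combined score. The sketch below shows that normalization under this assumption. The raw scores are made-up inputs, and `race_relative_score` is a hypothetical helper name.

```python
def race_relative_score(target: float, reference: float) -> float:
    """Relative score on a 0-100 scale; 50 means parity with the reference report.

    Assumes `target` and `reference` are non-negative absolute judge scores
    for the evaluated report and the reference report on the same task.
    """
    if target < 0 or reference < 0 or target + reference == 0:
        raise ValueError("scores must be non-negative and not both zero")
    return 100.0 * target / (target + reference)


print(race_relative_score(7.0, 7.0))  # 50.0: parity with the reference
print(race_relative_score(8.0, 6.0))  # about 57.1: target beats the reference
```

Because the score is a ratio, it is bounded between 0 and 100 and any value above 50 means the evaluated report out-scored the reference on that task.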

*Figure: overall RACE scores for Grep, Cellcog Max, nvidia-aiq, Cellcog, and CMCC-DeepInsight.*

| RACE dimension | Grep score |
|---|---|
| Insight | 58.98 |
| Comprehensiveness | 56.79 |
| Instruction Following | 53.49 |
| Readability | 53.50 |

## Methodology

### Multi-Agent Architecture

Grep orchestrates specialised sub-agents, each responsible for one stage (search, synthesis, verification, or citation), then merges their outputs into a single, coherent research report.
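The architecture described above can be illustrated with a minimal sequential sketch. Grep's actual orchestration (ordering, parallelism, retries, model calls) is not described here, so every function and type below is hypothetical: each "agent" is a plain stub function standing in for an LLM-backed sub-agent, and the verification rule (drop findings without a source) is a toy placeholder.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    claim: str
    source_url: str
    verified: bool = False


# Hypothetical stand-ins for the four sub-agent roles; real agents would call an LLM and tools.
def search_agent(question: str) -> list[Finding]:
    return [
        Finding(f"Background on {question}", "https://example.org/a"),
        Finding(f"Unsupported claim about {question}", ""),
    ]


def verification_agent(findings: list[Finding]) -> list[Finding]:
    # Toy rule: keep only findings that carry a source URL and mark them verified.
    return [Finding(f.claim, f.source_url, verified=True) for f in findings if f.source_url]


def synthesis_agent(question: str, findings: list[Finding]) -> str:
    # Join the verified claims into a prose body for the report.
    return " ".join(f"{f.claim}." for f in findings)


def citation_agent(findings: list[Finding]) -> list[str]:
    # Number each source in the order its finding appears.
    return [f"[{i}] {f.source_url}" for i, f in enumerate(findings, start=1)]


def orchestrate(question: str) -> str:
    """Run the sub-agents in sequence and merge their outputs into one report."""
    findings = verification_agent(search_agent(question))
    body = synthesis_agent(question, findings)
    refs = citation_agent(findings)
    return "\n".join([f"# {question}", body, "", "Sources:", *refs])


print(orchestrate("solar adoption"))
```

Running verification before synthesis and citation means unsupported claims never reach the final report; a production system might instead run agents in parallel or loop back to search when verification fails.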

### Claude Opus 4.6 Backbone

All reasoning and synthesis steps are powered by Claude Opus 4.6, giving Grep best-in-class analytical depth, nuanced judgement, and instruction following.

[Full data and reproduction scripts on GitHub](https://github.com/Parcha-ai/benchmarks)

## Experience #1-Ranked Research

See why Grep outperforms OpenAI, Google, Perplexity, and every specialised research platform on PhD-level tasks.

[Join the Waitlist](/waitlist) · [API Docs](/developers)