Paleo Bench
HTR Model Leaderboard
Greek minuscule handwritten text recognition benchmarks across 17 model configurations. Models are ranked by transcription quality (1 − CER), with per-sample cost and latency reported alongside.
February 25, 2026 at 11:05 PM
Models: 17
Samples: 7
Total Cost: $5.00
Model Ranking
1. Gemini 3 Pro (high)
2. Gemini 3 Flash (low)
3. Gemini 3 Pro (low)
4. Gemini 3 Flash (high)
5. Gemini 3.1 Pro (high)
6. Gemini 3.1 Pro (low)
7. Opus 4.6 (thinking)
8. Opus 4.6
9. Sonnet 4.6
10. Sonnet 4.6 (thinking)
11. GPT-5.2 (low)
12. GPT-5.2 (high)
13. GPT-4.1
14. GPT-5 Mini (low)
15. Haiku 4.5
16. Haiku 4.5 (thinking)
17. GPT-5 Mini (high)
Sorted by quality (1 − CER), descending.
Top Performer
1. Gemini 3 Pro (high)
- Quality: 94.8%
- CER mean: 5.2%
- WER mean: 17.0%
- Cost / sample: $0.17
- Latency / sample: 2m 15s
Metric glossary
- CER — Character Error Rate: character-level edit distance (substitutions, deletions, insertions) between the model output and the ground truth, divided by the number of ground-truth characters. Lower is better.
- WER — Word Error Rate: the same edit distance computed over words instead of characters, divided by the number of ground-truth words. Lower is better.
- Quality — 1 − CER: the fraction of characters transcribed correctly. Higher is better.
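All three metrics derive from Levenshtein edit distance. A minimal Python sketch of how CER, WER, and quality can be computed (illustrative only; this is not the benchmark's actual scoring code):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using a single rolling row."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # distance from empty ref prefix to each hyp prefix
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i  # prev holds dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]  # dp[i-1][j], needed for the next diagonal
            dp[j] = min(
                dp[j] + 1,      # deletion from ref
                dp[j - 1] + 1,  # insertion into ref
                prev + (ref[i - 1] != hyp[j - 1]),  # substitution (or match)
            )
            prev = cur
    return dp[n]

def cer(reference, hypothesis):
    """Character Error Rate: character edits / ground-truth length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)

def wer(reference, hypothesis):
    """Word Error Rate: word-level edits / ground-truth word count."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)

def quality(reference, hypothesis):
    """Quality as reported in the leaderboard: 1 - CER."""
    return 1.0 - cer(reference, hypothesis)
```

For example, a hypothesis with one wrong character out of four yields CER = 0.25 and quality = 0.75; one wrong word out of three yields WER ≈ 0.33.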