Paleo Bench
HTR Model Leaderboard
Handwritten text recognition (HTR) benchmarks for Greek minuscule across 17 model configurations. Models are ranked by transcription accuracy; dot color indicates latency and dot size indicates cost.
February 24, 2026 at 09:19 PM
Models: 17
Samples: 5
Total Cost: $3.4732
Model Ranking
Dot Color = Latency (Low, Mid, High)
Dot Size = Cost
Sorted by quality (1 - CER) descending
Top Performer
Rank 1: Gemini 3 Pro (low)
Quality: 94.7%
CER Mean: 5.3%
WER Mean: 17.8%
Cost / sample: $0.0127
Latency / sample: 13.4s
Metric glossary
- CER: Character Error Rate, the fraction of characters the model got wrong vs. the ground truth. Lower is better.
- WER: Word Error Rate, the fraction of words with at least one error. Lower is better.
- Quality: 1 minus CER, i.e. how accurate the transcription is overall. Higher is better.
- Similarity: Normalized Levenshtein similarity, i.e. how closely the output matches the reference text, from 0 to 1. Higher is better.
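The glossary metrics can be illustrated with a minimal Python sketch. It is not the benchmark's own scoring code and rests on several assumptions: CER and WER use the conventional edit-distance definitions (edits divided by reference length), Similarity is normalized by the longer of the two strings, and text is NFC-normalized first so that polytonic Greek characters compare consistently. The function names (`levenshtein`, `cer`, `wer`, `quality`, `similarity`) are illustrative only.

```python
import unicodedata
from typing import Sequence


def _nfc(text: str) -> str:
    # Assumption: normalize to NFC so precomposed and decomposed
    # polytonic Greek characters are treated as the same character.
    return unicodedata.normalize("NFC", text)


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character edits divided by reference length (assumed definition)."""
    ref, hyp = _nfc(reference), _nfc(hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return levenshtein(ref, hyp) / len(ref)


def wer(reference: str, hypothesis: str) -> float:
    """Word edits divided by reference word count (assumed definition)."""
    ref_words = _nfc(reference).split()
    hyp_words = _nfc(hypothesis).split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return levenshtein(ref_words, hyp_words) / len(ref_words)


def quality(reference: str, hypothesis: str) -> float:
    """Quality as defined in the glossary: 1 minus CER."""
    return 1.0 - cer(reference, hypothesis)


def similarity(reference: str, hypothesis: str) -> float:
    """Levenshtein similarity in [0, 1], normalized by the longer string (assumption)."""
    ref, hyp = _nfc(reference), _nfc(hypothesis)
    longest = max(len(ref), len(hyp))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(ref, hyp) / longest
```

Under these definitions CER can exceed 1 when the output is much longer than the reference, so Quality can go negative, while Similarity always stays between 0 and 1. The top performer's figures above are consistent with the glossary: a mean CER of 5.3% gives a Quality of 1 - 0.053 = 94.7%.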