Paleo Bench

HTR Model Leaderboard

Greek minuscule handwritten text recognition benchmarks across 17 model configurations. Models are ranked by transcription accuracy, with latency indicated by dot color and cost by dot size.

February 24, 2026 at 09:19 PM

Models: 17
Samples: 5
Total Cost: $3.4732

Model Ranking

Dot Color = Latency (Low, Mid, High)

Dot Size = Cost

Sorted by quality (1 - CER) descending
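The sort order described above can be sketched in a few lines. This is only an illustration, not the leaderboard's actual code: the `ModelResult` class and `rank_by_quality` function are hypothetical names, and the two "hypothetical-model" entries use made-up CER values purely as placeholders. Only the Gemini 3 Pro (low) CER mean comes from this page.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    """One leaderboard row (hypothetical structure, for illustration only)."""
    name: str
    cer_mean: float  # mean character error rate as a fraction in [0, 1]

    @property
    def quality(self) -> float:
        # Quality is defined on this page as 1 - CER.
        return 1.0 - self.cer_mean


def rank_by_quality(results):
    """Return results sorted by quality (1 - CER), best first."""
    return sorted(results, key=lambda r: r.quality, reverse=True)


results = [
    ModelResult("hypothetical-model-b", 0.120),  # placeholder value, not real data
    ModelResult("Gemini 3 Pro (low)", 0.053),    # CER mean reported on this page
    ModelResult("hypothetical-model-c", 0.310),  # placeholder value, not real data
]

ranked = rank_by_quality(results)
for position, row in enumerate(ranked, start=1):
    print(f"{position}. {row.name}: quality {row.quality:.1%}")
```

Sorting on quality descending is equivalent to sorting on CER ascending, so either key gives the same ranking.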

Top Performer

1. Gemini 3 Pro (low)

Quality: 94.7%

Similarity: 95%

CER Mean: 5.3%
WER Mean: 17.8%
Cost / sample: $0.0127
Latency / sample: 13.4 s

Metric glossary

CER (Character Error Rate): fraction of characters the model got wrong relative to the ground truth. Lower is better.
WER (Word Error Rate): fraction of words with at least one error. Lower is better.
Quality: 1 minus CER, a measure of how accurate the transcription is overall. Higher is better.
Similarity: normalized Levenshtein similarity, which measures how closely the output matches the reference text on a scale from 0 to 1. Higher is better.
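As a rough illustration of how the four glossary metrics relate, here is a minimal Python sketch. It is not the benchmark's implementation, and it rests on assumptions the page does not confirm: CER and WER are computed in the conventional way, as Levenshtein edit distance divided by the reference length (in characters or whitespace-separated words), and similarity is normalized by the longer of the two strings. All function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / reference length (assumed)."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)


def wer(reference, hypothesis):
    """Word error rate: word edits / reference word count (assumed)."""
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def quality(reference, hypothesis):
    """Quality as defined in the glossary: 1 - CER."""
    return 1.0 - cer(reference, hypothesis)


def similarity(reference, hypothesis):
    """Levenshtein similarity normalized by the longer string (assumed)."""
    longest = max(len(reference), len(hypothesis))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(reference, hypothesis) / longest


# Unaccented sample text; the hypothesis has a wrong final sigma.
reference = "εν αρχη ην ο λογος"
hypothesis = "εν αρχη ην ο λογοσ"
print(f"CER:        {cer(reference, hypothesis):.3f}")
print(f"WER:        {wer(reference, hypothesis):.3f}")
print(f"Quality:    {quality(reference, hypothesis):.3f}")
print(f"Similarity: {similarity(reference, hypothesis):.3f}")
```

A single wrong character costs one whole word in WER, which is why WER is typically much higher than CER on the same output. Accented Greek can be encoded either precomposed or with combining marks, so normalizing both strings to the same Unicode form (for example with `unicodedata.normalize("NFC", text)`) before scoring may matter. Whether the benchmark does this is not stated.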