Paleo Bench

HTR Model Leaderboard

Greek minuscule handwritten text recognition benchmarks across 17 model configurations. Models ranked by transcription accuracy, with latency indicated by color and cost by dot size.

February 24, 2026 at 09:19 PM

Models

Samples

Total Cost: $3.4732

Model Ranking

Legend: dot color = latency (low, mid, high); dot size = cost.

Sorted by quality (1 - CER) descending

Top Performer: Gemini 3 Pro (low)

Quality: 94.7%
CER Mean: 5.3%
WER Mean: 17.8%
Cost / sample: $0.0127
Latency / sample: 13.4s

Metric glossary

CER: Character Error Rate — fraction of characters the model got wrong vs. the ground truth. Lower is better.
WER: Word Error Rate — fraction of words with at least one error. Lower is better.
Quality: 1 minus CER — how accurate the transcription is overall. Higher is better.
Similarity: Normalized Levenshtein similarity — how closely the output matches the reference text, from 0 to 1. Higher is better.
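
The four metrics above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the benchmark's actual scoring code: the function names are hypothetical, CER and WER are computed the conventional way (edit distance divided by reference length, at character and word level respectively), and similarity is normalized by the longer of the two strings. The page does not say whether text is Unicode-normalized or stripped of punctuation before scoring, so no such step is applied here.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate (assumes a non-empty reference)."""
    return levenshtein(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate over whitespace-separated words (assumed tokenization)."""
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)


def quality(reference, hypothesis):
    """Quality as defined in the glossary: 1 minus CER."""
    return 1.0 - cer(reference, hypothesis)


def similarity(reference, hypothesis):
    """Normalized Levenshtein similarity in [0, 1] (assumed normalization)."""
    longest = max(len(reference), len(hypothesis))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein(reference, hypothesis) / longest


if __name__ == "__main__":
    # Toy strings for illustration only, not benchmark samples.
    reference = "καὶ ὁ λόγος"
    hypothesis = "καὶ ὁ λόγοσ"
    print(f"CER={cer(reference, hypothesis):.3f}  "
          f"WER={wer(reference, hypothesis):.3f}  "
          f"Quality={quality(reference, hypothesis):.3f}  "
          f"Similarity={similarity(reference, hypothesis):.3f}")
```

Because a single wrong character marks the whole word as wrong, WER is typically well above CER, which is consistent with the top performer's 17.8% WER against 5.3% CER. Note also that with this edit-distance definition CER can exceed 1 when the output is much longer than the reference, while similarity stays within 0 to 1.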