We present a single-stage uncertainty-aware approach for jersey number recognition in soccer. Our method employs digit-compositional classifiers that leverage the structural relationships between digits, coupled with a Dirichlet-based uncertainty modeling framework and a tracklet aggregation step that combines frame-level predictions using confidence-based filtering. Unlike previous approaches that focus primarily on detecting visible jerseys, we reframe the problem through the lens of uncertainty quantification. This perspective enables more nuanced predictions, particularly in challenging cases of partial visibility or occlusion. Our unified architecture processes player crops directly, without the explicit jersey detection step required by traditional multi-stage pipelines. Through extensive experiments on the SoccerNet and Copa America (CA, ours) datasets, we demonstrate that digit-compositional approaches consistently outperform independent classifiers, while Dirichlet-based uncertainty modeling further improves performance by providing better-calibrated confidence estimates across visibility conditions. We achieve 85.62% tracklet-level accuracy on the SoccerNet Challenge benchmark.
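To make the uncertainty and aggregation ideas concrete, the following is a minimal sketch, not the paper's implementation. It assumes a common evidential formulation (evidence = softplus of the raw score, Dirichlet concentration = evidence + 1, uncertainty = number of classes divided by total concentration) and a simple aggregation rule (drop frames above an uncertainty threshold, then average the remaining probabilities). The function names, the `max_uncertainty` threshold, and its default value are all hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def dirichlet_from_logits(logits: Sequence[float]) -> Tuple[List[float], float]:
    """Map raw scores to expected class probabilities and an uncertainty mass.

    Assumption: evidence = softplus(logit), alpha = evidence + 1. This is a
    common evidential choice, not necessarily the one used in the paper.
    """
    # Numerically stable softplus: log(1 + exp(x)).
    evidence = [max(x, 0.0) + math.log1p(math.exp(-abs(x))) for x in logits]
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    probs = [a / strength for a in alpha]
    # Approaches 1 as evidence vanishes and 0 as evidence grows.
    uncertainty = len(alpha) / strength
    return probs, uncertainty


def aggregate_tracklet(
    frame_logits: Sequence[Sequence[float]],
    max_uncertainty: float = 0.5,  # hypothetical threshold
) -> Optional[int]:
    """Average probabilities over confident frames and return the top class.

    Returns None when every frame is filtered out, which a caller could
    interpret as "jersey number not visible in this tracklet".
    """
    kept = []
    for logits in frame_logits:
        probs, uncertainty = dirichlet_from_logits(logits)
        if uncertainty <= max_uncertainty:
            kept.append(probs)
    if not kept:
        return None
    n_classes = len(kept[0])
    mean_probs = [sum(p[k] for p in kept) / len(kept) for k in range(n_classes)]
    return max(range(n_classes), key=mean_probs.__getitem__)
```

Under these assumptions, a frame whose scores are all strongly negative carries almost no evidence, so its uncertainty is close to 1 and it is excluded from the tracklet-level vote.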
Overview of the proposed jersey recognition architecture. A pre-trained backbone extracts features from player crops. The Jersey Number Head, specifically the Tied Digit-aware (TDA) variant shown here, processes these features using position embeddings and a shared digit classifier to produce jersey scores. The final Classification Head transforms these scores into probabilities and an uncertainty estimate using an uncertainty-aware mechanism (e.g., Dirichlet). The example shown is an actual output from ViT-B/8 trained on the 200M dataset (ours).
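The tied digit-aware idea in the caption above can be illustrated with a short sketch. It is an assumption-laden illustration rather than the paper's head: it assumes two digit positions (tens and units), an extra hypothetical `EMPTY` class for a missing tens digit, a single linear classifier shared by both positions, and a jersey score formed by summing the two per-position digit scores. The `additive` flag only loosely mirrors the multiplicative versus additive embedding variants named in the results table.

```python
from typing import Dict, List, Sequence

EMPTY = 10  # hypothetical extra class meaning "no digit at this position"


def digit_logits(
    features: Sequence[float],
    pos_embedding: Sequence[float],
    weights: Sequence[Sequence[float]],  # 11 rows: digits 0-9 plus EMPTY
    bias: Sequence[float],
    additive: bool = False,
) -> List[float]:
    """Score the digit classes for one position with the shared classifier."""
    if additive:
        modulated = [f + e for f, e in zip(features, pos_embedding)]
    else:
        modulated = [f * e for f, e in zip(features, pos_embedding)]
    return [
        sum(w * x for w, x in zip(row, modulated)) + b
        for row, b in zip(weights, bias)
    ]


def jersey_scores(
    features: Sequence[float],
    pos_embeddings: Sequence[Sequence[float]],  # [tens, units]
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
    additive: bool = False,
) -> Dict[int, float]:
    """Compose per-position digit scores into scores for numbers 0..99."""
    tens = digit_logits(features, pos_embeddings[0], weights, bias, additive)
    units = digit_logits(features, pos_embeddings[1], weights, bias, additive)
    scores = {}
    for number in range(100):
        # Single-digit numbers use the EMPTY class in the tens position.
        tens_class = EMPTY if number < 10 else number // 10
        scores[number] = tens[tens_class] + units[number % 10]
    return scores
```

Because the classifier weights are shared, evidence learned for a digit in one position can transfer to the other position, with the position embedding telling the classifier which digit slot it is scoring. An independent head, by contrast, would score each jersey number as its own class.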
Tracklet-level accuracy results. Prior work results are taken from Koshkina et al. (2024).
Method | Test Acc | Challenge Acc |
---|---|---|
Gerke et al. (2015) | 32.57% | 35.79% |
Vats et al. (2021) | 46.73% | 49.88% |
Li et al. (2018) | 47.85% | 50.60% |
Vats et al. (2022) | 52.91% | 58.45% |
Balaji et al. (2023) | 68.53% | 73.77% |
Koshkina et al. (2024) | 87.45% | 79.31% |
Ours (ViT-S, SoccerNet Dataset) | 82.74% | - |
Ours (ViT-B, SoccerNet Dataset) | 86.37% | 83.52% |
Ours (ViT-B, 200M Dataset) | 85.46% | 85.62% |
Ours (ViT-B, 200M Dataset + SoccerNet finetuned) | 88.27% | 85.41% |
Tracklet-level accuracy results by head type, comparing Softmax and Dirichlet classification heads. Δ: difference between Dirichlet and Softmax accuracy, in percentage points. Acc Vis: accuracy on tracklets where the jersey number is visible at some point in the sequence. Acc Invis: accuracy on tracklets where the jersey number is not visible at any point in the sequence. DA: Digit-Aware, TDA-M: Tied Digit-Aware with Multiplicative embedding, TDA-MB: TDA-M with Per-Digit Bias, TDA-A: TDA with Additive embedding, IND: Independent.
Head Type | Acc Total (%), Softmax | Acc Total (%), Dirichlet (Δ) | Acc Vis (%), Softmax | Acc Vis (%), Dirichlet (Δ) | Acc Invis (%), Softmax | Acc Invis (%), Dirichlet (Δ) |
---|---|---|---|---|---|---|
DA | 80.15 ± 1.16 | 83.24 ± 0.94 (+3.09) | 83.72 ± 0.85 | 85.83 ± 1.17 (+2.11) | 71.55 ± 2.46 | 77.00 ± 1.07 (+5.45) |
TDA-M | 78.70 ± 0.22 | 82.91 ± 0.25 (+4.21) | 81.85 ± 0.82 | 85.71 ± 0.70 (+3.86) | 71.08 ± 2.29 | 76.15 ± 0.86 (+5.07) |
TDA-MB | 77.81 ± 0.59 | 82.11 ± 1.92 (+4.30) | 80.49 ± 1.52 | 84.11 ± 3.65 (+3.62) | 71.36 ± 2.40 | 77.28 ± 2.28 (+5.92) |
TDA-A | 75.81 ± 0.73 | 73.03 ± 0.80 (-2.78) | 77.18 ± 1.66 | 71.11 ± 0.53 (-6.07) | 72.49 ± 1.55 | 77.65 ± 1.55 (+5.16) |
IND | 73.47 ± 1.25 | 78.48 ± 0.55 (+5.01) | 74.61 ± 0.97 | 79.05 ± 1.94 (+4.44) | 70.70 ± 1.95 | 77.09 ± 3.26 (+6.39) |
@InProceedings{Grad_2025_CVPR,
author = {Grad, {\L}ukasz},
title = {Single-Stage Uncertainty-Aware Jersey Number Recognition in Soccer},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
month = {June},
year = {2025},
pages = {6102-6110}
}