Single-Stage Uncertainty-Aware Jersey Number Recognition in Soccer

Abstract

We present a single-stage, uncertainty-aware approach to jersey number recognition in soccer. Our method combines three components: digit-compositional classifiers that exploit the structural relationships between digits, a Dirichlet-based uncertainty modeling framework, and a tracklet aggregation step that combines frame-level predictions using confidence-based filtering. Previous approaches focus primarily on detecting visible jerseys; we instead reframe the problem through the lens of uncertainty quantification. This perspective enables more nuanced predictions, particularly in challenging cases of partial visibility or occlusion. Our unified architecture processes player crops directly, without the explicit jersey detection stage of traditional multi-stage pipelines. Extensive experiments on the SoccerNet and Copa America (CA, ours) datasets show that digit-compositional approaches consistently outperform independent classifiers, and that Dirichlet-based uncertainty modeling further improves performance by providing better-calibrated confidence estimates across visibility conditions. We achieve 85.62% tracklet-level accuracy on the SoccerNet Challenge benchmark.
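Confidence-filtered tracklet aggregation can be sketched as follows. This is a minimal illustration under stated assumptions: each frame is assumed to yield a class-probability vector and a scalar uncertainty in [0, 1]. The threshold `max_uncertainty`, the fallback to the least uncertain frame, and the averaging of surviving probabilities are hypothetical choices, not necessarily the aggregation rule used in the paper.

```python
import numpy as np

def aggregate_tracklet(frame_probs, frame_uncertainty, max_uncertainty=0.5):
    """Combine per-frame predictions into one tracklet-level prediction.

    frame_probs: (T, K) array, one class-probability vector per frame.
    frame_uncertainty: (T,) array, per-frame uncertainty in [0, 1].
    max_uncertainty: hypothetical threshold; frames above it are discarded.
    Returns (predicted class index, indices of the frames that were kept).
    """
    frame_probs = np.asarray(frame_probs, dtype=float)
    frame_uncertainty = np.asarray(frame_uncertainty, dtype=float)
    # Confidence-based filtering: keep only sufficiently certain frames.
    keep = np.flatnonzero(frame_uncertainty <= max_uncertainty)
    if keep.size == 0:
        # No frame passes the filter: fall back to the least uncertain one.
        keep = np.array([np.argmin(frame_uncertainty)])
    # Average the surviving probability vectors and take the top class.
    mean_probs = frame_probs[keep].mean(axis=0)
    return int(np.argmax(mean_probs)), keep

# Two confident frames favor class 1; two blurry frames confidently favor class 0.
probs = [[0.1, 0.8, 0.1], [0.95, 0.04, 0.01], [0.9, 0.05, 0.05], [0.2, 0.7, 0.1]]
unc = [0.1, 0.8, 0.7, 0.2]
pred, kept = aggregate_tracklet(probs, unc)
print(pred, kept)
```

Without filtering, the plain average over all four frames would pick class 0, because the two high-uncertainty frames dominate. With filtering, only frames 0 and 3 are averaged and the prediction becomes class 1.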

Method Overview

Jersey recognition architecture overview

Overview of the proposed jersey recognition architecture. A pre-trained backbone extracts features from player crops. The Jersey Number Head (here, the Tied Digit-Aware (TDA) variant) processes these features using position embeddings and a shared digit classifier to produce jersey scores. The final Classification Head transforms these scores into probabilities and an uncertainty estimate using an uncertainty-aware mechanism (e.g., Dirichlet). The example shows actual output from a ViT-B/8 model trained on our 200M dataset.
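A Dirichlet classification head of the kind named in the caption can be sketched as below. This follows the common evidential deep learning convention: evidence = softplus(scores), concentration α = evidence + 1, expected probabilities α / Σα, and uncertainty u = K / Σα. That convention, and the function names, are assumptions; the paper's exact evidence function and uncertainty definition may differ.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def dirichlet_head(scores):
    """Map raw jersey scores (..., K) to expected probabilities and uncertainty.

    Assumed convention: evidence = softplus(scores), alpha = evidence + 1,
    probs = alpha / sum(alpha), uncertainty = K / sum(alpha).
    """
    scores = np.asarray(scores, dtype=float)
    evidence = softplus(scores)               # non-negative evidence per class
    alpha = evidence + 1.0                    # Dirichlet concentration parameters
    strength = alpha.sum(axis=-1, keepdims=True)
    probs = alpha / strength                  # mean of the Dirichlet distribution
    k = scores.shape[-1]
    uncertainty = k / strength[..., 0]        # in (0, 1]; near 1 when evidence is ~0
    return probs, uncertainty

# One clearly visible number (strong evidence) vs. an occluded crop (no evidence).
p_clear, u_clear = dirichlet_head(np.array([20.0, -20.0, -20.0]))
p_occl, u_occl = dirichlet_head(np.full(3, -50.0))
print(u_clear, u_occl)
```

Unlike softmax, which returns a uniform distribution for the occluded crop without signaling why, this head also reports uncertainty close to 1 when no class has accumulated evidence. That is the property the tracklet-level filtering can act on.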

Results

Comparison with Prior Work on SoccerNet Benchmarks

Tracklet-level accuracy results. Prior work results are taken from Koshkina et al. (2024).

Method	Test Acc	Challenge Acc
Gerke et al. (2015)	32.57%	35.79%
Vats et al. (2021)	46.73%	49.88%
Li et al. (2018)	47.85%	50.60%
Vats et al. (2022)	52.91%	58.45%
Balaji et al. (2023)	68.53%	73.77%
Koshkina et al. (2024)	87.45%	79.31%
Ours (ViT-S, SoccerNet Dataset)	82.74%	-
Ours (ViT-B, SoccerNet Dataset)	86.37%	83.52%
Ours (ViT-B, 200M Dataset)	85.46%	85.62%
Ours (ViT-B, 200M Dataset + SoccerNet fine-tuning)	88.27%	85.41%

Ablation Study on the SoccerNet Test Set

Tracklet-level accuracy results. Acc Vis: accuracy on tracklets in which the jersey number is visible in at least one frame. Acc Invis: accuracy on tracklets in which the jersey number is not visible in any frame. Δ: change from Softmax to Dirichlet, in percentage points. DA: Digit-Aware; TDA-M: Tied Digit-Aware with multiplicative embedding; TDA-MB: TDA-M with per-digit bias; TDA-A: Tied Digit-Aware with additive embedding; IND: Independent.

Head Type	Acc Total, Softmax (%)	Acc Total, Dirichlet (Δ) (%)	Acc Vis, Softmax (%)	Acc Vis, Dirichlet (Δ) (%)	Acc Invis, Softmax (%)	Acc Invis, Dirichlet (Δ) (%)
DA	80.15 ± 1.16	83.24 ± 0.94 (+3.09)	83.72 ± 0.85	85.83 ± 1.17 (+2.11)	71.55 ± 2.46	77.00 ± 1.07 (+5.45)
TDA-M	78.70 ± 0.22	82.91 ± 0.25 (+4.21)	81.85 ± 0.82	85.71 ± 0.70 (+3.86)	71.08 ± 2.29	76.15 ± 0.86 (+5.07)
TDA-MB	77.81 ± 0.59	82.11 ± 1.92 (+4.30)	80.49 ± 1.52	84.11 ± 3.65 (+3.62)	71.36 ± 2.40	77.28 ± 2.28 (+5.92)
TDA-A	75.81 ± 0.73	73.03 ± 0.80 (-2.78)	77.18 ± 1.66	71.11 ± 0.53 (-6.07)	72.49 ± 1.55	77.65 ± 1.55 (+5.16)
IND	73.47 ± 1.25	78.48 ± 0.55 (+5.01)	74.61 ± 0.97	79.05 ± 1.94 (+4.44)	70.70 ± 1.95	77.09 ± 3.26 (+6.39)
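The head variants in the table differ in how jersey scores are built from digits. The sketch below is one possible reading of the caption and figure description, with several assumptions. Jersey numbers are encoded as 100 classes (tens digit × 10 + units digit, with single-digit numbers using tens digit 0). Digit classifiers are single linear layers. Row 0 of the position embedding means "tens" and row 1 means "units". `D`, `compose`, and all head functions are hypothetical names, not the paper's code.

```python
import numpy as np

D = 16  # hypothetical feature dimension of the backbone output

def compose(tens_logits, unit_logits):
    """Combine two 10-way digit logit vectors into 100 jersey scores.

    Score of number 10*a + b = tens_logits[a] + unit_logits[b], i.e. digit
    likelihoods multiply in probability space. Single-digit numbers are assumed
    to use tens digit 0 (hypothetical encoding).
    """
    return (tens_logits[:, None] + unit_logits[None, :]).reshape(100)

def independent_head(f, W_ind):
    # IND: one unrelated weight vector per jersey class, W_ind has shape (100, D).
    return W_ind @ f

def digit_aware_head(f, W_tens, W_units):
    # DA: a separate 10-way digit classifier for each position, each (10, D).
    return compose(W_tens @ f, W_units @ f)

def tied_digit_aware_head(f, W_digit, pos_emb, mode="mul", pos_bias=None):
    # TDA: one shared digit classifier W_digit (10, D); the position embedding
    # pos_emb[p] (row 0 = tens, row 1 = units) tells it which digit to read.
    logits = []
    for p in range(2):
        g = f * pos_emb[p] if mode == "mul" else f + pos_emb[p]  # TDA-M / TDA-A
        l = W_digit @ g
        if pos_bias is not None:
            l = l + pos_bias[p]  # TDA-MB: per-digit bias for each position, (2, 10)
        logits.append(l)
    return compose(logits[0], logits[1])

rng = np.random.default_rng(0)
f = rng.normal(size=D)
W_digit = rng.normal(size=(10, D))
pos_emb = rng.normal(size=(2, D))
scores = tied_digit_aware_head(f, W_digit, pos_emb, mode="mul")
print(scores.shape, int(np.argmax(scores)))  # jersey class with the highest score
```

With D = 16, the classifier weights number 1,600 for IND, 320 for DA, and 192 for TDA (160 shared digit weights plus 32 embedding values), plus 20 biases for TDA-MB. In this linear sketch, the additive variant reduces to `W_digit @ f` plus a fixed per-position offset `W_digit @ pos_emb[p]`. The tens and units logits therefore differ only by a constant. Whether the paper's heads behave the same way depends on architectural details not given here.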

Qualitative Results

Extended qualitative results on the Copa America (CA, ours) dataset using our ViT-B/8 model trained on our 200M dataset. Examples are sorted vertically by increasing prediction uncertainty (low to high). Each panel shows a player crop with its ground-truth (GT) and predicted (Pred) detection-level jersey number. The border color encodes prediction uncertainty according to the color bar (blue = low, red = high). The examples show a clear correlation: low uncertainty (blue borders, top rows) typically coincides with clear images and correct predictions, while high uncertainty (red borders, bottom rows) coincides with challenging conditions (blur, low resolution, occlusion) and a higher likelihood of prediction errors.

Extended qualitative results on the SoccerNet Test dataset using our ViT-B/8 model trained on our 200M dataset. Examples are sorted vertically by increasing prediction uncertainty (low to high). Each panel shows a player crop with its ground-truth (GT) and predicted (Pred) detection-level jersey number. The border color encodes prediction uncertainty according to the color bar (blue = low, red = high). The examples show a clear correlation: low uncertainty (blue borders, top rows) typically coincides with clear images and correct predictions, while high uncertainty (red borders, bottom rows) coincides with challenging conditions (blur, low resolution, occlusion) and a higher likelihood of prediction errors.
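The correlation the figures show qualitatively can also be checked numerically. This minimal sketch is not taken from the paper: it sorts detections by uncertainty, splits them into equal-size bins, and reports accuracy per bin. If uncertainty is informative, accuracy should fall from the first bin to the last. The function name and `n_bins` are hypothetical.

```python
import numpy as np

def accuracy_by_uncertainty(uncertainty, correct, n_bins=5):
    """Return (mean uncertainty, accuracy) for equal-size bins, low to high uncertainty."""
    uncertainty = np.asarray(uncertainty, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    order = np.argsort(uncertainty, kind="stable")   # indices sorted low -> high
    bins = np.array_split(order, n_bins)             # near-equal-size groups
    return [(float(uncertainty[b].mean()), float(correct[b].mean()))
            for b in bins if b.size > 0]

# Toy per-detection uncertainties and whether each prediction matched GT.
res = accuracy_by_uncertainty([0.9, 0.1, 0.4, 0.2, 0.8, 0.3],
                              [0, 1, 0, 1, 0, 1], n_bins=2)
print(res)
```

Equal-count bins keep every bin populated even when most uncertainties cluster near one end of the range, which fixed-width bins would not guarantee.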

BibTeX

@InProceedings{Grad_2025_CVPR,
    author    = {Grad, {\L}ukasz},
    title     = {Single-Stage Uncertainty-Aware Jersey Number Recognition in Soccer},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {6102-6110}
}