Add missing mean_stat_per_model method to StatObject_SB #3029
pplantinga merged 2 commits into speechbrain:develop
The PLDA scoring functions fast_PLDA_scoring and fast_PLDA_scoring_with_uncertainty call enroll_ctr.mean_stat_per_model(), but only sum_stat_per_model was defined. This raises an AttributeError when enrollment models are not unique. Add mean_stat_per_model that computes the per-model average of zero- and first-order statistics using the existing sum_stat_per_model.
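A minimal sketch of the per-model averaging described above, written against plain NumPy arrays rather than the real `StatObject_SB` class. The function name `mean_stats_per_model` and its argument layout are hypothetical and chosen only for illustration; the merged method reportedly builds on `sum_stat_per_model`, whose exact return values are not shown in this conversation.

```python
import numpy as np


def mean_stats_per_model(modelset, stat0, stat1):
    """Average zero- and first-order statistics over sessions sharing a model ID.

    Hypothetical stand-alone helper, not the SpeechBrain implementation.
    modelset: 1-D array of model IDs, one per session.
    stat0, stat1: 2-D arrays with one row per session.
    """
    modelset = np.asarray(modelset)
    stat0 = np.asarray(stat0, dtype=float)
    stat1 = np.asarray(stat1, dtype=float)

    # inverse[i] is the output row that session i contributes to.
    unique_models, inverse = np.unique(modelset, return_inverse=True)
    inverse = inverse.ravel()

    # Number of sessions observed for each unique model.
    sessions_per_model = np.bincount(inverse, minlength=len(unique_models))

    # Sum the statistics per model (the role sum_stat_per_model plays in the PR).
    sum0 = np.zeros((len(unique_models), stat0.shape[1]))
    sum1 = np.zeros((len(unique_models), stat1.shape[1]))
    np.add.at(sum0, inverse, stat0)
    np.add.at(sum1, inverse, stat1)

    # Divide the sums by the session count to get per-model means.
    mean0 = sum0 / sessions_per_model[:, None]
    mean1 = sum1 / sessions_per_model[:, None]
    return unique_models, mean0, mean1
```

With unique model IDs every session count is 1, so the output equals the input reordered by sorted model ID.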
pplantinga
left a comment
There must not be any recipes that have multiple models per stat because this code clearly never was able to handle that. For example, later in the file (where mean_stat_per_model gets called) the same code is repeated, which also looks like a mistake:
    # If models are not unique, compute the mean per model, display a warning
    if not numpy.unique(enroll_ctr.modelset).shape == enroll_ctr.modelset.shape:
        # logging.warning("Enrollment models are not unique, average i-vectors")
        enroll_ctr = enroll_ctr.mean_stat_per_model()
Do we have a way of testing this? Cuz if not maybe it would just be better to add a note that the code doesn't work for >1 model per stat.
Yeah you're right, I dug through the recipes and couldn't find anything that actually hits the multi-model path. The code's been broken for a while and nobody noticed, so probably nobody uses it. That said, since Want me to switch it to a
Given that there's no easy way to test this, here's the solution that seems best to me: Keep the
That makes sense: it keeps the method available but makes the multi-model case explicit rather than silently averaging. I'll update the PR with that approach.
Per reviewer feedback, fast_PLDA_scoring() now raises a ValueError when enrollment models are not unique, directing users to call mean_stat_per_model() explicitly. Removed the redundant second uniqueness check after centering since the first check already guards.
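The guard described in this commit message could look roughly like the sketch below. It reuses the uniqueness test from the snippet quoted in the review; the helper name `check_unique_models` and the error message wording are assumptions for illustration, not the text that was merged.

```python
import numpy as np


def check_unique_models(modelset):
    """Raise ValueError if any enrollment model ID appears more than once.

    Hypothetical helper; in the PR the check is described as living inside
    fast_PLDA_scoring() itself.
    """
    modelset = np.asarray(modelset)
    # Same comparison as the quoted code: unique IDs vs. all IDs.
    if np.unique(modelset).shape != modelset.shape:
        raise ValueError(
            "Enrollment models are not unique. Call mean_stat_per_model() "
            "on the enrollment statistics before scoring."
        )
```

Raising here instead of averaging silently means a caller with duplicate model IDs has to opt in to averaging before scoring.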
Summary
- Add mean_stat_per_model method to StatObject_SB in speechbrain/processing/PLDA_LDA.py
- fast_PLDA_scoring and fast_PLDA_scoring_with_uncertainty call enroll_ctr.mean_stat_per_model() when enrollment models are not unique, but only sum_stat_per_model was defined
- This raises an AttributeError when duplicate model IDs are present in enrollment data

The new method reuses sum_stat_per_model internally and divides by the session count to compute per-model averages.

Fixes #3026
Test plan