# This just avoids an edge case where the number of candidates in log_probs is less than int(beam_size * self.scorer_beam_scale).
'''
Error I faced.
_, candidates = log_probs.topk(
RuntimeError: selected index k out of range
'''
pplantinga
left a comment
Looks like a good change, once comments are addressed
Pull request overview
This PR updates ScorerBuilder.score() to avoid a RuntimeError: selected index k out of range when selecting top-k candidates for partial scorers by ensuring topk() is not called with k > vocab_size.
Changes:
- Introduce a computed k (currently sbc) that caps topk's k to log_probs.shape[-1] when the vocabulary is smaller than int(beam_size * scorer_beam_scale).
- Replace the inline topk(int(beam_size * scorer_beam_scale)) call with topk(sbc).
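The cap described above can be sketched in isolation. This is a minimal illustration, not the code from the PR: the helper name capped_topk_size is hypothetical, and it takes the vocabulary size as a plain integer (in the PR this would be log_probs.shape[-1]).

```python
def capped_topk_size(beam_size: int, scorer_beam_scale: float, vocab_size: int) -> int:
    """Return a k that never exceeds the size of the axis topk runs over.

    Hypothetical helper for illustration; the PR stores this value in `sbc`.
    """
    requested = int(beam_size * scorer_beam_scale)
    return min(requested, vocab_size)


# Assumed usage inside the scoring step (shown as a comment, not executed here):
#   sbc = capped_topk_size(beam_size, self.scorer_beam_scale, log_probs.shape[-1])
#   _, candidates = log_probs.topk(sbc, dim=-1)

# Vocabulary smaller than the requested k: the cap applies.
print(capped_topk_size(beam_size=10, scorer_beam_scale=1.5, vocab_size=8))
# Vocabulary larger than the requested k: the requested value is kept.
print(capped_topk_size(beam_size=10, scorer_beam_scale=1.5, vocab_size=100))
```

Taking the minimum leaves the common case (a large vocabulary) unchanged and only alters behavior where topk would otherwise raise.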
speechbrain/decoders/scorer.py
Outdated
  # select candidates from the results of full scorers for partial scorers
- _, candidates = log_probs.topk(
-     int(beam_size * self.scorer_beam_scale), dim=-1
- )
+ _, candidates = log_probs.topk(sbc, dim=-1)
This change addresses a runtime edge case (topk with k > vocab), but there doesn’t appear to be a unit test covering ScorerBuilder.score() candidate selection. Adding a small test that constructs a ScorerBuilder and verifies score() works when vocab_size < int(beam_size * scorer_beam_scale) (and also when the computed k would be 0) would help prevent regressions.
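A rough shape for such a regression test, as a sketch only. It does not construct a real ScorerBuilder; select_candidates is a hypothetical pure-Python stand-in for the candidate selection step that operates on a single list of scores, and the pytest-style test functions assume the project runs its tests with pytest.

```python
def select_candidates(log_probs_row, beam_size, scorer_beam_scale):
    """Stand-in for candidate selection: indices of the k highest scores.

    Hypothetical helper, not SpeechBrain's API. k is capped at the row length,
    mirroring the fix under review.
    """
    k = min(int(beam_size * scorer_beam_scale), len(log_probs_row))
    # Sort indices by score, best first, then keep the first k.
    order = sorted(range(len(log_probs_row)), key=lambda i: log_probs_row[i], reverse=True)
    return order[:k]


def test_vocab_smaller_than_requested_k():
    # Requested k is int(4 * 1.5) = 6, but only 3 entries exist.
    row = [-0.5, -2.0, -1.0]
    assert select_candidates(row, beam_size=4, scorer_beam_scale=1.5) == [0, 2, 1]


def test_k_rounds_down_to_zero():
    # int(1 * 0.5) = 0, so no candidates are selected.
    row = [-0.5, -2.0, -1.0]
    assert select_candidates(row, beam_size=1, scorer_beam_scale=0.5) == []


def test_vocab_larger_than_requested_k():
    # The cap does not apply; the two best indices are returned.
    row = [-0.5, -2.0, -1.0, -0.1]
    assert select_candidates(row, beam_size=2, scorer_beam_scale=1.0) == [3, 0]
```

A real test would replace the stand-in with a ScorerBuilder instance and a log_probs tensor, keeping the same three cases.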
This PR does not change any behavior; it just avoids an edge case.