Adapting transducer greedy decoding #2975
Adds a decoding loop at each decoding timestep. This is better suited to the RNN-T loss and significantly improves the results (especially for streaming recipes).
Sets buffer_chunk_size to -1 to prevent the first chunks from being dropped, which happens especially when a small chunk size is used (e.g., 160 ms).
Hi @younessdkhissi, thanks for the PR. Could you please elaborate on your suggested changes and why you think they are necessary? Furthermore, can you report the overall improvements you obtained in a streaming setting? Ideally, we would like to improve our transducer decoding interfaces by aligning them more closely with the literature, so if you could provide some references, that would be great as well. Thanks Youness!
Thanks @Adel-Moumen for taking the time to look at my PR. I have tried to implement the following paper, "Dual-Mode ASR: Unify and Improve Streaming ASR with Full-Context Modeling" https://openreview.net/pdf?id=Pz_dcqfcKW8 (that could be a future PR), and these are the results I get on LibriSpeech using the streaming mode:
Hi @younessdkhissi, thanks for the work. I am really not sure what this PR does. I see that there is a new, seemingly arbitrary for loop at each time step. Do you have a paper that formally describes what is being done here? It would be important to attach one to such a change to greedy decoding.
Hello @TParcollet |
Hi @younessdkhissi and thank you. I'll take a closer look at this asap. In the meantime, could you please provide a few measurements of how this change affects decoding speed? Many thanks!
Hi @TParcollet
If you want more measurements, let me know :)
@younessdkhissi thanks. Can you fix the tests? Then we'll merge.
Added spaces for improved readability in the transducer.py file.
@younessdkhissi one last thing: could you make max_steps an argument of the function instead?
@younessdkhissi sorry, I should have been more precise. I think "max_steps" is a bit too generic and may be interpreted as "max number of decoding steps", which may be confusing. Thanks for adding it to the arguments, but could you give it a more adequate name? Thanks!
@TParcollet It's my fault for choosing such a generic name for this variable. I propose "max_symbols_per_step" to avoid any confusion. Let me know if there are more changes to make :)
Thanks!
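For readers following the thread, here is a minimal sketch of the idea being discussed: greedy transducer decoding with an inner loop at each encoder timestep, capped by `max_symbols_per_step` (the name proposed above). This is not the code from the PR or from SpeechBrain. The names `greedy_decode`, `step_fn`, and `toy_step`, as well as the toy "joint network", are hypothetical and exist only for illustration.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

BLANK = 0  # assumption: index 0 is the blank symbol


def greedy_decode(
    frames: Sequence[Any],
    step_fn: Callable[[Any, int, Any], Tuple[List[float], Any]],
    init_state: Any,
    max_symbols_per_step: int = 5,
) -> List[int]:
    """Greedy transducer decoding with an inner loop per timestep.

    step_fn is a hypothetical stand-in for "prediction network + joint
    network": it maps (frame, last_token, state) to (scores, new_state).
    """
    hyp: List[int] = []
    last_token, state = BLANK, init_state
    for frame in frames:
        emitted = 0
        # Keep emitting on the same frame until blank wins or the cap is hit.
        while emitted < max_symbols_per_step:
            scores, new_state = step_fn(frame, last_token, state)
            token = max(range(len(scores)), key=scores.__getitem__)
            if token == BLANK:
                break  # blank: move on to the next encoder frame
            hyp.append(token)
            last_token, state = token, new_state
            emitted += 1
    return hyp


def toy_step(frame: Dict[int, int], last_token: int, state: Any):
    """Toy joint: each frame maps 'previous token' to 'next token'."""
    nxt = frame.get(last_token, BLANK)
    scores = [1.0 if i == nxt else 0.0 for i in range(4)]
    return scores, state


# Frame 0 wants to emit 1 then 2; frame 1 wants to emit 3 after 2.
frames = [{0: 1, 1: 2}, {2: 3}]
print(greedy_decode(frames, toy_step, None, max_symbols_per_step=5))  # [1, 2, 3]
print(greedy_decode(frames, toy_step, None, max_symbols_per_step=1))  # [1]
```

With a cap of 1 (at most one symbol per frame), the second symbol of frame 0 is lost and the rest of the hypothesis never recovers in this toy setup. Allowing several symbols per frame matches the RNN-T alignment lattice, where a frame can emit any number of labels before a blank, and the cap guards against a model that never predicts blank.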

What does this PR do?
Fixes the transducer greedy decoding (this probably fixes issue #2753).