I'm attempting to reproduce the LibriSpeech ASR Transformer recipe (https://github.com/speechbrain/speechbrain/tree/develop/recipes/LibriSpeech/ASR/transformer) to validate the conclusions of the paper "HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition".
Per Table 2 of the paper, HyperConformer outperforms Conformer on the test-other split of 100-hour LibriSpeech (in terms of WER) and demonstrates better data efficiency. However, my training results show:
- Overall WER values are higher than expected;
- HyperConformer does not outperform Conformer (contrary to the paper's conclusion);
- Conformer_22M yields abnormally high WER (over 95%) on the training, test-clean, and test-other splits.
GPU: RTX 3090 (24GB)
Execution Commands:

```shell
python train.py hparams/hyperconformer_8M.yaml
python train.py hparams/conformer_8M.yaml  # modified from hyperconformer_8M.yaml
```

I also ran the corresponding experiments for hyperconformer_22M.yaml and conformer_22M.yaml.
Key Config Modifications:
For conformer_8M.yaml, I only modified attention_type to RelPosMHAXL (from HyperConformer’s config). To use only the 100-hour LibriSpeech subset, I added/modified these lines in the YAML files:
The dataset path was specified, and the splits were set for 100h LibriSpeech:

```yaml
train_splits: ["train-clean-100"]
dev_splits: ["dev-clean"]
test_splits: ["test-clean", "test-other"]
skip_prep: False
train_csv: !ref <output_folder>/train.csv
valid_csv: !ref <output_folder>/dev-clean.csv
test_csv:
    - !ref <output_folder>/test-clean.csv
    - !ref <output_folder>/test-other.csv
```
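For reference, the single attention-related change I made for the Conformer runs looked like the fragment below. This is a sketch based on my description above, not a verbatim excerpt from my file; the key name follows the SpeechBrain transformer recipe, and I believe the HyperConformer config uses a hypermixing-style value for the same key.

```yaml
# conformer_8M.yaml — otherwise identical to hyperconformer_8M.yaml
attention_type: RelPosMHAXL
```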
Training Log Snippets
Conformer_8M (100h LibriSpeech)
```
epoch: 106, lr: 6.05e-04, steps: 68370, optimizer: Adam - train loss: 24.91 - valid loss: 32.76, valid ACC: 8.75e-01
epoch: 107, lr: 6.02e-04, steps: 69015, optimizer: Adam - train loss: 24.79 - valid loss: 31.85, valid ACC: 8.76e-01
epoch: 108, lr: 5.99e-04, steps: 69660, optimizer: Adam - train loss: 24.76 - valid loss: 34.30, valid ACC: 8.75e-01
epoch: 109, lr: 5.96e-04, steps: 70305, optimizer: Adam - train loss: 24.53 - valid loss: 32.79, valid ACC: 8.75e-01
epoch: 110, lr: 5.94e-04, steps: 70950, optimizer: Adam - train loss: 24.51 - valid loss: 33.23, valid ACC: 8.75e-01, valid WER: 11.38
Epoch loaded: 110 - test loss: 17.27, test ACC: 8.88e-01, test WER: 6.20 // test-clean
Epoch loaded: 110 - test loss: 10.42, test ACC: 7.96e-01, test WER: 15.57 // test-other
```
HyperConformer_8M (100h LibriSpeech)
```
epoch: 105, lr: 6.08e-04, steps: 67725, optimizer: Adam - train loss: 28.67 - valid loss: 30.93, valid ACC: 8.78e-01
epoch: 106, lr: 6.05e-04, steps: 68370, optimizer: Adam - train loss: 28.52 - valid loss: 31.04, valid ACC: 8.78e-01
epoch: 107, lr: 6.02e-04, steps: 69015, optimizer: Adam - train loss: 28.54 - valid loss: 31.11, valid ACC: 8.77e-01
epoch: 108, lr: 5.99e-04, steps: 69660, optimizer: Adam - train loss: 28.40 - valid loss: 30.46, valid ACC: 8.77e-01
epoch: 109, lr: 5.96e-04, steps: 70305, optimizer: Adam - train loss: 28.34 - valid loss: 30.46, valid ACC: 8.79e-01
epoch: 110, lr: 5.94e-04, steps: 70950, optimizer: Adam - train loss: 28.24 - valid loss: 30.49, valid ACC: 8.79e-01, valid WER: 11.79
Epoch loaded: 110 - test loss: 17.17, test ACC: 8.90e-01, test WER: 6.29 // test-clean
Epoch loaded: 110 - test loss: 10.44, test ACC: 7.94e-01, test WER: 16.47 // test-other
```
HyperConformer_22M (100h LibriSpeech)
```
epoch: 106, lr: 6.05e-04, steps: 68370, optimizer: Adam - train loss: 18.79 - valid loss: 34.62, valid ACC: 8.65e-01
epoch: 107, lr: 6.02e-04, steps: 69015, optimizer: Adam - train loss: 18.84 - valid loss: 33.89, valid ACC: 8.66e-01
epoch: 108, lr: 5.99e-04, steps: 69660, optimizer: Adam - train loss: 18.60 - valid loss: 34.30, valid ACC: 8.64e-01
epoch: 109, lr: 5.96e-04, steps: 70305, optimizer: Adam - train loss: 18.56 - valid loss: 34.02, valid ACC: 8.67e-01
epoch: 110, lr: 5.94e-04, steps: 70950, optimizer: Adam - train loss: 18.52 - valid loss: 33.75, valid ACC: 8.66e-01, valid WER: 11.73
Epoch loaded: 110 - test loss: 18.75, test ACC: 8.84e-01, test WER: 6.65 // test-clean
Epoch loaded: 110 - test loss: 11.50, test ACC: 7.91e-01, test WER: 16.75 // test-other
```
Conformer_22M
WER values are abnormally high (over 95%) across training, test-clean, and test-other splits (no detailed logs provided for this run).
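One sanity check worth running, given results this far off: verify that each config actually instantiates a model of the advertised size. The helper below is a rough sketch (not from the recipe) that estimates a Transformer encoder's parameter count from generic hyperparameters; it counts only the attention projections and feed-forward matrices, ignoring biases, norms, convolution modules, and embeddings, and the example hyperparameter values are placeholders rather than the recipe's actual ones.

```python
def approx_encoder_params(d_model: int, d_ffn: int, num_layers: int) -> int:
    """Rough per-layer parameter estimate for a vanilla Transformer encoder:
    4 * d_model^2 for the Q/K/V/output projections, plus
    2 * d_model * d_ffn for the two feed-forward matrices."""
    attention = 4 * d_model * d_model
    feed_forward = 2 * d_model * d_ffn
    return num_layers * (attention + feed_forward)

# Placeholder hyperparameters, purely for illustration:
small = approx_encoder_params(d_model=144, d_ffn=1024, num_layers=12)
large = approx_encoder_params(d_model=256, d_ffn=2048, num_layers=12)
print(f"small ≈ {small / 1e6:.1f}M, large ≈ {large / 1e6:.1f}M")
```

In practice, the direct check is `sum(p.numel() for p in model.parameters())` on the instantiated PyTorch model before training starts; if the "22M" Conformer config reports a very different count, the config is not building the intended model.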
Questions
1. Have there been any recent changes to the code or configuration files for these recipes?
2. Could you share the exact steps and full configuration files required to reproduce the paper's results (100h LibriSpeech, HyperConformer vs. Conformer)?