Tass0sm/espnet

Our fork of espnet for CSE 5539 experiments.

We explored applying ideas from image processing to the spectrogram:

  1. Using axial attention blocks
  2. Using Swin Transformer blocks
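The core idea of a Swin-style block is to compute self-attention inside small non-overlapping windows of a 2D feature map, then shift the windows by half their size in the next block so information crosses window borders. The sketch below shows only that windowing step on a spectrogram-shaped array of shape (time, frequency, channels). It is a minimal illustration under stated assumptions, not this fork's implementation: the function names are hypothetical, T and F are assumed to be padded to a multiple of the window size, and the attention mask Swin applies to shifted windows is omitted.

```python
import numpy as np


def window_partition(x: np.ndarray, window: int) -> np.ndarray:
    """Split a (T, F, C) map into non-overlapping window x window patches.

    Returns shape (num_windows, window * window, C); self-attention would then
    be computed independently within each window (hypothetical helper).
    """
    t, f, c = x.shape
    assert t % window == 0 and f % window == 0, "pad T and F to a multiple of window"
    x = x.reshape(t // window, window, f // window, window, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (nT, nF, window, window, C)
    return x.reshape(-1, window * window, c)


def shifted_window_partition(x: np.ndarray, window: int) -> np.ndarray:
    """Cyclically shift by window // 2 along time and frequency, then partition.

    The attention mask that Swin uses to stop wrapped-around regions from
    attending to each other is intentionally left out of this sketch.
    """
    shift = window // 2
    return window_partition(np.roll(x, shift=(-shift, -shift), axis=(0, 1)), window)


# Usage: an 8-frame x 8-bin toy "spectrogram" with one channel.
spec = np.arange(64, dtype=float).reshape(8, 8, 1)
windows = window_partition(spec, 4)          # 4 windows of 16 cells each
shifted = shifted_window_partition(spec, 4)  # same shape, windows offset by 2
```

Alternating regular and shifted partitions keeps attention cost linear in the number of windows while still letting neighbouring time-frequency regions exchange information across blocks.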

ASR

Baseline

espnet_model with a Transformer encoder and decoder.
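For context on what Variant 1 removes: ESPnet's Transformer encoder usually sits behind a convolutional input layer, and its conv2d option applies two 3x3 convolutions with stride 2 to the (time, frequency) input. Assuming the baseline keeps that conv2d input layer (the README does not say which one it uses), the sketch below computes how much it shrinks the sequence the encoder attends over. The helper names are hypothetical.

```python
def conv_out_len(n: int, kernel: int = 3, stride: int = 2) -> int:
    """Output length of an unpadded 1D slice of a strided convolution."""
    return (n - kernel) // stride + 1


def conv2d_subsampled_shape(num_frames: int, num_bins: int) -> tuple[int, int]:
    """(time, freq) after two 3x3 stride-2 convolutions (assumed conv2d front-end)."""
    t, f = num_frames, num_bins
    for _ in range(2):
        t, f = conv_out_len(t), conv_out_len(f)
    return t, f


def attention_pairs(seq_len: int) -> int:
    """Query-key pairs scored by one full self-attention layer."""
    return seq_len * seq_len


# Usage: 1000 frames of 80-bin filterbank features (illustrative sizes).
t_sub, f_sub = conv2d_subsampled_shape(1000, 80)
ratio = attention_pairs(1000) // attention_pairs(t_sub)
```

With these sizes the front-end reduces 1000 frames to 249, so each self-attention layer scores roughly 16 times fewer pairs than it would on the raw frames.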

Variant 1

Processes 2D frames of the spectrogram directly, without any preceding convolutional layers, and applies axial attention to the frames.

Result: less effective than the baseline.
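Axial attention factorizes 2D self-attention into two 1D passes: each time-frequency cell first attends to all frames in its own frequency bin, then to all bins in its own frame. The numpy sketch below shows a single-head version on a (time, frequency, channels) map. It is an illustration under assumptions, not the fork's code: the spectrogram is assumed to have already been lifted to C channels, the weights are plain random matrices, and the layer norm and feed-forward sublayers of a full block are omitted. All names are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend_along(x, axis, wq, wk, wv):
    """Single-head self-attention over one axis of a (T, F, C) map.

    The other spatial axis acts as a batch dimension, so cells only
    attend to cells that share their position on that other axis.
    """
    x = np.moveaxis(x, axis, -2)  # (..., L, C), L = attended axis
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])  # (..., L, L)
    out = softmax(scores) @ v  # (..., L, C)
    return np.moveaxis(out, -2, axis)


def axial_attention(x, params):
    """Attend along time (axis 0), then frequency (axis 1), with residual adds."""
    x = x + attend_along(x, 0, *params["time"])
    x = x + attend_along(x, 1, *params["freq"])
    return x


def pair_counts(t: int, f: int) -> tuple[int, int]:
    """Query-key pairs for full 2D attention vs. one axial pass per axis."""
    return (t * f) ** 2, t * f * (t + f)


# Usage: a 6-frame x 5-bin map with 4 channels and small random weights.
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 5, 4))
params = {name: [rng.standard_normal((4, 4)) * 0.1 for _ in range(3)]
          for name in ("time", "freq")}
y = axial_attention(x, params)
```

The pair counts show why the factorization is attractive without convolutional subsampling: for 1000 frames by 80 bins, full 2D attention would score 6.4 billion pairs per layer, while the two axial passes score 86.4 million.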

Variant 2

About

End-to-End Speech Processing Toolkit

Languages

  • Python 55.3%
  • Shell 42.3%
  • Perl 1.5%
  • MATLAB 0.5%
  • CMake 0.1%
  • M 0.1%
  • Other 0.2%