This repo has PyTorch implementations of the following:
- the encoder-decoder model from the *Attention Is All You Need* paper
- a decoder-only model
Neither model implements KV caching.
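For context on what that means: without a KV cache, each autoregressive decoding step recomputes the keys and values for every token seen so far; with a cache, each token's key/value is computed once and appended. A minimal single-head NumPy sketch (not code from this repo) showing that the two approaches produce identical outputs:

```python
import numpy as np

def attention(q, K, V):
    # scaled dot-product attention for a single query over all keys/values
    scores = q @ K.T / np.sqrt(K.shape[1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(0)
d = 4
x = rng.normal(size=(6, d))               # embeddings for a 6-step sequence
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

# Without a KV cache: step t recomputes K and V for all t+1 tokens.
no_cache = []
for t in range(len(x)):
    K = x[: t + 1] @ Wk
    V = x[: t + 1] @ Wv
    no_cache.append(attention(x[t] @ Wq, K, V))

# With a KV cache: compute each token's key/value once and append.
K_cache, V_cache, cached = [], [], []
for t in range(len(x)):
    K_cache.append(x[t] @ Wk)
    V_cache.append(x[t] @ Wv)
    cached.append(attention(x[t] @ Wq, np.stack(K_cache), np.stack(V_cache)))

assert np.allclose(no_cache, cached)      # same outputs, less recomputation
```

For the short sequences in these tasks (≤ 8 characters) the savings are negligible, which is presumably why caching was skipped.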
Both models were trained on the following tasks:
- string reversal (max input character limit: 8)
- addition of two operands (max input character limit per operand: 5)
In practice there are four tasks, since each task is split by model type:
- `string_reverse_encoder_decoder`
- `string_reverse_decoder_only`
- `addition_encoder_decoder`
- `addition_decoder_only`
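To make the two underlying tasks concrete, here is a hedged sketch of what a generated example might look like. The function names and exact input format are hypothetical (the real format is defined in `data.py`); only the limits (up to 8 input characters for reversal, up to 5 digits per addition operand) come from the description above:

```python
import random
import string

def make_reverse_example(max_len=8):
    # string reversal: a random lowercase string of up to max_len characters,
    # paired with its reversal as the target
    n = random.randint(1, max_len)
    s = "".join(random.choices(string.ascii_lowercase, k=n))
    return s, s[::-1]

def make_addition_example(max_digits=5):
    # addition: two operands, each with up to max_digits digits,
    # paired with their sum as the target
    a = random.randint(0, 10**max_digits - 1)
    b = random.randint(0, 10**max_digits - 1)
    return f"{a}+{b}", str(a + b)

print(make_reverse_example())
print(make_addition_example())
```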
You can test out the models on the tasks above by running `python evaluation.py`. Edit the main block to try out different tasks.
Overview of the repo's files:
- `evaluation.py`: test the pre-trained PyTorch models on the tasks.
- `train.py`: train your own checkpoints of the PyTorch models. To test a saved checkpoint, copy its filename (excluding the file extension) into the main block of `evaluation.py` and run `python evaluation.py`.
- `data.py`: generate new data for the tasks (WARNING: this will overwrite the current data unless you move the existing data elsewhere).
- `config.py`: configs for training, with a separate config for each of the four tasks. The default config values guarantee convergence on the default generated data: `string_reverse_encoder_decoder` converges by step/epoch 4000, `string_reverse_decoder_only` by 4000, `addition_encoder_decoder` by 22000, and `addition_decoder_only` by 193000. (WARNING: with the default config values, `addition_decoder_only` converges to a validation loss of `0.03605` at step/epoch 172000, but the script won't stop until step/epoch 193000, when patience runs out, because the validation loss never reaches `0.03`. You can adjust the `max_patience` hyperparameter to control this. All the other tasks should converge to a validation loss of `0.03` at a much earlier step/epoch with the default config values.)
- `layers.py`: the `nn.Module` layers of the transformer.
- `inference.py`: teacher-forcing and autoregressive decoding functions that call the layer functions from `layers.py`.
- `tokenizer.py`: the tokenizer class for each task.
- `codebase_string.py`: copies the entire codebase into a single string so I can paste it into Gemini for feedback/debugging.
- `data/`: directory where the generated data for each task is written.
- `checkpoints/`: directory where the model checkpoints for each task are saved.
- `basic_torch_examples/`: unrelated directory where I practiced basic PyTorch scripts for linear regression / basic training.
- `requirements.txt`: install by running `pip install -r requirements.txt`. (NOTE: there may be some unnecessary dependencies here; I just copied my virtual environment's dependencies over with `pip freeze`.) I'm using Python `3.11.9`.
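The stopping behavior described for `config.py` (stop at a target validation loss of `0.03`, or when patience runs out) can be sketched as a simple loop. This is an illustrative reconstruction, not the actual logic in `train.py`; the function name and signature are hypothetical:

```python
def run_training(val_losses, target_loss=0.03, max_patience=5000):
    """Return (stop_step, reason) for a sequence of per-step validation losses.

    Stops when the loss reaches target_loss, or when it hasn't improved
    for max_patience consecutive steps/epochs (patience exhausted).
    """
    best = float("inf")
    steps_since_improvement = 0
    for step, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            steps_since_improvement = 0
        else:
            steps_since_improvement += 1
        if loss <= target_loss:
            return step, "target reached"
        if steps_since_improvement >= max_patience:
            return step, "patience exhausted"
    return len(val_losses) - 1, "max steps"

# A run that hits the target stops immediately at that step:
print(run_training([0.5, 0.1, 0.02]))                         # target reached
# A run that plateaus above the target stops when patience runs out,
# like the addition_decoder_only case described above:
print(run_training([0.5, 0.4, 0.4, 0.4, 0.4], max_patience=2))
```

Raising `max_patience` lets a plateaued run continue longer; lowering it cuts runs like `addition_decoder_only` short once improvement stalls.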