GPT

GPT-2 Implementation

This is an implementation of GPT-2, heavily inspired by Andrej Karpathy's GPT-2 implementation and by Chapter 5 of the book "Build a Large Language Model (From Scratch)" by Sebastian Raschka.

I did not apply the weight initialization from the original GPT-2 paper, as it led to worse results during training. I also chose not to apply weight tying (sharing the token embedding matrix with the output projection), so the model has approximately 164M parameters instead of the original 124M.
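To make the effect of skipping weight tying concrete, here is a rough parameter count. It is a sketch under assumptions, not taken from gpt2.py: it assumes the standard GPT-2 small configuration (vocabulary 50257, context length 1024, width 768, 12 layers), biases on all linear layers inside the blocks, and an output head without bias. The function name `gpt2_param_count` is made up for illustration. Under these assumptions the untied model comes out at roughly 163M, close to the figure quoted above; the exact total depends on details such as bias terms and vocabulary padding.

```python
def gpt2_param_count(vocab_size=50257, context_len=1024, d_model=768,
                     n_layers=12, tie_weights=True):
    """Count parameters of a GPT-2 style decoder (assumed standard layout)."""
    tok_emb = vocab_size * d_model            # token embedding table
    pos_emb = context_len * d_model           # learned positional embeddings
    layer_norm = 2 * d_model                  # scale and shift vectors

    # Attention: fused QKV projection plus output projection, both with bias.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # MLP: expand to 4 * d_model, then project back, both with bias.
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)

    block = 2 * layer_norm + attn + mlp       # two LayerNorms per block
    total = tok_emb + pos_emb + n_layers * block + layer_norm  # plus final LayerNorm

    if not tie_weights:
        # Assumption: the untied output head is a bias-free linear layer.
        total += vocab_size * d_model
    return total


tied = gpt2_param_count(tie_weights=True)
untied = gpt2_param_count(tie_weights=False)
print(f"tied:   {tied:,}")    # tied:   124,439,808
print(f"untied: {untied:,}")  # untied: 163,037,184
```

The difference between the two totals is exactly one extra vocabulary-by-width matrix (50257 × 768 ≈ 38.6M), which is the cost of keeping the output head separate from the token embeddings.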

To-do:

- Add a validation set.
- Train the model on a larger dataset.
- Include more information on how the model processes the data (Mechanistic Interpretability).

