
Building a Basic PyTorch Transformer

This repository contains a Jupyter Notebook that breaks down the implementation of a Transformer model using PyTorch. It provides a "from-scratch" approach to understanding how modern NLP models process sequential data.

Project Overview

The project follows the architecture introduced in the landmark paper "Attention Is All You Need" (Vaswani et al., 2017) and focuses on a modular implementation of the encoder and decoder components.

Key Features (minimal sketches of each appear after this list):

  • Input Embeddings: Converting token IDs into dense vectors scaled by the square root of the model dimension.
  • Positional Encoding: Using sine and cosine functions to inject sequence order information into embeddings.
  • Multi-Head Attention: A custom class that applies the linear projections for queries, keys, and values, splits the result into heads, and computes the attention weights.
  • Model Inspection: Visualizing the full nn.Transformer object structure, including encoder/decoder layers, normalization, and dropout.
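
The sketch below illustrates the first two features. The class names (InputEmbeddings, PositionalEncoding) and the max_len default are illustrative and may differ from the notebook's own definitions; the scaling and sine/cosine formulas follow the paper.

    import math
    import torch
    import torch.nn as nn

    class InputEmbeddings(nn.Module):
        """Map token IDs to dense vectors, scaled by sqrt(d_model)."""
        def __init__(self, vocab_size: int, d_model: int):
            super().__init__()
            self.d_model = d_model
            self.embedding = nn.Embedding(vocab_size, d_model)

        def forward(self, x):
            # Scaling keeps the embeddings on a magnitude comparable to
            # the positional encodings added next.
            return self.embedding(x) * math.sqrt(self.d_model)

    class PositionalEncoding(nn.Module):
        """Fixed sine/cosine position signals, added to the embeddings."""
        def __init__(self, d_model: int, max_len: int = 5000):
            super().__init__()
            position = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
            div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
            pe = torch.zeros(max_len, d_model)
            pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
            pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
            self.register_buffer("pe", pe.unsqueeze(0))    # (1, max_len, d_model)

        def forward(self, x):
            # x: (batch, seq_len, d_model); add the matching slice of encodings.
            return x + self.pe[:, : x.size(1)]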
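A sketch of the multi-head attention feature, again with illustrative names (MultiHeadAttention, w_q, and so on): it performs the three linear projections, splits the result into heads, and computes scaled dot-product attention weights.

    import math
    import torch
    import torch.nn as nn

    class MultiHeadAttention(nn.Module):
        def __init__(self, d_model: int, num_heads: int):
            super().__init__()
            assert d_model % num_heads == 0, "d_model must divide evenly across heads"
            self.d_k = d_model // num_heads
            self.num_heads = num_heads
            # One linear projection each for queries, keys, and values, plus output.
            self.w_q = nn.Linear(d_model, d_model)
            self.w_k = nn.Linear(d_model, d_model)
            self.w_v = nn.Linear(d_model, d_model)
            self.w_o = nn.Linear(d_model, d_model)

        def forward(self, q, k, v, mask=None):
            batch = q.size(0)
            # Project, then split d_model into (num_heads, d_k) and move heads
            # ahead of the sequence dimension: (batch, heads, seq_len, d_k).
            def split(x, proj):
                return proj(x).view(batch, -1, self.num_heads, self.d_k).transpose(1, 2)
            q, k, v = split(q, self.w_q), split(k, self.w_k), split(v, self.w_v)
            # Scaled dot-product attention within each head.
            scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
            if mask is not None:
                scores = scores.masked_fill(mask == 0, float("-inf"))
            weights = scores.softmax(dim=-1)
            out = weights @ v
            # Merge the heads back into d_model and apply the output projection.
            out = out.transpose(1, 2).contiguous().view(batch, -1, self.num_heads * self.d_k)
            return self.w_o(out)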
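Model inspection needs only a print call: printing an nn.Transformer lists every encoder/decoder layer together with its attention, feed-forward, LayerNorm, and Dropout submodules. The hyperparameters below are the paper's base configuration, not necessarily the notebook's.

    import torch.nn as nn

    # The printed tree shows the stacked TransformerEncoderLayer /
    # TransformerDecoderLayer modules and their normalization and dropout.
    model = nn.Transformer(d_model=512, nhead=8,
                           num_encoder_layers=6, num_decoder_layers=6,
                           dim_feedforward=2048, dropout=0.1)
    print(model)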

Installation

  1. Clone the repository:
    git clone https://github.com/Joe-Naz01/transformers.git
    cd transformers
    
  2. Create and activate a conda environment, then install the dependencies:
    conda create -n transformers python
    conda activate transformers
    pip install -r requirements.txt