
Building a Basic PyTorch Transformer

This repository contains a Jupyter Notebook that breaks down the implementation of a Transformer model using PyTorch. It provides a "from-scratch" approach to understanding how modern NLP models process sequential data.

Project Overview

The project follows the architecture introduced in the landmark paper "Attention Is All You Need" (Vaswani et al., 2017) and focuses on a modular implementation of the encoder and decoder components.

Key Features (minimal sketches of each appear after this list):

  • Input Embeddings: Converting token IDs into dense vectors scaled by the square root of the model dimension.
  • Positional Encoding: Using sine and cosine functions to inject sequence order information into embeddings.
  • Multi-Head Attention: A custom class that applies the linear projections for queries, keys, and values, splits the result into heads, and computes the attention weights.
  • Model Inspection: Visualizing the full nn.Transformer object structure, including encoder/decoder layers, normalization, and dropout.
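
The sketch below illustrates the first two features. The class names (InputEmbeddings, PositionalEncoding) and the max_len default are illustrative and may differ from the notebook's own definitions; the scaling and sine/cosine formulas follow the paper.

    import math
    import torch
    import torch.nn as nn

    class InputEmbeddings(nn.Module):
        """Map token IDs to dense vectors, scaled by sqrt(d_model)."""
        def __init__(self, vocab_size: int, d_model: int):
            super().__init__()
            self.d_model = d_model
            self.embedding = nn.Embedding(vocab_size, d_model)

        def forward(self, x):
            # Scaling keeps the embeddings on a magnitude comparable to
            # the positional encodings added next.
            return self.embedding(x) * math.sqrt(self.d_model)

    class PositionalEncoding(nn.Module):
        """Fixed sine/cosine position signals, added to the embeddings."""
        def __init__(self, d_model: int, max_len: int = 5000):
            super().__init__()
            position = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
            div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
            pe = torch.zeros(max_len, d_model)
            pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
            pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
            self.register_buffer("pe", pe.unsqueeze(0))    # (1, max_len, d_model)

        def forward(self, x):
            # x: (batch, seq_len, d_model); add the matching slice of encodings.
            return x + self.pe[:, : x.size(1)]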
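A sketch of the multi-head attention feature, again with illustrative names (MultiHeadAttention, w_q, and so on): it performs the three linear projections, splits the result into heads, and computes scaled dot-product attention weights.

    import math
    import torch
    import torch.nn as nn

    class MultiHeadAttention(nn.Module):
        def __init__(self, d_model: int, num_heads: int):
            super().__init__()
            assert d_model % num_heads == 0, "d_model must divide evenly across heads"
            self.d_k = d_model // num_heads
            self.num_heads = num_heads
            # One linear projection each for queries, keys, and values, plus output.
            self.w_q = nn.Linear(d_model, d_model)
            self.w_k = nn.Linear(d_model, d_model)
            self.w_v = nn.Linear(d_model, d_model)
            self.w_o = nn.Linear(d_model, d_model)

        def forward(self, q, k, v, mask=None):
            batch = q.size(0)
            # Project, then split d_model into (num_heads, d_k) and move heads
            # ahead of the sequence dimension: (batch, heads, seq_len, d_k).
            def split(x, proj):
                return proj(x).view(batch, -1, self.num_heads, self.d_k).transpose(1, 2)
            q, k, v = split(q, self.w_q), split(k, self.w_k), split(v, self.w_v)
            # Scaled dot-product attention within each head.
            scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
            if mask is not None:
                scores = scores.masked_fill(mask == 0, float("-inf"))
            weights = scores.softmax(dim=-1)
            out = weights @ v
            # Merge the heads back into d_model and apply the output projection.
            out = out.transpose(1, 2).contiguous().view(batch, -1, self.num_heads * self.d_k)
            return self.w_o(out)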
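Model inspection needs only a print call: printing an nn.Transformer lists every encoder/decoder layer together with its attention, feed-forward, LayerNorm, and Dropout submodules. The hyperparameters below are the paper's base configuration, not necessarily the notebook's.

    import torch.nn as nn

    # The printed tree shows the stacked TransformerEncoderLayer /
    # TransformerDecoderLayer modules and their normalization and dropout.
    model = nn.Transformer(d_model=512, nhead=8,
                           num_encoder_layers=6, num_decoder_layers=6,
                           dim_feedforward=2048, dropout=0.1)
    print(model)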

Installation

  1. Clone the repository:
    git clone https://github.com/Joe-Naz01/transformers.git
    cd transformers
    
  2. Create and activate a conda environment, then install the dependencies:
    conda create -n transformers python
    conda activate transformers
    pip install -r requirements.txt