
LLM in PyTorch

Building and training a GPT-2 style model in pure PyTorch

Nov 2024 - Dec 2024 (1 month)

Tech Stack

NLP, PyTorch, Deep Learning, Pre-training, Fine-tuning, RLHF

Description

Building an LLM (Large Language Model) has always fascinated me, and Sebastian Raschka’s book Build a Large Language Model (From Scratch) was the perfect opportunity to revisit the Transformer architecture in PyTorch. It covers how to convert words into tokens; how to code the decoder part of a Transformer in pure PyTorch (with the underlying mathematical formulas) and use it to generate new tokens one by one; how to prepare a dataset for pre-training; how to fine-tune the model; and how to apply RLHF (Reinforcement Learning from Human Feedback) to make an LLM follow instructions.
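To give a flavor of what "pure PyTorch" means here, the sketch below shows a minimal multi-head causal self-attention layer, the core building block of the decoder. The class and parameter names are illustrative choices of mine, not the exact identifiers from the book or the project.

```python
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    """Multi-head causal self-attention, sketched in pure PyTorch."""
    def __init__(self, d_model, num_heads, max_context, dropout=0.1):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)  # query, key, value in one projection
        self.proj = nn.Linear(d_model, d_model)     # output projection
        self.dropout = nn.Dropout(dropout)
        # Upper-triangular mask so each token only attends to earlier positions
        self.register_buffer(
            "mask",
            torch.triu(torch.ones(max_context, max_context), diagonal=1).bool(),
        )

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split into heads: (batch, heads, tokens, head_dim)
        q = q.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        # Scaled dot-product attention with the causal mask applied
        scores = q @ k.transpose(-2, -1) / (self.head_dim ** 0.5)
        scores = scores.masked_fill(self.mask[:t, :t], float("-inf"))
        weights = self.dropout(torch.softmax(scores, dim=-1))
        out = (weights @ v).transpose(1, 2).contiguous().view(b, t, d)
        return self.proj(out)
```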

Model Architecture

The implemented model follows OpenAI’s GPT-2 Small architecture:

| Parameter | Value |
| --- | --- |
| Parameters | 124M |
| Layers | 12 |
| Embedding dimension | 768 |
| Attention heads | 12 |
| Max context | 1024 tokens |
| Vocabulary | ~50k tokens (BPE) |

This architecture is essentially the same as GPT-3's, only at a much smaller scale (GPT-3 has 175 billion parameters compared to 124 million here), which makes the model trainable on consumer hardware.
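Spelled out as a configuration dictionary, these hyperparameters look roughly like the following. The key names, the `GPTModel` class, and the dropout value are illustrative assumptions, not the project's exact code.

```python
# GPT-2 Small hyperparameters, expressed as a config dictionary
GPT2_SMALL_CONFIG = {
    "vocab_size": 50257,      # GPT-2 BPE vocabulary (~50k tokens)
    "context_length": 1024,   # maximum number of tokens per sequence
    "emb_dim": 768,           # embedding dimension
    "n_heads": 12,            # attention heads per transformer block
    "n_layers": 12,           # number of transformer blocks
    "drop_rate": 0.1,         # dropout probability (assumed value)
    "qkv_bias": True,         # GPT-2 uses bias terms in the attention projections
}

# Hypothetical usage: model = GPTModel(GPT2_SMALL_CONFIG)
# With these settings the parameter count lands around 124M.
```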

Results

The model was pre-trained on the Shakespeare corpus (~1 MB of text), giving it a distinctive Elizabethan English writing style.
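Generation works one token at a time: run the current context through the model, keep only the logits of the last position, pick the next token, append it, and repeat. A minimal sampling sketch, assuming a trained `model` and a GPT-2 BPE `tokenizer` (both are placeholders here):

```python
import torch

@torch.no_grad()
def generate(model, token_ids, max_new_tokens, context_length, temperature=1.0):
    """Autoregressive sampling: append one new token per step."""
    model.eval()
    for _ in range(max_new_tokens):
        # Crop the context to the model's maximum window
        context = token_ids[:, -context_length:]
        logits = model(context)                  # (batch, tokens, vocab_size)
        logits = logits[:, -1, :] / temperature  # only the last position matters
        probs = torch.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        token_ids = torch.cat([token_ids, next_token], dim=1)
    return token_ids

# Hypothetical usage with a GPT-2 BPE tokenizer (e.g. tiktoken):
# ids = torch.tensor([tokenizer.encode("To be, or not to be")])
# out = generate(model, ids, max_new_tokens=50, context_length=1024)
# print(tokenizer.decode(out[0].tolist()))
```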
