Links to the book:
- https://amzn.to/4fqvn0D (Amazon)
- https://mng.bz/M96o (Manning)
Link to the GitHub repository: https://github.com/rasbt/LLMs-from-scratch
This is a supplementary video explaining how to code an LLM architecture from scratch.
00:00 4.1 Coding an LLM architecture
13:52 4.2 Normalizing activations withlayer normalization
36:02 4.3 Implementing a feed forward network with GELU activations
52:16 4.4 Adding shortcut connections
1:03:18 4.5 Connecting attention and linear layers in a transformer block
1:15:13 4.6 Coding the GPT model
You can find additional bonus materials on GitHub, for example converting the GPT-2 architecture into Llama 2 and Llama 3: https://github.com/rasbt/LLMs-from-scratch/tree/main/ch05/07_gpt_to_llama
Continue this lesson in the app
Install CourseHive on Android or iOS to keep learning while you move.