Learn more: https://bit.ly/4lqtWmr

Before a large language model can follow instructions, it undergoes two key stages: pre-training and post-training. In pre-training, it learns to predict the next word or token from large amounts of unlabeled text. In post-training, it learns useful behaviors such as following instructions, using tools, and reasoning.

In our latest short course, Post-training of LLMs, you'll learn to reshape model behavior for specific tasks or capabilities using three of the most common post-training techniques: Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Online Reinforcement Learning (RL).

Taught by Banghua Zhu, Assistant Professor at the University of Washington, Principal Research Scientist at Nvidia, and co-founder of NexusFlow, this course covers:
- When to apply post-training and how it compares to pre-training
- How to curate and structure training data for each method
- How to use SFT to turn a base model into an instruct model
- How contrastive learning in DPO improves output quality
- How to design reward functions for RL tasks like math or code
- How to evaluate whether post-training improved or degraded model behavior

You'll also get hands-on experience implementing each technique with Hugging Face's TRL library to:
- Fine-tune a base model into an instruction-following assistant
- Modify a model's responses using preferred and rejected examples
- Improve a model's reasoning with online RL and verifiable rewards

Whether you're building safer assistants or targeting domain-specific improvements, this course will help you adapt LLMs with precision.

Enroll now: https://bit.ly/4lqtWmr
