Course Hive

DeepLearning.AI Courses - New short course: Reinforcement Fine-Tuning with GRPO

5.0 (2)
18 learners

This course includes

  • 5.5 hours of video
  • Certificate of completion
  • Access on mobile and TV

Full Transcript

Learn more: https://bit.ly/43p1WIa

DeepSeek has put reinforcement learning at the top of the minds of developers, machine learning engineers, and data-driven professionals in the AI space. That’s why we’re happy to launch a new short course: Reinforcement Fine-Tuning LLMs with GRPO, built in collaboration with Predibase and taught by Travis Addair, its Co-Founder and CTO, and Arnav Garg, its Senior Machine Learning Engineer.

Many LLM applications rely on reasoning, whether in solving math problems, generating code, or completing multi-step tasks. But fine-tuning models for reasoning is often constrained by the availability of high-quality labeled examples.

This course introduces a different approach: Reinforcement Fine-Tuning (RFT) using Group Relative Policy Optimization (GRPO). GRPO is a scalable reinforcement learning algorithm that lets you train models using reward functions instead of human-labeled data or preference scores.

You’ll learn:

  • When reinforcement fine-tuning is a better fit than supervised fine-tuning
  • How to build and use programmable reward functions in GRPO
  • How to guide model behavior on structured tasks like the Wordle game
  • How to evaluate subjective outputs, like summaries, using LLMs as judges
  • How to avoid reward hacking by combining reward and penalty signals
  • How to implement GRPO loss: token ratios, clipping, advantages, and KL divergence
  • How to run RFT jobs using Predibase’s training platform

By the end of the course, you’ll know how to fine-tune LLMs for complex reasoning tasks without needing large datasets or manual preference data.

Enroll now: https://bit.ly/43p1WIa
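
To make the ideas named above a little more concrete (programmable reward functions, group-relative advantages, token ratios, clipping, and the KL term), here is a minimal Python sketch. It is not taken from the course or from Predibase's platform: the function names, the Wordle-style scoring rule, and the `clip_eps` and `beta` values are all illustrative assumptions, and the loss is written for a single token rather than a batched tensor.

```python
import math
from statistics import mean, pstdev


def wordle_reward(guess: str, secret: str) -> float:
    """Hypothetical reward: fraction of letters in the correct position.

    Malformed guesses get a penalty instead of a score, one simple way to
    combine reward and penalty signals so formatting cannot be ignored.
    """
    if len(guess) != len(secret) or not guess.isalpha():
        return -1.0  # penalty signal (assumed value)
    matches = sum(g == s for g, s in zip(guess.lower(), secret.lower()))
    return matches / len(secret)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group of sampled completions."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def grpo_token_loss(logp_new, logp_old, logp_ref, advantage,
                    clip_eps=0.2, beta=0.04):
    """Per-token clipped surrogate loss with a KL penalty (illustrative).

    logp_new: log-prob of the token under the policy being trained
    logp_old: log-prob under the policy that sampled the completion
    logp_ref: log-prob under the frozen reference model
    """
    # Token ratio between the current and the sampling policy.
    ratio = math.exp(logp_new - logp_old)
    # Clipping keeps a single update from moving the policy too far.
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    surrogate = min(ratio * advantage, clipped * advantage)
    # Non-negative KL estimate: exp(d) - d - 1 with d = logp_ref - logp_new.
    d = logp_ref - logp_new
    kl = math.exp(d) - d - 1.0
    # Maximize the surrogate, penalize drift from the reference model.
    return -(surrogate - beta * kl)


if __name__ == "__main__":
    guesses = ["crane", "crate", "toolong", "slate"]
    rewards = [wordle_reward(g, "crate") for g in guesses]
    for g, a in zip(guesses, group_relative_advantages(rewards)):
        print(f"{g:8s} advantage={a:+.2f}")
```

In this sketch, completions that score above their group's mean reward get a positive advantage and are reinforced, while those below the mean are discouraged, so no human-labeled answers or preference pairs are needed, only a function that scores outputs.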
