DeepLearning.AI Courses - New short course: Reinforcement Fine-Tuning with GRPO

5.0 (2)

18 learners

This course includes

5.5 hours of video
Certificate of completion
Access on mobile and TV

Summary

Learn more: https://bit.ly/43p1WIa

DeepSeek has put reinforcement learning top of mind for developers, machine learning engineers, and data-driven professionals in the AI space. That’s why we’re happy to launch a new short course: Reinforcement Fine-Tuning LLMs with GRPO, built in collaboration with @Predibase and taught by Travis Addair, its Co-Founder and CTO, and Arnav Garg, its Senior Machine Learning Engineer.

Many LLM applications rely on reasoning, whether in solving math problems, generating code, or completing multi-step tasks. But fine-tuning models for reasoning is often constrained by the availability of high-quality labeled examples.

This course introduces a different approach: Reinforcement Fine-Tuning (RFT) using Group Relative Policy Optimization (GRPO). GRPO is a scalable reinforcement learning algorithm that lets you train models using reward functions instead of human-labeled data or preference scores.

You’ll learn:

- When reinforcement fine-tuning is a better fit than supervised fine-tuning
- How to build and use programmable reward functions in GRPO
- How to guide model behavior on structured tasks like the Wordle game
- How to evaluate subjective outputs, like summaries, using LLMs as judges
- How to avoid reward hacking by combining reward and penalty signals
- How to implement the GRPO loss: token ratios, clipping, advantages, and KL divergence
- How to run RFT jobs using Predibase’s training platform

By the end of the course, you’ll know how to fine-tune LLMs for complex reasoning tasks without needing large datasets or manual preference data.

Enroll now: https://bit.ly/43p1WIa
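To make the GRPO loss components named in the outline more concrete (token ratios, clipping, advantages, and KL divergence), here is a minimal Python sketch. It is not taken from the course or from Predibase's platform. The function names, the default values for `clip_eps` and `beta`, and the specific KL estimator are assumptions chosen for illustration, and a real implementation would operate on batched tensors rather than single floats.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled completion relative to its own group.

    `rewards` holds one scalar reward per completion sampled for the
    same prompt. Each reward is normalized by the group mean and
    (population) standard deviation, so no learned value model is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def grpo_token_loss(logp_new, logp_old, logp_ref, advantage,
                    clip_eps=0.2, beta=0.04):
    """Loss for a single token (to be minimized).

    logp_new: log-prob of the token under the policy being trained
    logp_old: log-prob under the policy that generated the sample
    logp_ref: log-prob under a frozen reference model
    clip_eps and beta are illustrative defaults (assumptions).
    """
    # Token ratio between the current and the sampling policy.
    ratio = math.exp(logp_new - logp_old)
    # Clipping keeps a single update from moving the policy too far.
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    surrogate = min(ratio * advantage, clipped * advantage)
    # Non-negative KL estimate against the reference model
    # (one common choice; assumed here, not confirmed by the source).
    log_diff = logp_ref - logp_new
    kl = math.exp(log_diff) - log_diff - 1.0
    return -(surrogate - beta * kl)


# Hypothetical rewards for four completions of one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
loss = grpo_token_loss(logp_new=-1.0, logp_old=-1.0, logp_ref=-1.0,
                       advantage=advantages[0])
```

In this sketch, completions that score above the group average get positive advantages and are reinforced, while those below average are discouraged. The KL term penalizes drifting away from the reference model.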
