Scott Howard
02/07/2025, 3:14 PM
ArXiv Dives: This week we’ll cover "How DeepSeek-R1 used GRPO for Reinforcement Learning", building on last week’s DeepSeek paper review (if you missed it, the video of that session is "How R1 and GRPO Work - deep technical dive into DeepSeek’s Models"). We’re live today at 10am PT… grab your coffee and come join the convo!
https://lu.ma/arxivdive-36