DeepSeek-R1 paper review

DeepSeek-R1

Uses cold-start data and a multi-stage training pipeline.

We present:

(1) DeepSeek-R1-Zero, which applies RL directly to the base model without any SFT data, and (2) DeepSeek-R1, which applies RL starting from a checkpoint fine-tuned with thousands of long Chain-of-Thought (CoT) examples.
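
To make the contrast between the two models concrete, here is a minimal sketch that writes each recipe as an ordered list of stages. The stage names, the `Stage` class, and the `needs_sft_data` helper are illustrative assumptions, not names from the paper, and the sketch covers only the two steps mentioned above; any further stages of the full pipeline are left out.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Stage:
    """One training step. Names are illustrative labels, not official ones."""
    name: str
    uses_sft_data: bool


# DeepSeek-R1-Zero: RL applied directly to the base model, no SFT data.
R1_ZERO: Tuple[Stage, ...] = (
    Stage("rl_on_base_model", uses_sft_data=False),
)

# DeepSeek-R1: fine-tune on long CoT examples first, then run RL
# starting from that checkpoint.
R1: Tuple[Stage, ...] = (
    Stage("cold_start_sft_on_long_cot", uses_sft_data=True),
    Stage("rl_from_sft_checkpoint", uses_sft_data=False),
)


def needs_sft_data(pipeline: Tuple[Stage, ...]) -> bool:
    """Return True if any stage in the pipeline consumes SFT data."""
    return any(stage.uses_sft_data for stage in pipeline)
```

Calling `needs_sft_data(R1_ZERO)` and `needs_sft_data(R1)` shows the one structural difference the notes point out: only the second recipe depends on supervised data before RL begins.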

The reasoning capability of DeepSeek-R1 is then distilled into small dense models.
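
A rough sketch of how the distillation data could be assembled, assuming distillation here means supervised fine-tuning of the small model on reasoning traces generated by the larger one. The function `build_distillation_dataset`, the `teacher_generate` callable, and the record keys are hypothetical names chosen for illustration; no real model API is called.

```python
from typing import Callable, Dict, Iterable, List


def build_distillation_dataset(
    prompts: Iterable[str],
    teacher_generate: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Collect (prompt, completion) pairs from a teacher model.

    `teacher_generate` is a hypothetical stand-in for whatever produces
    the teacher's response, reasoning trace included. Prompts for which
    the teacher returns an empty response are skipped.
    """
    records: List[Dict[str, str]] = []
    for prompt in prompts:
        completion = teacher_generate(prompt)
        if not completion.strip():
            continue  # nothing to learn from an empty response
        records.append({"prompt": prompt, "completion": completion})
    return records
```

The resulting records would then serve as ordinary SFT training examples for the small dense model. In this sketch the student never runs RL itself; it only imitates the teacher's outputs.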