Self-Distillation Zero: Self-Revision Turns Binary Rewards into Dense Supervision
Yinghui He, Simran Kaur, Adithya Bhaskar, Yongjin Yang, Jiarui Liu, Narutatsu Ri, Liam Fowl, Abhishek Panigrahi, Danqi Chen, Sanjeev Arora
Download Paper
arXiv preprint, 2026
SD-Zero improves reasoning by having a single model generate answers and then revise its own mistakes, turning sparse outcome-level feedback into dense token-level training signals and yielding performance gains of over 10% without an external teacher.
