2026-03-03 | Wang Yi | Be a Servant of the People (Jintai Chaosheng: Series on the View of Political Achievements) | http://paper.people.com.cn/rmrb/pc/content/202603/03/content_30143125.html | http://paper.people.com.cn/rmrb/pad/content/202603/03/content_30143125.html
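Distillation is imitation: the model learns from a stronger model's outputs and copies their "answer shape". RL is exploration: the model has to do a large amount of reasoning and generation on its own, iterate repeatedly through its own mistakes, and extract capability from that trial and error.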
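A toy sketch of the contrast above, assuming a single question whose answer is one of three options. The teacher distribution and the verifier that rewards only option 2 are hypothetical. Distillation is modeled as gradient descent on cross-entropy toward the teacher's outputs. The RL side uses a REINFORCE-style policy gradient with a mean-reward baseline, which is one common choice and not necessarily what any particular training pipeline uses.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities; subtract the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distill_step(logits, teacher_probs, lr=0.5):
    """Imitation: one gradient step on cross-entropy H(teacher, student).

    The gradient w.r.t. the student's logits is (student_probs - teacher_probs),
    so the student is pulled toward the teacher's output distribution.
    """
    p = softmax(logits)
    return [z - lr * (pi - qi) for z, pi, qi in zip(logits, p, teacher_probs)]


def rl_step(logits, reward_fn, rng, lr=0.5, samples=16):
    """Exploration: one REINFORCE step on the model's own sampled answers."""
    p = softmax(logits)
    k = len(logits)
    # The model generates its own answers; no teacher output is involved.
    drawn = rng.choices(range(k), weights=p, k=samples)
    rewards = [reward_fn(a) for a in drawn]
    # Mean-reward baseline: better-than-average answers are reinforced,
    # worse-than-average ones are pushed down.
    baseline = sum(rewards) / samples
    grad = [0.0] * k
    for a, r in zip(drawn, rewards):
        advantage = r - baseline
        for i in range(k):
            # d log p(a) / d logit_i = 1[i == a] - p_i
            grad[i] += advantage * ((1.0 if i == a else 0.0) - p[i])
    # Gradient ascent on expected reward.
    return [z + lr * g / samples for z, g in zip(logits, grad)]


# Distillation: copy a hypothetical teacher's answer distribution.
teacher = [0.7, 0.2, 0.1]
student = [0.0, 0.0, 0.0]
for _ in range(200):
    student = distill_step(student, teacher)
print("distilled student:", [round(x, 3) for x in softmax(student)])

# RL: a hypothetical verifier rewards only answer 2; learned by trial and error.
CORRECT = 2
rng = random.Random(0)
policy = [0.0, 0.0, 0.0]
for _ in range(200):
    policy = rl_step(policy, lambda a: 1.0 if a == CORRECT else 0.0, rng)
print("RL policy P(correct):", round(softmax(policy)[CORRECT], 3))
```

The distilled student ends up reproducing the teacher's distribution, including whatever mass the teacher puts on answers the verifier would reject, while the RL policy concentrates on whatever its own sampled attempts were rewarded for.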