Stabilizing Efficient Reasoning with Step-Level Advantage Selection
Han Wang, Xiaodong Yu, Jialian Wu, Jiang Liu, Ximeng Sun, Mohit Bansal, Zicheng Liu