When Errors Can Be Beneficial: A Categorization of Imperfect Rewards for Policy Gradient
Shuning Shang, Hubert Strauss, Stanley Wei, Sanjeev Arora, Noam Razin