Rewards: keep out of reach of paradoxes
June 19, 2008
The reward hypothesis states:
“That all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward).”
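To make the quoted quantity concrete, here is a minimal sketch of "the expected value of the cumulative sum of a received scalar signal", estimated by averaging over sampled episodes. Everything named here is an assumption made for illustration and not part of the hypothesis itself: the helper names, the toy coin task, and the use of a plain undiscounted sum.

```python
import random


def cumulative_reward(rewards):
    """Plain (undiscounted) sum of the scalar rewards received in one episode."""
    return sum(rewards)


def expected_return(sample_episode, n_episodes=10_000, seed=0):
    """Monte Carlo estimate of the expected cumulative reward.

    `sample_episode(rng)` is a hypothetical callable that returns the list
    of scalar rewards received during one episode.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_episodes):
        total += cumulative_reward(sample_episode(rng))
    return total / n_episodes


def make_coin_task(p, steps=3):
    """Toy task (illustrative only): each step pays 1 with probability p, else 0."""
    def sample_episode(rng):
        return [1.0 if rng.random() < p else 0.0 for _ in range(steps)]
    return sample_episode


if __name__ == "__main__":
    # The true expected return of this toy task is steps * p = 1.5.
    print(expected_return(make_coin_task(0.5)))
```

Under the hypothesis, "having a goal" would amount to picking whatever behavior makes this number as large as possible.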
For RL researchers, this is really good news: it means that any reward-maximizing algorithm they come up with can, in principle, attack any goal-oriented task. Although the hypothesis is a bit vague, one can heartily agree with it. One must be careful, though, to keep it out of reach of paradoxes.
In its original form, Newcomb’s paradox involves omniscient deities, philosophers arguing over free will, and other dubious figures. It seems, though, that it can be liberated from these and still leave something that is paradoxish but also RL-ish: a dangerous mix.