Reinforcement Learning FAQ
Frequently Asked Questions about Reinforcement Learning
Edited by Rich Sutton
Initiated 8/13/01
Last updated 7/13/03
What is Reinforcement Learning?
Reinforcement learning (RL) is learning from interaction with an environment, from the consequences of action, rather than from explicit teaching. RL became popular in the 1990s within machine learning and artificial intelligence, and also within operations research, with offshoots in psychology and neuroscience.
Most RL research is conducted within the mathematical framework of Markov decision processes (MDPs). MDPs involve a decision-making agent interacting with its environment so as to maximize the cumulative reward it receives over time. The agent perceives aspects of the environment's state and selects actions. The agent may estimate a value function and use it to construct better and better decision-making policies over time.
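The loop described above, estimating a value function and then using it to improve the policy, can be sketched with policy iteration, one classical dynamic-programming method for MDPs. This is a minimal sketch under stated assumptions, not a general RL implementation: the two-state MDP is made up for illustration, its model is assumed to be fully known (so the sketch plans rather than learning from interaction), a discount rate gamma = 0.9 is assumed, and the names `EXAMPLE_MDP`, `q_value`, `evaluate_policy`, `improve_policy`, and `policy_iteration` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical tabular MDP format (an assumption for this sketch):
# mdp[state][action] -> list of (probability, next_state, reward)
Transition = Tuple[float, int, float]
MDP = Dict[int, Dict[int, List[Transition]]]

# Hypothetical two-state example. Action 0 = "stay", action 1 = "move".
# In state 0, staying pays +1 now; moving pays 0 but usually (p = 0.8)
# reaches state 1, where staying pays +5 on every step.
EXAMPLE_MDP: MDP = {
    0: {0: [(1.0, 0, 1.0)], 1: [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 1, 5.0)], 1: [(1.0, 0, 0.0)]},
}


def q_value(mdp: MDP, V: Dict[int, float], s: int, a: int, gamma: float) -> float:
    """Expected immediate reward plus discounted value of the next state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in mdp[s][a])


def evaluate_policy(mdp: MDP, policy: Dict[int, int], gamma: float,
                    tol: float = 1e-10) -> Dict[int, float]:
    """Iterative policy evaluation: estimate V(s) for a fixed policy."""
    V = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s in mdp:
            v_new = q_value(mdp, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new  # in-place update; converges for gamma < 1
        if delta < tol:
            return V


def improve_policy(mdp: MDP, V: Dict[int, float], policy: Dict[int, int],
                   gamma: float) -> Dict[int, int]:
    """Act greedily with respect to V, keeping the current action on ties."""
    new_policy = {}
    for s in mdp:
        q = {a: q_value(mdp, V, s, a, gamma) for a in mdp[s]}
        best = max(q, key=q.get)
        # Switch only when strictly better, so ties cannot cause cycling.
        new_policy[s] = best if q[best] > q[policy[s]] + 1e-9 else policy[s]
    return new_policy


def policy_iteration(mdp: MDP, gamma: float = 0.9):
    """Alternate evaluation and improvement until the policy stops changing."""
    policy = {s: min(mdp[s]) for s in mdp}  # arbitrary initial policy: "stay"
    while True:
        V = evaluate_policy(mdp, policy, gamma)
        new_policy = improve_policy(mdp, V, policy, gamma)
        if new_policy == policy:
            return policy, V
        policy = new_policy


policy, V = policy_iteration(EXAMPLE_MDP)
print(policy)  # {0: 1, 1: 0}
```

Starting from "always stay", the first improvement step switches state 0 to "move", because the +5 per step reachable in state 1 outweighs the immediate +1; the next evaluation finds no further improvement, so the policy is returned.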
RL algorithms are methods for solving this kind of problem, that is, problems involving sequences of decisions in which each decision affects what opportunities are available later, in which the effects need not be deterministic, and in which there are long-term goals. RL methods are intended to address the kind of learning and decision-making problems that people and animals face in their normal, everyday lives.
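The point about long-term goals can be made concrete with a discounted return, one common way (assumed here, not stated in this FAQ) to turn "cumulative reward over time" into a single number: G = r1 + γ·r2 + γ²·r3 + .... The reward sequences, the 50% arrival probability, and the discount rate γ = 0.9 below are hypothetical; the sketch only shows that a decision yielding nothing immediately can still be the better one once later opportunities and uncertain outcomes are taken into account.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Return r1 + gamma*r2 + gamma**2*r3 + ..., accumulated back to front."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical reward sequences following two different first decisions.
grab_now = [1.0, 0.0, 0.0, 0.0]  # small reward immediately, nothing later
wait = [0.0, 0.0, 0.0, 10.0]     # nothing now, a larger reward three steps later

print(f"{discounted_return(grab_now):.2f}")  # 1.00
print(f"{discounted_return(wait):.2f}")      # 7.29

# Effects need not be deterministic: if the delayed reward arrives only
# half the time (a hypothetical probability), compare choices by expected return.
expected_wait = 0.5 * discounted_return(wait) + 0.5 * discounted_return([0.0] * 4)
print(f"{expected_wait:.3f}")                # 3.645
```

Because the reward for waiting arrives three steps later, it is scaled by γ³ = 0.729; even so, 7.29 exceeds the immediate 1.0, and the expected value of 3.645 under uncertainty still does.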
For more information, see the sources recommended for an introduction to RL.
How does RL relate to behaviorism?
Formally, RL is unrelated to behaviorism, or at least to the aspects of behaviorism that are widely viewed as undesirable. Behaviorism has been disparaged for focusing exclusively on behavior, refusing to consider what was going on inside the head of the subject. RL, of course, is all about the algorithms and processes going on inside the agent. For example, we often consider the construction of internal models of the environment within the agent, which is far outside the scope of behaviorism.
Nevertheless, RL shares with behaviorism its origins in animal learning theory, and in its focus on the interface with the environment. RL's states and actions are essentially animal learning theory's stimuli and responses. Part of RL's point is that these are the essential common final path for all that goes on in the agent's head. In the end it all comes down to the actions taken and the states perceived.