“Reinforcement Learning is the study of agents and how they learn by trial and error. It formalizes the idea that rewarding or punishing an agent for its behavior makes it more likely to repeat or forgo that behavior in the future.” (Reference: OpenAI)
Diagram of RL interactions (Reference: OpenAI):
To build a basic intuition for Reinforcement Learning (RL), it helps to relate how an RL agent learns to how an infant learns. An infant tries an activity; if her parents encourage it, she learns that this particular activity is a good one to repeat. The infant’s learning is intuitive and goal-directed, and reaching the goal is often not immediate but delayed to some extent. Similarly, an RL agent learns to make sequential decisions, guided by the rewards its environment generates. This learning process creates a feedback loop for continuous improvement.
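This feedback loop can be sketched in a few lines of Python. The toy environment and random agent below are invented for illustration, not any standard API; note how the reward is delayed until the goal is reached, just like the infant's.

```python
import random

random.seed(0)  # reproducibility of this sketch

class Environment:
    """Toy world: the agent starts at position 0 and must reach position 3."""
    def __init__(self):
        self.position = 0

    def step(self, action):
        # action is -1 (left) or +1 (right); the reward is delayed
        # until the goal is actually reached
        self.position += action
        done = self.position == 3
        reward = 1.0 if done else 0.0
        return self.position, reward, done

class Agent:
    """Acts at random; a real RL agent would improve from the rewards."""
    def act(self, state):
        return random.choice([-1, 1])

env, agent = Environment(), Agent()
state, done = 0, False
while not done:  # the feedback loop: act, observe reward, act again
    action = agent.act(state)
    state, reward, done = env.step(action)
print("goal reached at position", state, "with reward", reward)
```

A learning agent would use the observed rewards to prefer actions that lead toward the goal instead of choosing at random.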
Reinforcement Learning builds on the concept of the Markov Decision Process (MDP). The basic principle of an MDP is the Markov property: the current state summarizes everything from past experience that matters for the future. Therefore, what happens next depends only on the current state and action, not on the full sequence of previous states.
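As a concrete illustration of this, here is a hypothetical two-state MDP solved by value iteration; the states, transition probabilities, and rewards are invented for the sketch. Notice that each transition entry is keyed only by the current state and action, with no reference to history.

```python
# P[(state, action)] = list of (probability, next_state, reward).
# Markov property: transitions depend only on the current state and action.
P = {
    ("s0", "stay"): [(1.0, "s0", 0.0)],
    ("s0", "go"):   [(0.8, "s1", 1.0), (0.2, "s0", 0.0)],
    ("s1", "stay"): [(1.0, "s1", 0.0)],
    ("s1", "go"):   [(1.0, "s0", 0.0)],
}
gamma = 0.9                      # discount factor for future rewards
V = {"s0": 0.0, "s1": 0.0}       # initial value estimates

# Value iteration: repeatedly back up each state's value from its successors.
for _ in range(100):
    V = {
        s: max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)])
            for a in ("stay", "go")
        )
        for s in ("s0", "s1")
    }
print(V)
```

Because the dynamics are Markov, this simple backup over current states is all that is needed; no trajectory history has to be stored.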
Relationship of Reinforcement Learning with other fields of Artificial Intelligence
Deep Learning uses neural-network-based representations to solve problems with high-dimensional, non-linear characteristics. From training data, Deep Learning discovers patterns, but it does not by itself pinpoint a decision; Reinforcement Learning takes the decision.
Machine Learning algorithms are broadly categorized as Supervised Learning and Unsupervised Learning. Supervised Learning uses labeled training data to generalize patterns so the model can handle newer data in real-life situations. Unsupervised Learning, on the other hand, uses training data without any labels and groups it based on the patterns inherent in the data itself; these groupings then support solutions to real-life problems with newer data. In contrast, in Reinforcement Learning data comes continuously from the environment, and the RL algorithm keeps learning sequentially as it works toward its goals.
Broad Overview of Reinforcement Learning Algorithms
Reinforcement Learning algorithms are broadly classified as model-based and model-free. When the behavior of the agent’s environment is known beforehand, model-based algorithms mostly apply; for an unknown environment, model-free algorithms are usually the better fit.
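To make the model-free side concrete, below is a minimal tabular Q-learning sketch on an invented five-state chain; the environment, reward, and hyperparameters are assumptions for illustration only. The key point is that the agent never consults a transition model: it learns purely from the (state, action, reward, next state) samples it experiences.

```python
import random

random.seed(0)  # reproducibility of this sketch

# Invented chain of states 0..4; reaching state 4 yields reward 1.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, 1)                     # move left / move right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2     # learning rate, discount, exploration

def env_step(s, a):
    """Model-free setting: the agent only calls this, never inspects it."""
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

for _ in range(500):                  # episodes
    s, done = 0, False
    while not done:
        if random.random() < eps:     # explore
            a = random.choice(ACTIONS)
        else:                         # exploit current estimates
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r, done = env_step(s, a)
        target = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)]
print(policy)  # greedy action per state 0..3
```

A model-based algorithm, by contrast, would be given (or would learn) the transition probabilities inside `env_step` and could plan with them directly, as in the value-iteration style of methods.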
Diagram of RL Algorithms Hierarchy (Reference: OpenAI):
For details, please refer to: https://spinningup.openai.com/en/latest/spinningup/rl_intro2.html
How to determine whether a particular problem needs application of Reinforcement Learning
If the problem demands trial-and-error learning with delayed rewards through a feedback control mechanism, it might be solvable with Reinforcement Learning. The problem should also satisfy the assumptions of a Markov Decision Process.
Before going to production, the application of Reinforcement Learning should be tested in a simulated environment.
Identifying the appropriate algorithm determines whether an RL algorithm can solve the problem on its own, or whether it needs to be combined with a Deep Learning algorithm (Deep Reinforcement Learning).
Applications of Reinforcement Learning
- Chemistry, Biology and Life Sciences – The main value of RL is to improve any task that depends on a trial-and-error learning process. So any chemical reaction that is tuned through trial and error can, in principle, be optimized using RL.
- Drug Design and Healthcare – A very important area with huge potential for applying RL. For example, conceptually, in a new-drug-design use case, a generative deep neural network model can try to generate the target drug molecule while a predictive deep neural network model acts as the critic that defines the reward. These DNN models can operate under RL algorithms to make the right choice of drug molecule.
- Autonomous Vehicles – Autonomous driving is an important application area that is already utilizing RL. Traffic safety, fuel efficiency, reduction in collisions and injuries, better traffic flow, etc. are the promises of autonomous vehicles. In this use case, the driver is the agent. The agent’s environment is sensed through IoT devices using Computer Vision, radar, etc. The reward function can be modeled using driver characteristics such as inclination to overtake, tendency to tailgate, or lane-change behavior. The driver-environment interaction can then be framed as a stochastic Markov Decision Process under the purview of Reinforcement Learning.
- Trading and Finance – RL algorithms, which drive sequential learning, have potential in areas like portfolio management/optimization and trading.
- Smart City Applications like Traffic Control Systems – RL algorithms have the potential to minimize traffic delay and congestion at city intersections, given the dynamic nature of city traffic.
- Gaming – DeepMind, a Google company, made Reinforcement Learning a hot area of interest with game-playing systems like AlphaGo and AlphaGo Zero.
- Web System Configuration – This is a niche area where RL algorithms can manage the huge number of configurable parameters that are otherwise set manually through trial and error in industry.
- Artificial Intelligence: Robotics – A Robot, like an infant, learns through trial and error. Robotics is a huge application area of RL.
- Artificial Intelligence: NLP – Research is being done on applying RL algorithms to NLP tasks like text summarization, question answering, and machine translation for chatbot communications.
- Marketing with Personalized Recommendation – An important area that can leverage RL to better model future recommendations for users, since RL naturally handles the highly dynamic nature of both the items of interest and user preferences.
- Bidding – This is an interesting application area of RL. In bidding, one bidder’s price quote affects the probable quotes of the other bidders, making the environment a dynamic, multi-agent one. Algorithms like Distributed Coordinated Multi-Agent Bidding (DCMAB) have been proposed for this kind of multi-agent situation.
- Production – Facebook launched Horizon, an open-source end-to-end platform that applies RL in large-scale production environments.
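Returning to the autonomous-vehicle item above: modeling a reward function from driver and vehicle signals can be sketched as a weighted sum of features. The feature names and weights below are hypothetical illustrations chosen for this sketch, not taken from any production system.

```python
def driving_reward(speed_mps, speed_limit_mps, lane_offset_m, collision):
    """Hypothetical reward: penalize speeding, lane drift, and collisions."""
    reward = 0.0
    # relative deviation from the speed limit
    reward -= 10.0 * abs(speed_mps - speed_limit_mps) / speed_limit_mps
    # distance from the lane center, in meters
    reward -= 5.0 * abs(lane_offset_m)
    if collision:
        reward -= 100.0   # large enough to dominate all other terms
    return reward

# slightly over the limit, slightly off-center, no collision
print(driving_reward(30.0, 27.0, 0.2, False))
```

In a real system these terms would be tuned carefully (and extended with features like overtaking or tailgating behavior, as mentioned above), since the agent optimizes exactly what the reward encodes.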
Shortcomings of Reinforcement Learning
- Learning a good policy for an environment is time-consuming.
- Designing a reward function in Reinforcement Learning is very difficult.
- Learning with RL may get stuck in a local optimum, even when a good reward function exists.
Conclusion – Future with Reinforcement Learning
With continuing support from big companies like Google (DeepMind), the application of RL in human life and society is on the rise. The promise of Deep Reinforcement Learning is very strong, with applications across the industry sectors mentioned above. Reinforcement Learning is thought by many to be a path toward Artificial General Intelligence in the days to come.