ESI UCLM

Reinforcement learning: a lesser-known area of machine learning

Enrique Villarrubia, student of the Master's in Computer Engineering and the Doctorate program

By Enrique Villarrubia (Master's and PhD student).

Machine learning is usually associated with supervised and unsupervised learning. Both need observations, that is, data, to work with in order to uncover underlying patterns. The first learns from labeled data to predict an output (classification or regression). The second learns the inherent structure of unlabeled data and helps us understand it better. But what about reinforcement learning?

Reinforcement learning is about how an agent learns by interacting with an environment. The agent is not told which actions to take. Instead, it discovers which actions lead to the greatest cumulative reward by trying them. The closest real-world analogy is the way children learn through trial and error. Let's look at an example and see how it relates to the basic elements of reinforcement learning!
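The "discover by trying" idea can be seen in its simplest form in a toy problem where the agent repeatedly chooses among a few actions with unknown payoffs, often called a multi-armed bandit. The sketch below is our own illustration, not something from the article. The function name `run_bandit`, the reward means, and the 10% exploration rate are hypothetical choices. The agent is never told which action is best. It mostly repeats the action that has paid best so far and occasionally tries a random one (an epsilon-greedy strategy).

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Hypothetical epsilon-greedy agent that learns by trial and error
    which of several actions gives the highest average reward."""
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions  # running average of observed rewards
    counts = [0] * n_actions       # how many times each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action that currently looks best
            action = max(range(n_actions), key=lambda a: estimates[a])
        # The environment answers with a noisy reward; the agent never sees true_means
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental update of the sample mean for this action
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

if __name__ == "__main__":
    # Illustrative (assumed) mean rewards for three actions
    estimates, counts = run_bandit([0.1, 0.5, 1.5])
    print("estimated values:", [round(e, 2) for e in estimates])
    print("times each action was tried:", counts)
```

After enough trials the agent ends up choosing the best-paying action most of the time. The small exploration rate is what keeps it from getting stuck on whichever action happened to look good first.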

Suppose we are playing the Super Mario Bros. video game. The environment is the game itself. The current state is the image we see on the screen. The possible actions correspond to the four directional buttons and the jump button. Finally, rewards are positive when we defeat a Goomba or complete the level. They are negative when we lose a life, and also as time passes, since we want to encourage the agent to keep moving and to learn by exploring the environment. The following image summarizes these basic elements of reinforcement learning.
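These elements can be tied together in a few lines of code. The following is a minimal sketch of tabular Q-learning on a hypothetical one-dimensional "level" loosely inspired by Super Mario Bros. Q-learning is one classic reinforcement learning algorithm; the article does not commit to any specific one. The setup makes several simplifications:

- The state is just Mario's cell rather than the screen image.
- The actions are move left, move right, and jump (two cells forward).
- Reaching the flag gives +1 and falling into a hole gives -1.
- Every other step costs -0.05, so the agent is pushed to move forward.

The level layout, reward values, function names (`step`, `train_q_learning`, `greedy_rollout`) and hyperparameters are all illustrative assumptions.

```python
import random

# Hypothetical toy level: a 1-D corridor loosely inspired by the Mario example.
LEVEL_LENGTH = 8                      # cells 0..7; the flag (goal) is the last cell
HOLE = 3                              # landing here means the player is eliminated
ACTIONS = ("left", "right", "jump")   # jump moves two cells forward

def step(pos, action):
    """Environment dynamics (assumed): return (next_pos, reward, done)."""
    if action == "left":
        nxt = max(pos - 1, 0)
    elif action == "right":
        nxt = pos + 1
    else:  # jump
        nxt = pos + 2
    nxt = min(nxt, LEVEL_LENGTH - 1)
    if nxt == HOLE:
        return nxt, -1.0, True        # eliminated: negative reward
    if nxt == LEVEL_LENGTH - 1:
        return nxt, 1.0, True         # level completed: positive reward
    return nxt, -0.05, False          # time passing: small penalty

def train_q_learning(episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(LEVEL_LENGTH)]
    for _ in range(episodes):
        pos, done, t = 0, False, 0
        while not done and t < 50:    # cap the episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))                         # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[pos][i])   # exploit
            nxt, reward, done = step(pos, ACTIONS[a])
            # Target: immediate reward plus discounted value of the best next action
            target = reward if done else reward + gamma * max(q[nxt])
            q[pos][a] += alpha * (target - q[pos][a])
            pos, t = nxt, t + 1
    return q

def greedy_rollout(q, max_steps=20):
    """Follow the learned policy without exploration; return the visited cells."""
    pos, path = 0, [0]
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[pos][i])
        pos, _, done = step(pos, ACTIONS[a])
        path.append(pos)
        if done:
            break
    return path

if __name__ == "__main__":
    q_table = train_q_learning()
    print("greedy path:", greedy_rollout(q_table))
```

Thanks to the per-step penalty, the learned policy prefers jumping, which covers two cells per step, and it learns to avoid the hole. Broadly speaking, deep reinforcement learning replaces the table `q` with a neural network that can take the actual screen image as input.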

Figure 1. Basic elements of reinforcement learning in the video game Super Mario Bros [1]

In recent years, deep reinforcement learning (the use of neural networks to approximate any component of reinforcement learning) has been combined with Monte Carlo tree search. Thanks to this combination, the AlphaGo algorithm managed to beat the world champion of the board game Go, which is computationally more complex than chess [2]. Such was the impact that a documentary about it, featuring DeepMind, the British company that developed the algorithm, was even released on Netflix [3]. The algorithm then evolved in several steps:

- AlphaGo Zero no longer required expert knowledge from human players; it learned instead from games played against itself [4].
- AlphaZero adapted the approach to more games, such as chess and shogi [5].
- MuZero no longer needed to be given the rules of the games at all [6].

These superhuman performances also extend to more complex video games with imperfect information, such as StarCraft II (AlphaStar) [7]. AlphaStar relies on supervised learning in its first training stage. However, it is reinforcement learning that provides the leap in quality needed to reach Grandmaster level (the highest league in the game) and defeat top professional players.

Figure 2. Representation of AlphaStar's game against MaNa, one of the best players in the world [8]

And now you are quite possibly wondering: is reinforcement learning only useful for games? Not at all! Games serve as benchmarks for measuring how good these algorithms are, but there are already real applications. One example is controlling the plasma inside a tokamak nuclear fusion reactor [9], achieving better control than previous systems. There are also many applications in robotics and other fields.

Today, the latest advances in reinforcement learning focus on treating it as a sequence-to-sequence (seq2seq) problem. These approaches use attention mechanisms and the parallelizable training offered by Transformers (a neural network architecture). The following image shows Gato [10], a generalist agent built on these ideas. Gato can complete sentences, play Atari games, stack blocks with a robotic arm, act as a chatbot, and more. It does all of this with the same model, without retraining it for each task.

Figure 3. Gato, a generalist deep reinforcement learning sequence model [10]

In conclusion, reinforcement learning is not as well known as its two machine learning siblings. Even so, we have seen its major milestones and how useful it can be, especially in problems where an agent must make a sequence of decisions, such as games, control, and robotics. Thank you for reading this article. I hope you found the topic, which I love, interesting.


References

[1] "An Introduction to Reinforcement Learning". FreeCodeCamp.org, March 31, 2018. https://www.freecodecamp.org/news/an-introduction-to-reinforcement-learning-4339519de419/

[2] Silver, David, et al. "Mastering the Game of Go with Deep Neural Networks and Tree Search". Nature, vol. 529, no. 7587, January 2016, pp. 484-89. https://doi.org/10.1038/nature16961

[3] "AlphaGo Movie". AlphaGo Movie. https://www.alphagomovie.com/

[4] Silver, David, et al. "Mastering the Game of Go without Human Knowledge". Nature, vol. 550, no. 7676, October 2017, pp. 354-59. https://doi.org/10.1038/nature24270

[5] Silver, David, et al. "A General Reinforcement Learning Algorithm That Masters Chess, Shogi, and Go through Self-Play". Science, vol. 362, no. 6419, December 2018, pp. 1140-44. https://doi.org/10.1126/science.aar6404

[6] Schrittwieser, Julian, et al. "Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model". Nature, vol. 588, no. 7839, December 2020, pp. 604-09. https://doi.org/10.1038/s41586-020-03051-4

[7] Vinyals, Oriol, et al. "Grandmaster Level in StarCraft II Using Multi-Agent Reinforcement Learning". Nature, vol. 575, no. 7782, November 2019, pp. 350-54. https://doi.org/10.1038/s41586-019-1724-z

[8] "AlphaStar: Mastering the Real-Time Strategy Game StarCraft II". DeepMind blog. https://www.deepmind.com/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii

[9] Degrave, Jonas, et al. "Magnetic Control of Tokamak Plasmas through Deep Reinforcement Learning". Nature, vol. 602, no. 7897, February 2022, pp. 414-19. https://doi.org/10.1038/s41586-021-04301-9

[10] Reed, Scott, et al. "A Generalist Agent". arXiv:2205.06175 [cs], May 2022. http://arxiv.org/abs/2205.06175
