AI speedruns by reading the game manual



Summary

Using a game manual, an AI learned an old Atari game several thousand times faster than with older methods. This approach could be useful in other areas as well.

In March 2020, DeepMind scientists unveiled Agent57, the first agent trained with deep reinforcement learning (RL) to outperform the human benchmark in all 57 Atari 2600 games.

For the Atari game Skiing, which is considered particularly difficult and requires the AI agent to avoid trees on a ski slope, Agent57 needed a full 80 billion training frames. At 30 frames per second, that would take a human nearly 85 years.

AI learns to game 6,000 times faster

In a new paper, “Read and Reap the Rewards,” researchers from Carnegie Mellon University, Ariel University, and Microsoft Research show how this training time can be reduced to as little as 13 million frames, or about five days at the same frame rate.
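The frame counts translate into playing time as follows. This is a minimal check of the arithmetic, assuming the 30 frames per second quoted above; the function and variable names are illustrative.

```python
FPS = 30                      # frame rate quoted in the article
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25


def frames_to_days(frames, fps=FPS):
    """Convert a number of game frames into days of continuous play."""
    return frames / fps / SECONDS_PER_DAY


agent57_frames = 80_000_000_000      # Agent57 on Skiing
read_reward_frames = 13_000_000      # Read and Reward on Skiing

agent57_years = frames_to_days(agent57_frames) / DAYS_PER_YEAR
read_reward_days = frames_to_days(read_reward_frames)
speedup = agent57_frames / read_reward_frames

print(f"Agent57: about {agent57_years:.1f} years of play")
print(f"Read and Reward: about {read_reward_days:.1f} days of play")
print(f"Ratio of frame counts: about {speedup:.0f}x")
```

The ratio of the two frame counts comes out slightly above 6,000, which matches the figure in the heading.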


Image: Wu et al.

The Read and Reward framework uses human-written instructions, such as the game manual, as a source of information for the AI agent. According to the team, the approach is promising and could significantly improve the performance of RL algorithms on Atari games.

Extracting information, making inferences

The researchers cite the length of the instructions, which are often redundant, as a challenge. In addition, they say, much of the important information in the instructions is often implicit and only makes sense if it can be related to the game. An AI agent that uses instructions must therefore be able to process and reason about the information.

The framework therefore consists of two main components: the QA Extraction module and the Reasoning module. The QA Extraction module extracts and groups relevant information from the instructions by asking questions and extracting answers from the text. The Reasoning module then evaluates object-agent interactions based on this information and assigns auxiliary rewards (extra "help" rewards) for recognized events in the game.
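The idea behind the Reasoning module can be illustrated with a toy sketch. It is not the authors' implementation: the knowledge table, the distance threshold, the reward magnitude, and the function name are all assumptions made for illustration, and the QA Extraction step is replaced by a hand-written dictionary.

```python
import math

# Hypothetical result of a QA extraction step: for each object type,
# whether the manual describes touching it as good (+1) or bad (-1).
# These entries are illustrative and not taken from the paper or a manual.
MANUAL_KNOWLEDGE = {
    "tree": -1,
    "gate": +1,
}

AUX_REWARD_SCALE = 1.0   # assumed magnitude of one auxiliary reward
HIT_DISTANCE = 5.0       # assumed distance (in pixels) that counts as an interaction


def auxiliary_reward(agent_pos, objects):
    """Sum the auxiliary rewards for all objects the agent currently touches.

    agent_pos: (x, y) position of the agent.
    objects:   list of (name, (x, y)) tuples for objects detected on screen.
    """
    total = 0.0
    for name, pos in objects:
        if math.dist(agent_pos, pos) <= HIT_DISTANCE:
            # Objects the manual says nothing about contribute no reward.
            total += AUX_REWARD_SCALE * MANUAL_KNOWLEDGE.get(name, 0)
    return total


# The agent is next to a tree and far away from a gate.
print(auxiliary_reward((10, 10), [("tree", (12, 11)), ("gate", (80, 40))]))
```

In this sketch, an interaction is recognized purely by distance, and the manual-derived table decides whether that interaction is rewarded or penalized.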

These auxiliary rewards are then passed to an A2C (Advantage Actor-Critic) RL agent, which was able to improve its performance in four Atari games with sparse rewards. Such games often require long sequences of complex behavior before the player is rewarded. The rewards are therefore “sparse”, and an RL agent that proceeds only by trial and error does not receive a good learning signal.
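Why extra rewards help in a sparse-reward game can be shown with a small example. The sketch assumes the extra reward is simply added to the game's own reward at each step; the reward values and helper names are made up for illustration and do not come from the paper.

```python
def shaped_rewards(env_rewards, aux_rewards):
    """Add per-step auxiliary rewards to per-step environment rewards."""
    if len(env_rewards) != len(aux_rewards):
        raise ValueError("reward sequences must have the same length")
    return [e + a for e, a in zip(env_rewards, aux_rewards)]


def nonzero_fraction(rewards):
    """Fraction of steps that carry any learning signal at all."""
    return sum(1 for r in rewards if r != 0) / len(rewards)


# Made-up episode: the game itself only pays out at the very end.
env = [0, 0, 0, 0, 0, 0, 0, 10]
# Made-up auxiliary rewards for events recognized along the way.
aux = [0, -1, 0, 1, 0, 0, 1, 0]

shaped = shaped_rewards(env, aux)
print(shaped)
print(nonzero_fraction(env), nonzero_fraction(shaped))
```

With the game's reward alone, only one step in eight gives the agent any feedback. With the added rewards, half of the steps do.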

Reap the rewards outside of Atari Skiing

By using the instructions, the number of training frames required can be reduced by a factor of 1,000, the authors write. In an interview with New Scientist, first author Yue Wu even speaks of a speed-up by a factor of 6,000. Whether the manual comes from the developers themselves or from Wikipedia is irrelevant.
