Results of Temporal Difference Reinforcement Agents in Robocode Project
DESCRIPTION OF PROJECT

RoboCode (robocode.alphaworks.ibm.com) is a battle tank simulator. Its original goal was to provide a platform for beginning programmers to learn Java. As it developed, though, many advanced programmers began writing tanks to fight in this arena. Some tanks use very sophisticated algorithms to predict the movement of opponents, to fire bullets at a moving target, and to avoid attacks by others.

The better advanced tanks use a primitive form of case-based reasoning to create offensive and defensive strategies. Mad Hatter, the base robot used as a target for learning, uses sophisticated opponent tracking and prediction, including different hard-coded strategies for different game states.

Mad Hatter was chosen as the target for learning because its mechanisms are advanced and its results span the whole field: it wins all competitions against beginner robots, most competitions against intermediate robots, and a few against advanced robots. That record provides a way to rank other robots trained against it. The two robots created to compete against Mad Hatter were Wingnut and WingPolicy.

Wingnut uses SARSA(lambda) learning with eligibility traces. WingPolicy uses SARSA but maintains a policy table that maps states to actions. Unlike normal SARSA, the Q-value table is updated only when a nonzero reward is obtained. Between these updates, actions are selected from the policy table based on the current state. Both robots use a greedy approach to action selection when more than one action is available in a state.
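
As a rough illustration of the SARSA(lambda) update with eligibility traces described above, here is a minimal tabular sketch in Java. It is not the Wingnut source: the class name SarsaLambdaSketch, the integer state and action encoding, the use of accumulating traces, and the alpha, gamma, and lambda parameters are all assumptions made for this example.

```java
import java.util.Arrays;

/** Illustrative tabular SARSA(lambda); names and encodings are hypothetical. */
class SarsaLambdaSketch {
    private final double[][] q;      // Q[state][action] value estimates
    private final double[][] trace;  // eligibility trace per state-action pair
    private final double alpha;      // learning rate (assumed parameter)
    private final double gamma;      // discount factor (assumed parameter)
    private final double lambda;     // trace decay rate (assumed parameter)

    SarsaLambdaSketch(int numStates, int numActions,
                      double alpha, double gamma, double lambda) {
        this.q = new double[numStates][numActions];
        this.trace = new double[numStates][numActions];
        this.alpha = alpha;
        this.gamma = gamma;
        this.lambda = lambda;
    }

    /** Greedy selection: highest Q value in the state; ties go to the lowest index. */
    int greedyAction(int state) {
        int best = 0;
        for (int a = 1; a < q[state].length; a++) {
            if (q[state][a] > q[state][best]) {
                best = a;
            }
        }
        return best;
    }

    /** One update for the transition (s, a, reward, nextS, nextA). */
    void update(int s, int a, double reward, int nextS, int nextA) {
        // Temporal difference error for the on-policy (SARSA) target.
        double delta = reward + gamma * q[nextS][nextA] - q[s][a];
        // Accumulating trace: mark the pair that was just visited.
        trace[s][a] += 1.0;
        // Credit every recently visited pair in proportion to its trace,
        // then decay all traces.
        for (int i = 0; i < q.length; i++) {
            for (int j = 0; j < q[i].length; j++) {
                q[i][j] += alpha * delta * trace[i][j];
                trace[i][j] *= gamma * lambda;
            }
        }
    }

    /** Clear all traces, for example at the start of a new round. */
    void startEpisode() {
        for (double[] row : trace) {
            Arrays.fill(row, 0.0);
        }
    }

    double value(int s, int a) {
        return q[s][a];
    }
}
```

In a sketch like this, a reward received late in a round also adjusts the state-action pairs visited shortly before it, which is the purpose of the eligibility traces. A WingPolicy-style variant would, under the description above, skip the Q update whenever the reward is 0 and read actions from a stored policy table in between.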

Presentation

DOWNLOAD

Download the fully trained versions of the Wingnut SARSA(lambda) bot and the WingPolicy SARSA Policy bot, along with a User Manual (for Solaris), to get started.

See the Official RoboCode Site to download RoboCode and for further information.

STATISTICS
Raw statistics can be found here. Processed statistics are available in the presentation.