A Recurrent Neural Network Demo
Copyright Mike Fairbank 2005
In this demo, a pre-trained recurrent neural network controls a spacecraft. The objective is to use a range-finding scanner to find and fly to a landing pad.
During each time-step of the animation, the neural network decides in which direction to point the scanner and then receives the range as feedback. The neural network must combine this information with its memory of past scan results to find the bottom of the V-shaped landscape.
To run the demo, click the ‘Next’ button to start a flight in a new landscape, or ensure the “Auto” box is ticked.
Further Information
Each landscape is “V” shaped, with a randomly located landing pad and random gradients on the two sides. The landing pad is always the lowest point.
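A minimal sketch of how such a landscape could be generated, in Python. This is not the demo's own code: the function names, the width, and the ranges for the pad location and gradients are all assumptions made for illustration. The seeded generator reflects the fact that the demo uses a fixed sequence of landscapes.

```python
import random

def make_landscape(rng, width=100.0):
    """Return (pad_x, left_slope, right_slope) for one V-shaped landscape.

    The width and the value ranges below are assumed, not taken from the demo.
    """
    pad_x = rng.uniform(0.2 * width, 0.8 * width)  # random landing pad location
    left_slope = rng.uniform(0.2, 1.0)             # random gradient, left side
    right_slope = rng.uniform(0.2, 1.0)            # random gradient, right side
    return pad_x, left_slope, right_slope

def ground_height(x, landscape):
    """Height of the ground at horizontal position x; the pad is the lowest point."""
    pad_x, left_slope, right_slope = landscape
    if x < pad_x:
        return left_slope * (pad_x - x)
    return right_slope * (x - pad_x)

# Seeding the generator makes the sequence of landscapes the same on every run.
rng = random.Random(0)
landscapes = [make_landscape(rng) for _ in range(5)]
```

The ground height is zero at the pad and rises linearly on either side, which gives the “V” shape.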
The neural network used in this demo has already been trained. As in the other demo, the sequence of landscapes is fixed (the pseudo-random numbers are seeded). It was trained on the first 500 flights (landscapes). Although not perfect, it has nearly mastered this task, and there is no noticeable degradation of performance on landscapes that were not seen during training. I am amazed and delighted this problem has been solved at all.
The reward function used in training depends on landing velocity, distance from the landing pad, and fuel consumed.
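One way such a reward could be written is sketched below in Python. The exact formula and weightings used in the demo are not given here, so the function name, the weights, and the choice of a simple weighted sum are all assumptions; the only thing taken from the description is which three quantities are involved.

```python
def landing_reward(landing_speed, pad_distance, fuel_used,
                   w_speed=1.0, w_distance=1.0, w_fuel=0.1):
    """Hypothetical reward given once, at landing.

    Assumed form: a negative weighted sum, so that a slow landing,
    close to the pad, using little fuel scores highest (zero at best).
    """
    penalty = (w_speed * abs(landing_speed)
               + w_distance * abs(pad_distance)
               + w_fuel * fuel_used)
    return -penalty

gentle = landing_reward(landing_speed=0.5, pad_distance=1.0, fuel_used=20.0)
crash = landing_reward(landing_speed=8.0, pad_distance=30.0, fuel_used=20.0)
```

With these assumed weights, the gentle landing near the pad receives a higher reward than the fast landing far from it.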
Before the net was trained, it had no idea of the meaning of its inputs or outputs. For example, it had no idea that the scanner input and output were in any way special. It has just managed to notice for itself that there are correlations between scan results and the reward it eventually obtains on landing! This is especially hard as there is a long time between the early decisions and the final reward.
Neural Network Structure (Recurrent Neural Network)
Because this task involves remembering previous scan directions and scan results, the demo uses a recurrent neural network. This type of neural network has some extra output units which are fed back in as inputs in the next time-step. These are the recurrent “feedback units”. Using these, the neural network can remember information from the past to make decisions about the future.
The neural network has the following outputs:
Thrust (0 to 1).
Rotation (-1 to +1).
Direction to scan in.
Several feedback output units to be fed back in as inputs into the next time-step.
The neural network has the following inputs:
Velocity (x and y components).
Orientation.
Fuel remaining.
The range found by the last scan.
The feedback input units that receive input directly from the feedback output units of the previous time-step.
Note that the position is not an input to the neural net. This means it has to rely entirely on the scans to calculate its position.
There are no hidden layers in the neural net. All weights, inputs, outputs and activation functions are complex-valued. I chose to use complex values to make any rotational symmetries more easily apparent.
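The structure described above can be sketched in Python as a single complex-valued weight layer whose feedback outputs are passed back in on the next time-step. This is only an illustration under stated assumptions: the number of feedback units (the text says only “several”), the bias input, the activation function, the random weights, and the placeholder sensor readings are all hypothetical. How the demo encodes quantities such as thrust or orientation as complex numbers is not described, so that mapping is left out.

```python
import math
import random

N_SENSORS = 5    # velocity x, velocity y, orientation, fuel remaining, last scan range
N_CONTROLS = 3   # thrust, rotation, direction to scan in
N_FEEDBACK = 4   # assumed count of feedback units

def squash(z):
    """Assumed activation: keep the phase of z, squash its magnitude with tanh."""
    r = abs(z)
    return z if r == 0 else z * (math.tanh(r) / r)

def step(weights, sensors, feedback):
    """One time-step through a single complex weight layer (no hidden layers)."""
    inputs = list(sensors) + list(feedback) + [1 + 0j]  # trailing 1 is an assumed bias input
    outputs = [squash(sum(w * x for w, x in zip(row, inputs))) for row in weights]
    return outputs[:N_CONTROLS], outputs[N_CONTROLS:]

# Random complex weights stand in for the trained ones.
rng = random.Random(0)
n_in = N_SENSORS + N_FEEDBACK + 1
n_out = N_CONTROLS + N_FEEDBACK
weights = [[complex(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(n_in)]
           for _ in range(n_out)]

feedback = [0j] * N_FEEDBACK                      # empty memory at the start of a flight
for t in range(3):
    sensors = [complex(0.1 * t, 0)] * N_SENSORS   # placeholder readings
    controls, feedback = step(weights, sensors, feedback)
```

The only memory the network has between time-steps is the `feedback` list: whatever it needs to recall about earlier scans must be encoded there.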
Recurrent Nets
Recurrent neural networks are difficult to train, in my opinion, because data passes through them many times, so any small change to the weights can have large repercussions for the final output. This makes smooth learning difficult. Because there are many (approximately 30) time-steps in this animation, data passes through the network about 30 times, and training it is therefore as difficult as training a 30*n-layer neural network.
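The effect can be illustrated with a toy example in Python. It is a deliberately simplified stand-in, not the demo's network: a single real-valued weight `w` is reused at every time-step with no activation function, and the value 1.1 is chosen arbitrarily. The sketch measures how much the final output moves when that one weight is nudged slightly.

```python
def run(w, steps, h0=1.0):
    """Pass a value through the same weight `steps` times (toy linear recurrence)."""
    h = h0
    for _ in range(steps):
        h = w * h
    return h

def sensitivity(w, steps, eps=1e-6):
    """Finite-difference estimate of how the final output changes per unit change in w."""
    return (run(w + eps, steps) - run(w, steps)) / eps

one_step = sensitivity(1.1, steps=1)
thirty_steps = sensitivity(1.1, steps=30)
```

In this toy case the output after 30 passes is several hundred times more sensitive to the weight than the output after a single pass, which is why a small weight update can change the end result abruptly.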