NEUROPILOT1

Copyright Mike Fairbank 2001

This is a demo of a spacecraft controlled by a neural network.  To run the demo, click the ‘Next’ button (or ensure the “Auto” box is selected) to release the spacecraft from a random position, velocity and orientation.  The neural network then tries to land the craft on the landing pad efficiently and gently.

Reinforcement Learning with Value Function

The net used is a feed-forward multi-layer network, trained to approximate a value function.  It takes as input the spacecraft’s state variables (position, velocity and orientation) and produces as output an estimate of the favourability of that situation.  The algorithm that flies the spacecraft is a ‘greedy policy’, i.e. it does a 1-step look-ahead that considers each possible successor state and chooses the one that the neural network estimates as most favourable.  The states considered in the 1-step look-ahead are those arrived at by the internal physics model when considering all six combinations of thrusting (yes or no) and rotating (left, right or not at all).
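
The greedy 1-step look-ahead described above can be sketched roughly as follows.  This is only an illustration, not the demo's actual code: the class and method names, the toy physics step and its constants are all assumptions made for the example, and the trained network is replaced by a hypothetical ValueFunction interface.

```java
/** Illustrative sketch of a greedy 1-step look-ahead; not the demo's actual code. */
final class GreedyPilot {
    /** Hypothetical 2D state: position, velocity and orientation (radians, 0 = upright). */
    static final class State {
        final double x, y, vx, vy, angle;
        State(double x, double y, double vx, double vy, double angle) {
            this.x = x; this.y = y; this.vx = vx; this.vy = vy; this.angle = angle;
        }
    }

    /** One control choice: thrust on or off, and rotation -1 (left), 0 (none) or +1 (right). */
    static final class Action {
        final boolean thrust;
        final int rotation;
        Action(boolean thrust, int rotation) { this.thrust = thrust; this.rotation = rotation; }
    }

    /** Stand-in for the trained network: maps a state to a favourability estimate. */
    interface ValueFunction { double estimate(State s); }

    // Assumed constants for a toy physics model (illustrative values only).
    static final double DT = 0.1, GRAVITY = 1.0, THRUST = 2.0, TURN_RATE = 0.5;

    /** Toy physics step standing in for the demo's internal physics model. */
    static State step(State s, Action a) {
        double angle = s.angle + a.rotation * TURN_RATE * DT;
        double ax = a.thrust ? THRUST * Math.sin(angle) : 0.0;
        double ay = (a.thrust ? THRUST * Math.cos(angle) : 0.0) - GRAVITY;
        double vx = s.vx + ax * DT;
        double vy = s.vy + ay * DT;
        return new State(s.x + vx * DT, s.y + vy * DT, vx, vy, angle);
    }

    /** Tries all 2 x 3 actions and returns the one whose successor state is rated highest. */
    static Action chooseAction(State s, ValueFunction value) {
        Action best = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (boolean thrust : new boolean[] {false, true}) {
            for (int rotation = -1; rotation <= 1; rotation++) {
                Action a = new Action(thrust, rotation);
                double v = value.estimate(step(s, a));
                if (v > bestValue) {
                    bestValue = v;
                    best = a;
                }
            }
        }
        return best;
    }
}
```

In this sketch the policy itself has no parameters of its own; all of the learned behaviour lives in whatever is passed in as the value estimate.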

Reward Function

The net was trained off-line using a gradient-descent reinforcement learning algorithm.  The reinforcement function used was roughly of the form:

k1*fuel_used + k2*total_rotations_performed + k3*landing_velocity^2 + k4*dist_from_landing_pad^2
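
As a minimal sketch, the expression above could be computed as below.  The values of k1 to k4 are not given, so the weights of 1.0 here are placeholders, and reading the total as a cost (lower is better) is an assumption based on the terms all looking like penalties.

```java
/** Illustrative sketch of the weighted sum above; the weights are placeholders. */
final class FlightCost {
    // Hypothetical weights: the actual values of k1..k4 are not stated.
    static final double K1 = 1.0, K2 = 1.0, K3 = 1.0, K4 = 1.0;

    /** Weighted sum of fuel, rotations, squared landing speed and squared miss distance. */
    static double cost(double fuelUsed, double totalRotations,
                       double landingVelocity, double distFromPad) {
        return K1 * fuelUsed
             + K2 * totalRotations
             + K3 * landingVelocity * landingVelocity
             + K4 * distFromPad * distFromPad;
    }
}
```

Because the last two terms are squared, a hard or off-target landing is penalised much more steeply than a slightly wasteful one, whatever the exact weights.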

Learning

The learning algorithm behind this demo is called VGL(1), as described in my paper.

Each iteration of the training algorithm consisted of a complete simulation of all 10 training flights, followed by an update of the network connections.  Each iteration took about 20s to run in Java, and the net took about 200 iterations to train.  The net shown in the demo was the first trained net produced; I could not find any significant improvement by re-running the training or by changing the number of training flights.

Training also used RPROP.  In addition, I used complex values for the net inputs and weights, because this should make the rotational symmetries of the problem more apparent.  (I think this is a rare example of the power of complex nets.)
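
For readers who have not met RPROP, here is a rough sketch of an RPROP-style weight update.  It is not the demo's code: it uses the ‘iRprop-’ variant with the commonly quoted default constants, and it works on plain real-valued weights rather than the complex values mentioned above.  All names are made up for the example.

```java
/** Illustrative sketch of an RPROP-style update (iRprop- variant), real-valued weights only. */
final class Rprop {
    // Commonly quoted defaults; not necessarily the values used for the demo.
    static final double ETA_PLUS = 1.2, ETA_MINUS = 0.5;
    static final double STEP_MIN = 1e-6, STEP_MAX = 50.0;

    final double[] stepSize;   // one adaptive step size per weight
    final double[] prevGrad;   // gradient seen on the previous update

    Rprop(int numWeights, double initialStep) {
        stepSize = new double[numWeights];
        prevGrad = new double[numWeights];
        for (int i = 0; i < numWeights; i++) {
            stepSize[i] = initialStep;
        }
    }

    /** Updates the weights in place, using only the sign of each gradient component. */
    void update(double[] weights, double[] grad) {
        for (int i = 0; i < weights.length; i++) {
            double g = grad[i];
            double agreement = g * prevGrad[i];
            if (agreement > 0) {
                // Same sign as last time: grow the step.
                stepSize[i] = Math.min(stepSize[i] * ETA_PLUS, STEP_MAX);
            } else if (agreement < 0) {
                // Sign flipped, so we overshot: shrink the step and skip this move.
                stepSize[i] = Math.max(stepSize[i] * ETA_MINUS, STEP_MIN);
                g = 0.0;
            }
            weights[i] -= Math.signum(g) * stepSize[i];
            prevGrad[i] = g;
        }
    }
}
```

The point of the method is that the size of the gradient is ignored; only its sign, and whether that sign has changed since the last update, affects the step taken.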

I finally got more reliable learning performance, and learning from more than just 10 training flights, by using the learning algorithm VGL(Omega), as demonstrated and described here.

Results

The trained net shown working in the demo lands the spacecraft pretty well in most cases.

The random number generator used to specify the starting conditions for each flight is seeded by the flight number.  That’s why the demo always shows the same sequence of flights.  The first 10 of these flights were used to train the net.  All later flights were not seen at all by the net during training.  There is a slight but noticeable degradation of performance on these unseen cases, but I don’t think it’s too bad considering that only a small number of flights were used in training.
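
The seeding idea can be sketched as below, using java.util.Random.  The class name, the field layout and the ranges of the starting values are hypothetical; only the idea of seeding the generator with the flight number comes from the description above.

```java
import java.util.Random;

/** Illustrative sketch: reproducible starting conditions from a per-flight seed. */
final class FlightStart {
    final double x, y, vx, vy, angle;

    FlightStart(long flightNumber) {
        // Seeding with the flight number makes each flight's start repeatable.
        Random rng = new Random(flightNumber);
        // Hypothetical ranges; the demo's actual ranges are not stated.
        x = uniform(rng, -10.0, 10.0);
        y = uniform(rng, 5.0, 20.0);
        vx = uniform(rng, -1.0, 1.0);
        vy = uniform(rng, -1.0, 1.0);
        angle = uniform(rng, -Math.PI, Math.PI);
    }

    /** Draws a value uniformly between lo and hi from the given generator. */
    private static double uniform(Random rng, double lo, double hi) {
        return lo + (hi - lo) * rng.nextDouble();
    }
}
```

Constructing a FlightStart twice with the same flight number gives identical starting conditions, which is what makes it possible to say exactly which flights were training flights and which were unseen.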

I am pleased to say that the solutions found by the neural net produced a couple of surprises for me:

1.  It is sometimes beneficial to thrust downwards.  I thought this would never be true, but if the trajectory starts off hurtling upwards then it is indeed a good strategy.

2.  The networks learn that it is much more fuel-efficient to save the braking thrusts for later in the journey than to brake early and descend slowly from a great height.

Contact Me

If you have any feedback or questions on this, please e-mail me (see home page).
