Copyright Mike Fairbank 2001-2008
This is a project to try to make an artificial neural network learn something challenging and interesting. I have been using a series of “Lunar Lander” type scenarios of progressive difficulty as a motivation to develop advanced learning algorithms.
Problems and Demos
All of the demos below consist of a spacecraft that aims to fly to a landing pad. The spacecraft is controlled by an artificial neural network. The flying strategies you see in the demos have not been explicitly programmed by a human programmer, but have been learned by the computer autonomously, i.e. this is an example of artificial intelligence or reinforcement learning. For example, before any learning took place, the spacecraft would just fall straight to the ground and crash badly; but after crashing many times, the neural network eventually adapted itself to adopt a strategy capable of landing properly. As you can see, the strategies that evolved are in some cases quite complex, and would have been fairly challenging for a human programmer to invent.
All demos contain a pre-trained neural network. No learning takes place in the demos. The sequence of flights/landscapes is pseudo-random, producing the same sequence each time. The first few in each demo were seen during the training process, so performance is noticeably better on these first cases.
These lunar-lander problems are from a series of problems of progressively increasing difficulty that I have been working through. These started with a very simple one-dimensional version of the lunar lander, and work up to the most complex, the third demo. The simple one-dimensional version is demonstrated, with source code and learning algorithm, in the web-pages explaining my paper.
I believe when doing a project like this you need to have a number of intermediate milestone projects that are related and form almost a continuum of difficulty. That way when one project goes wrong you have some chance of figuring out why. I think there is not much point diving straight in and attempting your final project immediately.
The idea was to work through these milestone problems (and then hopefully further) until I found one that posed a challenge to the theory. I was also hoping that I would have, by the end of this sequence of problems, a system that could be classified in some way as “intelligent”. I am not sure how close I have come to achieving this. I have a full discussion of this issue here. Any comments would be welcome. I think if I had the final demo working on 100% of all landscapes then that would be quite good.
Neural Networks - Supervised Learning
Generally, the neural networks of the type used here take some information in, process it, and pass some information out. For example, in an optical character recognition system a network may take the ‘1’s and ‘0’s of a black and white pixellated image as its input, process it, and produce as its output a single number from 1 to 26 to say which letter of the alphabet it has identified. The ‘intelligence’ behind the neural net is in the fact that you can teach it just by giving it examples (e.g. of the form, “here are 100 different letter ‘A’s, here are 100 different letter ‘B’s, etc”), and that once it has been trained, it can make reasonable guesses on input it has never seen before. This is usually considered difficult for computers to do but straightforward for a human being.
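The “information in, information out” process described above can be sketched as a simple feedforward pass. This is only an illustration: the layer sizes, weights, and sigmoid activation here are hypothetical assumptions, not the architecture actually used in the demos.

```java
// Minimal sketch of a feedforward neural network pass.
// All weights and sizes are toy values for illustration only.
public class TinyNet {

    // A common squashing activation; values come out between 0 and 1.
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // One hidden layer: input -> hidden -> output.
    static double[] forward(double[] input, double[][] w1, double[][] w2) {
        double[] hidden = new double[w1.length];
        for (int i = 0; i < w1.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < input.length; j++) {
                sum += w1[i][j] * input[j];        // weighted sum of inputs
            }
            hidden[i] = sigmoid(sum);
        }
        double[] output = new double[w2.length];
        for (int i = 0; i < w2.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < hidden.length; j++) {
                sum += w2[i][j] * hidden[j];       // weighted sum of hidden units
            }
            output[i] = sigmoid(sum);
        }
        return output;
    }

    public static void main(String[] args) {
        double[] pixels = {1, 0, 1};                       // toy black-and-white "image"
        double[][] w1 = {{0.5, -0.2, 0.1}, {0.3, 0.8, -0.5}};
        double[][] w2 = {{1.0, -1.0}};
        double[] out = forward(pixels, w1, w2);
        System.out.println(out[0]);                        // a value in (0, 1)
    }
}
```

Training such a network means adjusting the numbers in `w1` and `w2` so that the outputs move closer to the desired answers on the example data.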
This project is an example of a reinforcement learning problem. Here we give a rule on what we want the net to achieve (i.e. we want it to land gently over the landing pad in a fuel-efficient manner) without specifying how. During the training of the net, we run many test flights (which initially tend to crash badly), and for each case a ‘reward function’ is calculated based on the above criteria. At each iteration of the training process we adjust the connections in the neural net so as to try to maximise the reward function, and hopefully after a while the net will learn to fly the spacecraft intelligently.
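A reward function of the kind described above might look something like the sketch below. The particular quantities and weighting constants are illustrative assumptions, not the actual criteria used to train the demos.

```java
// Hypothetical reward function for a single landing attempt.
// The weighting constants are made-up illustrative values.
public class LanderReward {

    // Higher (less negative) is better: penalise distance from the
    // landing pad, impact speed, and fuel consumption.
    static double reward(double distanceFromPad, double landingSpeed, double fuelUsed) {
        return -(10.0 * distanceFromPad
               + 5.0 * landingSpeed * landingSpeed
               + fuelUsed);
    }

    public static void main(String[] args) {
        // A gentle landing on the pad scores better than a fast crash far away.
        System.out.println(reward(0.0, 0.1, 20.0));   // good flight
        System.out.println(reward(50.0, 8.0, 60.0));  // bad flight
    }
}
```

During training, the network's connections are adjusted so that flights score higher under a function like this one.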
Reinforcement learning problems are generally harder than those that work by showing examples (supervised learning). The reinforcement learner is never shown the correct answer - it has to work it out for itself. This is a significantly non-trivial challenge, because the net makes many decisions during the course of the flight but only at the end does it find out how well it did overall. The difficulty is in working out which individual actions were responsible for doing well, or conversely badly, at the end.
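One standard way of spreading a delayed end-of-flight reward back over the individual actions is to compute discounted returns. This is a generic illustration of the credit-assignment idea, not necessarily the algorithm used for these demos.

```java
// Generic illustration of credit assignment via discounted returns.
// Not necessarily the method used in this project.
public class DiscountedReturns {

    // Work backwards through the flight: each step's return is its own
    // reward plus the discounted return of everything that follows.
    static double[] returns(double[] rewards, double gamma) {
        double[] g = new double[rewards.length];
        double running = 0.0;
        for (int t = rewards.length - 1; t >= 0; t--) {
            running = rewards[t] + gamma * running;
            g[t] = running;
        }
        return g;
    }

    public static void main(String[] args) {
        // Zero reward at every step except the end of the flight.
        double[] rewards = {0, 0, 0, 1.0};
        double[] g = returns(rewards, 0.9);
        // Earlier actions receive progressively discounted credit.
        System.out.println(java.util.Arrays.toString(g));
    }
}
```

Under this scheme, actions taken just before a good landing get most of the credit, while earlier actions get progressively less.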
I have had to develop some advanced learning algorithms to produce these demos. All learning took place offline, and no learning takes place in the demos.
Reinforcement Learning by Value Gradients is the theory I developed to create the first demo. The second and third demos were created using a recurrent neural network learning algorithm that I am hoping to write up soon.
Java Installation: These demos require Java to run, so if they don't work then either...
Here are some excellent AI theory and neural-network resources.