My Inverse Kinematics article has been one of my most popular, and I still get people asking me more about it. The sad thing is that I always have to explain to them that the article is way out of date. It was one of my first articles, written when I was still working out how to do it. Since then, I've come up with faster and more robust methods.
Let's look at a simple, two-jointed robot arm. It is fixed at the base, which is also the origin.
We shall call the angle of the first joint a, and the angle of the second b. The length of each bone is 10. We can also imagine a target (the red star). The vector from the tip to the target is known as the error.
As we move the robot arm around, changing the angles a and b, the tip gets nearer to and further from the target.
We can plot a graph of this behaviour. On the left is a square graph with the origin in the centre. The x-axis represents angle a, while the y-axis represents angle b, so at the origin the arm is straight. We shall call the space of all possible orientations of the robot arm the Configuration Space. Each point on the square is a shade of grey representing the distance between the tip and the target: lighter shades are greater distances, while black represents the tip and target touching. There are, in fact, two places where the distance equals zero, representing the two ways the arm can touch the target. I have marked these with red spots. It should be obvious, therefore, that to get the tip to touch the target, you just need to find the blackest parts of the graph. These low points are known as minima (the plural of the Latin minimum). Any algorithm for finding the lowest points on a graph can be used for Inverse Kinematics. This is easy, of course, for a human looking at a graph which has been drawn for us. It's a lot harder when you consider that your algorithm will not have this graph drawn out for it in advance. Typically, a robot arm would begin at some point in the configuration space and be instructed to reach the target. All it knows at that moment is how far it is from the target (i.e. the value of the graph at that point). Which way should it move now?
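To make the graph concrete, here is a minimal Python sketch (names and angle conventions are my own, angles in radians) of the function being plotted: forward kinematics for the two-jointed arm, then the distance from the tip to the target.

```python
import math

L1 = L2 = 10.0  # the bone lengths from the example

def tip_position(a, b):
    # Forward kinematics: joint A rotates at the origin, joint B sits at the
    # end of the first bone, and b is measured relative to the first bone.
    x = L1 * math.cos(a) + L2 * math.cos(a + b)
    y = L1 * math.sin(a) + L2 * math.sin(a + b)
    return x, y

def error(a, b, target):
    # The shade of grey at point (a, b) of the configuration-space graph:
    # the distance from the tip to the target.
    x, y = tip_position(a, b)
    return math.hypot(x - target[0], y - target[1])
```

Sampling error(a, b, target) over a grid of angles would reproduce the grey-scale picture described above.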
The easiest way to find the lowest points on the graph is simply to follow the hills downwards.
Look at this example. The target is the red crosshairs. In this position, rotating joint A a tiny bit will move the tip in the direction of the arrow a, and rotating joint B a tiny bit will move the tip in the direction of the arrow b. To get the tip to the target, we want to move the tip in the direction of the t arrow. Clearly, rotating joint B will be of no use, as it will move the tip perpendicular to the direction we want. However, if we rotate joint A, the tip will move roughly in the right direction.

Now let's look at this in terms of a graph.
Starting at a random configuration of the arm (start), we can look at the vectors a and b, and rotate each joint slightly, in proportion to how much it helps get the tip to the target. You can think of this as examining the local gradient of the graph. Work out which direction leads us fastest downhill and move a little in that direction. Do the same over and over forever, or until you get close enough to the target to be happy.
On the right, I have shown the graph of error, and added in contour lines so that you can easily see the gradient. You can see that the path the arm takes through the graph crosses the contours at right angles, which shows that it is moving directly downhill.
Point P is interesting. It represents the position of the arm drawn above. You can see that at this point the arm is moving only in the A-axis direction.
Advantages
This method has the advantage that it takes you smoothly from the start to the finish, which is great for computer graphics or robotics, where you need smooth motion. It also has other advantages, which we will see later. One real advantage of gradient following is that you can follow functions that are defined only by their gradient! What does this mean? I'll come to that later.
Disadvantages 
function Calc_Distance(angle_A, angle_B)
    work out the tip position for joint A = angle_A and joint B = angle_B
    return distance from calculated tip position to target
end function

dist = Calc_Distance(a, b)
while (dist > 0.1)
{
    gradient_a = Calc_Distance(a+1, b) - Calc_Distance(a-1, b)
    gradient_b = Calc_Distance(a, b+1) - Calc_Distance(a, b-1)
    a = a - gradient_a
    b = b - gradient_b
    dist = Calc_Distance(a, b)
}
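A runnable Python version of the same gradient-following loop might look like this. The target, probe width delta, step size, and tolerance are arbitrary choices of mine, and I divide by 2*delta so the gradient estimate doesn't depend on the probe width.

```python
import math

TARGET = (12.0, 5.0)  # an arbitrary reachable target

def calc_distance(a, b):
    # Two-jointed arm with bone lengths 10; angles in radians.
    x = 10.0 * math.cos(a) + 10.0 * math.cos(a + b)
    y = 10.0 * math.sin(a) + 10.0 * math.sin(a + b)
    return math.hypot(x - TARGET[0], y - TARGET[1])

def follow_gradient(a, b, delta=0.001, step=0.0002, tolerance=0.1, max_iters=20000):
    # Estimate the local gradient by probing either side of the current
    # configuration, then take a small step downhill. Repeat until close enough.
    for _ in range(max_iters):
        if calc_distance(a, b) < tolerance:
            break
        grad_a = (calc_distance(a + delta, b) - calc_distance(a - delta, b)) / (2 * delta)
        grad_b = (calc_distance(a, b + delta) - calc_distance(a, b - delta)) / (2 * delta)
        a -= step * grad_a
        b -= step * grad_b
    return a, b
```

The step size has to be small: too large and the arm overshoots the target and oscillates, which is exactly the kind of weakness discussed below.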
Gradient by Calculation

You can see that movement_vector is as long as ToTip, and at right angles to it. The further the tip is away from the joint, the faster it will move when the joint rotates. The vector axis is a unit vector.
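In 3D, that per-joint calculation can be sketched as follows (a sketch with my own names; movement_vector is the cross product of the joint's unit rotation axis with the vector from the joint to the tip, exactly as described above):

```python
def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def dot(u, v):
    return u[0]*v[0] + u[1]*v[1] + u[2]*v[2]

def sub(u, v):
    return (u[0]-v[0], u[1]-v[1], u[2]-v[2])

def joint_benefit(joint_pos, axis, tip, target):
    # axis must be a unit vector. movement_vector is the direction (and speed)
    # in which the tip moves, per radian, when this joint rotates.
    to_tip = sub(tip, joint_pos)
    movement_vector = cross(axis, to_tip)
    # How much of that movement is towards the target:
    return dot(movement_vector, sub(target, tip))
```

Rotating each joint by a small amount in proportion to this value is one step of the gradient-following loop, without any numerical probing of the error function.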

In fact, a target need not even be defined in space. Since we can use gradient following, a target may be a direction (like downwards) or a direction field (a vortex, for example).
Plane Target
If you want the tip of your structure to touch a table top or a wall, for example, you can use a plane target. To make a plane-shaped target, we need to be able to calculate the vector from the tip to the nearest point on the plane. We shall define our plane by its surface normal (green arrow) and a point on the plane (green sphere). Clearly, the vector we want will be parallel to the surface normal. But how long is it?

p = point_on_plane - tip
t = n * dotproduct(p, n)
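As a quick check of that formula, here is a literal Python translation (n must be a unit normal; the helper names are mine):

```python
def dot(u, v):
    return u[0]*v[0] + u[1]*v[1] + u[2]*v[2]

def sub(u, v):
    return (u[0]-v[0], u[1]-v[1], u[2]-v[2])

def scale(u, s):
    return (u[0]*s, u[1]*s, u[2]*s)

def vector_to_plane(tip, point_on_plane, n):
    # p is the vector from the tip to some point on the plane; its component
    # along the unit normal n is exactly the step that lands on the plane.
    p = sub(point_on_plane, tip)
    return scale(n, dot(p, n))
```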

Cylinder Target
Again, useful for touching cylinder-shaped objects, or just making the tip stay near an object. The cylinder is defined by a point i on its axis, a unit vector axis, and a radius.

v1 = axis * dotproduct(tip - i, axis)
v2 = (i + v1) - tip
DistanceToTarget = magnitude(v2) - radius
ToTarget = v2 * (DistanceToTarget / magnitude(v2))

Ring Target
A ring target is very useful for guiding robotic fingertips to pick up an object. You can place the ring around a cup, for instance, and slowly decrease the radius of the ring until the fingertips grip the cup.
One way we can create this behaviour is by defining a vector field. Each point in space has a vector associated with it, and wherever the tip is in that space, it considers the target to be in the direction of that vector.
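One concrete way to build such a field for a ring (my own sketch, not from the original article): project the tip into the ring's plane, push the projection out to the ring's radius, and return the vector from the tip to that nearest point. It assumes the tip is not exactly on the ring's axis, where the nearest point is ambiguous.

```python
import math

def dot(u, v): return u[0]*v[0] + u[1]*v[1] + u[2]*v[2]
def sub(u, v): return (u[0]-v[0], u[1]-v[1], u[2]-v[2])
def add(u, v): return (u[0]+v[0], u[1]+v[1], u[2]+v[2])
def scale(u, s): return (u[0]*s, u[1]*s, u[2]*s)

def vector_to_ring(tip, centre, n, radius):
    # n is the unit normal of the plane the ring lies in.
    to_tip = sub(tip, centre)
    in_plane = sub(to_tip, scale(n, dot(to_tip, n)))  # tip projected into the plane
    m = math.sqrt(dot(in_plane, in_plane))            # assumed non-zero (tip off-axis)
    nearest = add(centre, scale(in_plane, radius / m))
    return sub(nearest, tip)
```

Evaluating this at every point in space gives the vector field: wherever the tip is, it treats the nearest point on the ring as its target.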
In order to simplify your code, with all the different types of target around, you can write a function which, given a TARGET and a tip position, will return a vector from the tip to the nearest point on the target. We begin by defining a structure TARGET containing several members used by all types of target. Firstly, type tells the function which type of target this is. Then centre, axis and size are three generic values that can be used to define many types of target.
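A sketch of that structure and function in Python. The field names type, centre, axis and size come from the text; everything else is my own guess at an implementation, covering point, plane and cylinder targets:

```python
import math
from dataclasses import dataclass

def dot(u, v): return u[0]*v[0] + u[1]*v[1] + u[2]*v[2]
def sub(u, v): return (u[0]-v[0], u[1]-v[1], u[2]-v[2])
def add(u, v): return (u[0]+v[0], u[1]+v[1], u[2]+v[2])
def scale(u, s): return (u[0]*s, u[1]*s, u[2]*s)
def magnitude(u): return math.sqrt(dot(u, u))

@dataclass
class Target:
    type: str      # "point", "plane" or "cylinder"
    centre: tuple  # the point itself, a point on the plane, or a point on the axis
    axis: tuple    # unit vector: plane normal or cylinder axis (ignored for points)
    size: float    # cylinder radius (ignored otherwise)

def vector_to_target(target, tip):
    # Returns the vector from the tip to the nearest point on the target.
    if target.type == "point":
        return sub(target.centre, tip)
    if target.type == "plane":
        p = sub(target.centre, tip)
        return scale(target.axis, dot(p, target.axis))
    if target.type == "cylinder":
        v1 = scale(target.axis, dot(sub(tip, target.centre), target.axis))
        v2 = sub(add(target.centre, v1), tip)   # tip -> nearest point on the axis
        dist = magnitude(v2) - target.size      # distance to the surface
        return scale(v2, dist / magnitude(v2))
    raise ValueError("unknown target type: " + target.type)
```

The rest of the solver never needs to know what kind of target it is chasing; it just follows the returned vector.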

- It looks uncomfortable or unnatural
- It makes the character fall over
- There are obstacles in the way
The good thing about Method 1 is that you can add lots of soft constraints into the mix, and it will try to solve them all at once; you don't even need to know how to differentiate them.
All we need to do to make things look comfortable is to impose a small 'penalty' for uncomfortable or unnatural positions. It depends how you define these things, but for something like a human arm, natural suggests lazy: I don't want to lift my arm any higher than I have to. Therefore, we can add to our error function a cost for the height of the centre of gravity of the arm.
function Calc_AveHeight(angle_A, angle_B)
    AveHeight = 0
    TotalMass = 0
    for each bone loop
        AveHeight = AveHeight + (Height_Of_Centre_Of_Gravity_Of_This_Bone * Mass_Of_This_Bone)
        TotalMass = TotalMass + Mass_Of_This_Bone
    end loop
    AveHeight = AveHeight / TotalMass
    return AveHeight
end function

function Calc_Distance(angle_A, angle_B)
    work out the tip_position for joint A = angle_A and joint B = angle_B
    return distance from calculated tip_position to target
end function

function Calc_Error(angle_A, angle_B)
    Error = Calc_Distance(angle_A, angle_B) * W1 + Calc_AveHeight(angle_A, angle_B) * W2
    return Error
end function

error = Calc_Error(a, b)
while (error > 0.1)
{
    gradient_a = Calc_Error(a+1, b) - Calc_Error(a-1, b)
    gradient_b = Calc_Error(a, b+1) - Calc_Error(a, b-1)
    a = a - gradient_a
    b = b - gradient_b
    error = Calc_Error(a, b)
}
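The same idea as a self-contained Python sketch for the two-bone arm, assuming equal-mass bones whose centres of gravity sit at their midpoints; the weights W1 and W2 are arbitrary choices of mine:

```python
import math

L = 10.0  # both bone lengths

def joint_positions(a, b):
    # Base, elbow and tip of the two-jointed arm (angles in radians).
    elbow = (L * math.cos(a), L * math.sin(a))
    tip = (elbow[0] + L * math.cos(a + b), elbow[1] + L * math.sin(a + b))
    return (0.0, 0.0), elbow, tip

def calc_ave_height(a, b):
    # Height of the arm's centre of gravity: equal-mass bones, each with its
    # centre of gravity at its midpoint.
    base, elbow, tip = joint_positions(a, b)
    return ((base[1] + elbow[1]) / 2.0 + (elbow[1] + tip[1]) / 2.0) / 2.0

def calc_error(a, b, target, w1=1.0, w2=0.1):
    # Weighted sum: distance to the target plus a small 'laziness' penalty.
    _, _, tip = joint_positions(a, b)
    dist = math.hypot(tip[0] - target[0], tip[1] - target[1])
    return w1 * dist + w2 * calc_ave_height(a, b)
```

With w2 > 0, two configurations that reach the target equally well no longer score the same: the one holding the arm lower wins.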
As you might have guessed, gradient following is not an amazing means of function minimisation. It can take a long time to reach its target, possibly requiring thousands of calculations, and even then it only comes within a finite distance of the target. There are many other algorithms for function minimisation. Here are a few:
Minimization Algorithms: http://esd.lbl.gov/ITOUGH2/Minimization/minalg.html
A very brief description of several algorithms for function minimisation.
Downhill Simplex Method for Many (~20) Dimensions: http://paula.univ.gda.pl/~dokgrk/simplex.html
"The downhill simplex method is due to Nelder & Mead (1965). The method requires only function evaluations, not derivatives. It is not very efficient in terms of the number of function evaluations that it requires. However the downhill simplex method may frequently be the best method to use."
Rosenbrock Method: http://iridia.ulb.ac.be/~fvandenb/work/rosenbrock/
The Rosenbrock method is a 0th order search algorithm (it does not require any derivatives of the target function; only simple evaluations of the objective function are used). Yet it approximates a gradient search, thus combining advantages of 0th order and 1st order strategies.

