Chi-Squared

 

What is it?

χ² (chi-squared) is a measure of how far a set of data varies from an expected distribution. It is a "goodness of fit" test: the larger the value, the worse the fit.

 

The AEB Maths syllabus requires the use of Σ (O - E)² / E as an approximation to χ².

You are required to do hypothesis tests on contingency tables using χ². For instance, we might be given the table below:

          Chips   Mashed   Boiled
Male         45       22       15
Female       67       44       26

 A sample of school children were asked which was their favourite way of eating potatoes.
You are asked to find whether the choice of potato is independent of gender.

As with all hypothesis tests, we need to define H0 and H1:

H0: sex and choice of potato are independent
H1: sex and choice of potato are dependent

We find out what the expected values are under the null hypothesis.

To help us do this, we find the column totals and row totals:

          Chips   Mashed   Boiled   Total
Male         45       22       15      82
Female       67       44       26     137
Total       112       66       41     219

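To find the expected number of males who prefer chips, we find P(male) × P(chips) × total = (82/219) × (112/219) × 219, since under the null hypothesis the two are independent.

A simple way to work this out is (row total × column total) / grand total = (82 × 112) / 219 = 41.9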

Work out the expected values for all the data:

          Chips   Mashed   Boiled   Total
Male       41.9     24.7     15.4      82
Female     70.1     41.3     25.6     137
Total       112       66       41     219

 Note that you can use the row and column totals to help find these figures (e.g. female & chips must be 112 - 41.9)
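The expected values can also be checked with a short program. The following is a minimal sketch in Python, using the "row total × column total / grand total" rule; the function name expected_frequencies is made up for this illustration.

```python
# Observed counts from the potato table
# (rows: Male, Female; columns: Chips, Mashed, Boiled).
observed = [
    [45, 22, 15],
    [67, 44, 26],
]


def expected_frequencies(table):
    """Expected counts under independence:
    row total x column total / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [
        [r * c / grand_total for c in col_totals]
        for r in row_totals
    ]


expected = expected_frequencies(observed)
for label, row in zip(["Male", "Female"], expected):
    print(label, [round(e, 1) for e in row])
```

Rounded to one decimal place, this reproduces the expected values in the table: 41.9, 24.7, 15.4 for males and 70.1, 41.3, 25.6 for females.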

 Next we find the difference between the observed (O) results and the expected results (E). We square the result to get rid of negative results, and then divide by the expected result so that the figure is expressed as a proportion of what was expected.

 Hence χ² = Σ (O - E)² / E [in formula book]

 For the data above, we get:

   O       E    | O - E |   (O - E)² / E
  45    41.9      3.1          0.23
  22    24.7      2.7          0.30
  15    15.4      0.4          0.01
  67    70.1      3.1          0.14
  44    41.3      2.7          0.18
  26    25.6      0.4          0.01

Thus our test statistic, χ², comes to 0.87.
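As a check on the arithmetic, here is a minimal sketch in Python that sums (O - E)² / E over the six cells, using the expected values rounded to one decimal place as in the table. The function name chi_squared_statistic is made up for this illustration. Because the hand calculation rounds each contribution to two decimal places before adding, the program gives a slightly smaller figure (about 0.85) than the 0.87 above; the difference is only rounding.

```python
def chi_squared_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Observed counts and the expected values (to 1 d.p.) from the tables.
observed = [[45, 22, 15], [67, 44, 26]]
expected = [[41.9, 24.7, 15.4], [70.1, 41.3, 25.6]]

statistic = chi_squared_statistic(observed, expected)
print(round(statistic, 2))
```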
The shape of a typical χ² distribution is shown below:

[Figure: a typical χ² distribution curve]

 The value of chi-squared is never negative, and the distribution is positively skewed.

 

 

Degrees of Freedom

The χ² distribution has one parameter, ν (nu, pronounced "new"). This is the number of "degrees of freedom": in other words, the number of free choices that you can make when allocating values to the expected frequencies. In this case ν = 2, because you would only need 2 figures, for example (male/chips) and (male/mashed), and you would be able to work out the other figures from the row and column totals.

 In general, degrees of freedom = (number of rows - 1) × (number of columns - 1)

The table of the chi-squared distribution is in the formula book, and shows critical values of χ² for varying degrees of freedom.

 With 2 degrees of freedom at the 5% significance level, the critical value is 5.991. Our test statistic of 0.87 is well below this, so we accept the null hypothesis and conclude that choice of potato is independent of sex.
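The decision step can be sketched in Python as follows. The small dictionary holds the standard 5% critical values for 1 to 4 degrees of freedom, standing in for the formula book table; the names degrees_of_freedom and is_significant are made up for this illustration.

```python
# Upper 5% critical values of chi-squared for small degrees of freedom
# (standard table values, standing in for the formula book).
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def degrees_of_freedom(n_rows, n_cols):
    """(number of rows - 1) x (number of columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def is_significant(statistic, n_rows, n_cols):
    """True if the statistic exceeds the 5% critical value,
    i.e. the null hypothesis would be rejected."""
    nu = degrees_of_freedom(n_rows, n_cols)
    return statistic > CRITICAL_5_PERCENT[nu]


nu = degrees_of_freedom(2, 3)  # the potato table has 2 rows, 3 columns
print(nu, CRITICAL_5_PERCENT[nu], is_significant(0.87, 2, 3))
```

For the potato data this prints 2, 5.991 and False: the statistic does not exceed the critical value, so the null hypothesis stands.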

 

 

Yates' Continuity Correction

When there is only one degree of freedom, i.e. on a 2 by 2 contingency table, the formula for the approximation to chi-squared is changed slightly to Σ (| O - E | - 0.5)² / E, i.e. subtract a half from each absolute value of O - E before squaring. The question will usually remind you to use Yates' correction when it is necessary, but the altered formula is not in the formula book.
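A minimal sketch of the corrected formula in Python is given below. The 2 by 2 table is an illustration only: it is the potato data with the Mashed and Boiled columns added together, which is not a step the example above actually requires. The function name chi_squared_2x2 is made up for this sketch.

```python
def chi_squared_2x2(table, yates=True):
    """Chi-squared statistic for a 2 by 2 table,
    optionally with Yates' continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            diff = abs(o - e)
            if yates:
                diff -= 0.5  # subtract a half before squaring
            total += diff ** 2 / e
    return total


# Illustrative table: rows Male, Female; columns Chips, Mashed-or-Boiled.
table = [[45, 37], [67, 70]]
print(round(chi_squared_2x2(table, yates=False), 2))
print(round(chi_squared_2x2(table, yates=True), 2))
```

For this table the uncorrected statistic is about 0.73 and the corrected one about 0.51, showing that the correction always makes the statistic a little smaller.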

 

 

When E is less than 5

 If the expected value for any individual cell is less than 5, then the row or column must be combined with another, as the approximation will not be good enough on such a small value.

In this case, the choice of which row or column to combine it with should be based on these criteria, which should be taken in order:

In the above example, if the expected number of male/boiled was less than 5, the boiled column would be combined with the mashed column, to make the expected frequencies of the remaining columns closer.
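The check and the merge can be sketched in Python as follows. In the potato data no expected value is actually below 5, so the merge of Boiled into Mashed is shown purely for illustration, as in the text. The function names small_expected_cells and merge_columns are made up for this sketch.

```python
def small_expected_cells(table, threshold=5):
    """Return (row, column, expected) for every cell whose
    expected frequency is below the threshold."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    cells = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / grand_total
            if e < threshold:
                cells.append((i, j, e))
    return cells


def merge_columns(table, keep, drop):
    """Return a new table with column `drop` added into column `keep`."""
    merged = []
    for row in table:
        new_row = list(row)
        new_row[keep] += new_row[drop]
        del new_row[drop]
        merged.append(new_row)
    return merged


observed = [[45, 22, 15], [67, 44, 26]]  # Chips, Mashed, Boiled
print(small_expected_cells(observed))    # empty: nothing is below 5
print(merge_columns(observed, keep=1, drop=2))
```

The merged table is 2 by 2, so it has only one degree of freedom and Yates' correction would then apply.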