
Chi-Square Degrees of Freedom Disagreement

Math Maniac

Thinker
Joined
Mar 25, 2002
Messages
161
I believe that my lab instructor is making an error about the correct number of degrees of freedom he wants us to use in our chi-square analysis.

We are using the chi-square test to evaluate our Drosophila melanogaster inheritance observations. He is having us calculate the 'observed - expected' values for each of four possibilities (male/female mutant and male/female wild type).

He insists that since the table he has set up is 2 x 2, we should use (2-1)(2-1)=1 degree of freedom. I contend that since we are calculating four 'observed - expected' values, we should use 4-1=3 degrees of freedom.

I've found a few examples online that seem to support my assertion, but a clear and convincing argument either way is what I am seeking.

Any advice is greatly appreciated.
 
I don't have time to write out a clear and convincing argument, but there is only one degree of freedom because the four calculations aren't independent. Once the row and column totals are fixed and you know the value for one square, you can calculate the rest, first by holding one side (e.g. sex) to the null hypothesis, then the other (e.g. mutant/wildtype).
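To make that concrete, here is a minimal sketch in Python. The counts and the function name `fill_2x2` are made up for illustration and are not from the lab data; the only assumption is that the row totals and column totals are held fixed.

```python
def fill_2x2(top_left, row_totals, col_totals):
    """Rebuild a 2x2 table from one cell plus its fixed margins."""
    top_right = row_totals[0] - top_left        # rest of the first row
    bottom_left = col_totals[0] - top_left      # rest of the first column
    bottom_right = row_totals[1] - bottom_left  # whatever is left over
    return [[top_left, top_right], [bottom_left, bottom_right]]


# Hypothetical margins: rows could be male/female, columns mutant/wild type.
row_totals = (50, 50)
col_totals = (45, 55)

table = fill_2x2(30, row_totals, col_totals)
print(table)
```

Only the first argument was free to choose; the other three cells were forced by the totals. That single free cell is the one degree of freedom.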

Linda
 
Unless you are omitting something from your explanation, your instructor is right.
 
For a test of independence on a table with r rows and c columns, it's generally (r-1)(c-1). A 2x2 grid has 1 df, a 2x3 grid has 2 dfs, and so on.

(PS - that's a rough and ready definition with all sorts of conditions hung on it.)
 
Let me try to give an example. Suppose you take N people and ask them whether they prefer Coke or Pepsi and whether they are under or over 30. There are actually several different null hypotheses you might try to reject, and they don't all have the same number of degrees of freedom.

Here's one with only one: ask if there's a correlation between preferring Coke and age. Let's rephrase that into a null hypothesis. Suppose a fraction f1 of under 30s in the population prefer Coke and a fraction f2 of over 30s prefer Coke. Then our NH is that f1=f2. The point is, there's only one way to violate the NH - by making f1-f2 not zero. Changing f1+f2 doesn't matter because it doesn't violate the NH. So there is only one degree of freedom.

Make a 2x2 table of the data. Given one element of the table we can just fill in all the rest in terms of f=(f1+f2)/2, N1 (the number of under 30s), and N2 (the number of over 30s). The number of dof is (2-1)*(2-1) = 1.
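A rough sketch of that 2x2 case in Python, using invented counts (the numbers and the helper name `expected_from_margins` are assumptions for illustration only). The expected counts are built from the row and column totals, which is what the NH f1=f2 amounts to.

```python
def expected_from_margins(table):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical data: rows = under 30 / over 30, columns = Coke / Pepsi.
observed = [[30, 20], [15, 35]]
expected = expected_from_margins(observed)

# The four 'observed - expected' values.
deviations = [[o - e for o, e in zip(o_row, e_row)]
              for o_row, e_row in zip(observed, expected)]

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
dof = (len(observed) - 1) * (len(observed[0]) - 1)

print(deviations)
print(chi2, dof)
```

All four deviations come out with the same magnitude and differ only in sign, so although four values are computed, there is really just one independent number among them.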

An example of a NH which has more than one degree of freedom is that, say, f1=43.6% and f2=74.9%. That has 2 dof, because you can change either f1 or f2. You could think of it as two separate 2x1 tables, each with 1 dof.
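Here is a small sketch of the two-table view, again in Python with invented sample counts. The fractions 0.436 and 0.749 are the ones from the NH above; the sample sizes, the observed counts, and the function name `one_group_chi2` are assumptions.

```python
def one_group_chi2(n, observed_yes, hypothesized_fraction):
    """Chi-square for one group (yes/no) against a fully specified fraction."""
    expected_yes = n * hypothesized_fraction
    expected_no = n - expected_yes
    observed_no = n - observed_yes
    return ((observed_yes - expected_yes) ** 2 / expected_yes
            + (observed_no - expected_no) ** 2 / expected_no)


F1, F2 = 0.436, 0.749  # hypothesized Coke fractions, under 30 and over 30

# Hypothetical samples: 200 people in each age group.
chi2_under = one_group_chi2(200, 95, F1)   # 1 dof
chi2_over = one_group_chi2(200, 140, F2)   # 1 dof

chi2_total = chi2_under + chi2_over
dof = 1 + 1  # each group can depart from its own hypothesized fraction

print(chi2_total, dof)
```

Nothing is estimated from the data here, and the two groups can miss their targets independently, which is where the second degree of freedom comes from.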

We can also get 3 degrees of freedom: given a total number of people N in our sample, our hypothesis is that there will be X people under 30 that like Coke, Y over 30 that like Coke, Z under 30 that like Pepsi, and N-X-Y-Z that are over 30 and like Pepsi. So now there are 3 independent variables. You can think of that as a 4x1 table, with 4-1=3 dof.
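The 4x1 case can be sketched the same way. The counts and the equal 25% proportions below are purely hypothetical; the point is only that every expected count is specified in advance, so the sample size N is the sole constraint.

```python
# Hypothetical counts for the four categories, in a fixed order:
# under-30 Coke, over-30 Coke, under-30 Pepsi, over-30 Pepsi.
observed = [30, 15, 20, 35]

# Assumed NH: each category holds a specified share of the population.
hypothesized = [0.25, 0.25, 0.25, 0.25]

n = sum(observed)
expected = [n * p for p in hypothesized]

deviations = [o - e for o, e in zip(observed, expected)]
chi2 = sum(d ** 2 / e for d, e in zip(deviations, expected))
dof = len(observed) - 1  # only the total N ties the four counts together

# Standard 5% critical values of the chi-square distribution.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815}

print(deviations, sum(deviations))
print(chi2, dof, chi2 > CRITICAL_05[dof])
```

The deviations must sum to zero, so three of them determine the fourth. The table of critical values also shows why the choice of dof matters: the same statistic is judged against a different threshold depending on which NH is being tested.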
 

"Degrees of freedom" cannot be determined by a simple rule. You have to think through the problem and understand which quantities are actually free to vary under the hypothesis being tested. Then, and only then, can the DF be determined.
-rules are only approximations of reality-
 
Lost power for a few hours...

I failed to consider that the possible outcomes were dependent upon the genes passed on by the parental flies. What a moronic mistake! I cannot think of why I didn't consider it.

It all comes together, finally.

It's been ten years, with little practice since and no statistics books on my shelf, and that is what led to my dilemma. Your answers and their specificity brought it all back....

Thank you for all of the wonderful replies.
 