Check your p's and alpha's

Ivor the Engineer

Penultimate Amazing
Joined
Feb 18, 2006
Messages
10,590
OK, stats experts and users of statistical tests: how many times have you seen p < alpha, or p < .05, p < .01 and p < .001, graded into "significant", "highly significant" and "extremely significant"?

According to this paper, such gradings are incorrect: they are an attempt to fuse two different and incompatible ideas about statistical testing.

Or have I got it all wrong?
 
Seems to me that the marketing people have confused things. It's not something I have come across in my area of Life Sciences.
 
I take it you've never ventured into the social sciences, where I've seen p thresholds as high as .20 still counted as significant.
 
One of my psychology books has lots of bar charts showing the control and experimental group responses to a number of tests. The differences looked really large until I noticed the vertical axis was offset from zero to emphasize the (often only a few percent) difference between the two groups.

Given the small number of participants in the experiments, it made me wonder whether the results were really that significant.
 
I suspect a good part of it is laziness/complacency in the use of the wording rather than confusion. Maybe. My other suspicion is that a lot of research is a special case where the distinction doesn't matter (p and alpha are essentially the same), so it isn't important to make it. Unfortunately, that doesn't apply to most paranormal/alternative medicine research, so when you try to make the point it is not understood, or you get accused of holding their research to unreasonable standards (i.e. different from those applied to conventional research).

Also, one does not always see the distinction made between "clinically significant" (i.e. a difference large enough to care about; who really cares if the duration of a cold is shortened by an hour?) and "statistically significant". The PEAR data, where the reported ability of individuals to influence the result of one toss out of 1,000 was described as "highly significant", comes to mind.
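To see how a one-in-1,000 shift can be "highly significant" statistically while remaining practically trivial, here is a minimal Python sketch; the trial counts are made up for illustration, not taken from the PEAR reports:

```python
# Illustrative only: a hit rate of 0.5005 instead of 0.5 (one toss in 1,000)
# becomes "highly significant" once enough trials are run, even though the
# effect itself stays negligibly small.
from math import sqrt, erfc

p0, p1 = 0.5, 0.5005   # chance rate vs. a one-in-1,000 shift

for n in (10_000, 1_000_000, 10_000_000):      # assumed trial counts
    se = sqrt(p0 * (1 - p0) / n)               # standard error under the null
    z = (p1 - p0) / se                         # z-score of the shifted rate
    p_value = erfc(z / sqrt(2))                # two-sided normal p-value
    print(f"n={n:>10,}  z={z:4.2f}  p={p_value:.4f}")
```

The effect size never changes; only the sample size does, which is why "statistically significant" on its own says nothing about whether a difference is worth caring about.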

Linda
 
What got me thinking about this was reading the abstracts of the studies into homeopathic treatment of flu symptoms you posted on another thread. There were p values of 0.023 and 0.03 quoted, both of which fall below the 5% significance level.

You mentioned something about adjusting the level of significance down to account for the repeated measurements. From my limited reading on making such adjustments, because each reading would be highly correlated with those either side of it, the recommendation is _not_ to adjust the level of significance down.

Most likely I've misunderstood. Could you explain in more detail?
 
What got me thinking about this was reading the abstracts of the studies into homeopathic treatment of flu symptoms you posted on another thread. There were p values of 0.023 and 0.03 quoted, both of which fall below the 5% significance level.

Yeah, that's a good example of why the distinction made between p and alpha is important. Because even though the p values are less than alpha, the probability that the null hypothesis is false is still very low. Under circumstances where you have no real reason to think that the null hypothesis isn't false (e.g. it's false only 1 percent of the time), most of the time that you get a p value of 0.03 will be when the null hypothesis is true.

This situation is analogous to performing an HIV test with a specificity of 95% (i.e. 95% of people who do not have HIV will test negative) in a population where the prevalence of HIV is one in a hundred. For every hundred people tested, the false positives will outnumber the true positives about 5 to 1.
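Putting rough numbers on that (assuming, for simplicity, a sensitivity of 100%, which isn't stated above):

```python
# Base-rate arithmetic for the HIV-test analogy.
# Assumptions: 10,000 people tested, 100% sensitivity (every true case is
# detected); prevalence and specificity are the figures from the post.
population = 10_000
prevalence = 0.01      # 1 in 100 actually has HIV
specificity = 0.95     # 95% of people without HIV test negative

true_positives = population * prevalence                              # 100
false_positives = population * (1 - prevalence) * (1 - specificity)   # 495

print(f"true positives:  {true_positives:.0f}")
print(f"false positives: {false_positives:.0f}")
print(f"ratio (false:true) ~ {false_positives / true_positives:.1f} : 1")
```

A positive result is therefore far more likely to be a false alarm than a real case, just as a "significant" p value is when the null hypothesis is almost always true.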

You mentioned something about adjusting the level of significance down to account for the repeated measurements. From my limited reading on making such adjustments, because each reading would be highly correlated with those either side of it, the recommendation is _not_ to adjust the level of significance down.

Most likely I've misunderstood. Could you explain in more detail?

I think you're thinking of something else. I said that the significance level should be adjusted for multiple comparisons and that multiple variables (well, the word I really used was "stuff", but that's what I meant) had been measured.

If you randomly select two groups of 20 people and measure 20 different characteristics for each group, you have roughly a 64% chance (1 - 0.95^20) that the group means will be "significantly" different on at least one of those characteristics, even though the groups come from the same population. It's as though you are giving yourself 20 chances to roll a double five or a double six, instead of just one. One way to maintain an overall alpha of 0.05 is to test each comparison against a stricter threshold (e.g. alpha = 0.05/20 = 0.0025).
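A quick simulation of that scenario, with the group size and number of characteristics taken from the post and everything else (normal data, t-tests) assumed for illustration:

```python
# Simulate two groups of 20 drawn from the SAME population, compare them on
# 20 characteristics with independent t-tests, and count how often at least
# one comparison comes out "significant" at alpha = 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_vars, n_per_group, alpha = 5_000, 20, 20, 0.05

hits = 0
for _ in range(n_sims):
    a = rng.standard_normal((n_vars, n_per_group))    # group 1, no real effect
    b = rng.standard_normal((n_vars, n_per_group))    # group 2, no real effect
    p = stats.ttest_ind(a, b, axis=1).pvalue          # one p value per characteristic
    hits += (p < alpha).any()

print(f"P(at least one 'significant' difference) ~ {hits / n_sims:.2f}")  # ~0.64
print(f"Bonferroni-corrected threshold: {alpha / n_vars}")                # 0.0025
```

Testing each comparison at 0.05/20 instead keeps the overall false-positive rate near 5%.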

If you have measured a lot of different variables, it is relatively easy to find some (post hoc) which have a p < 0.05 and then list them as your a priori variables of interest when writing up your results.

Linda
 
Yeah, that's a good example of why the distinction made between p and alpha is important. Because even though the p values are less than alpha, the probability that the null hypothesis is false is still very low. Under circumstances where you have no real reason to think that the null hypothesis isn't false (e.g. it's false only 1 percent of the time), most of the time that you get a p value of 0.03 will be when the null hypothesis is true.


I guess you meant, "you have no real reason to think that the null hypothesis is false" ?

Or, without all the confusing multiple negatives, "it's probably true".

If, before an experiment, the probability of null hypothesis H is very high, and the experiment then provides some evidence against H, the probability of H may, after all is said and done, still be fairly high, though it won't be as high as before.

(I know you know this. I'm just repeating what you said in different words, in case it helps anyone else.)
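To put illustrative numbers on that, here is a minimal sketch using the "null is false only 1 percent of the time" figure from earlier in the thread; the 80% power figure is an assumption, chosen just to have a number:

```python
# Bayes' rule applied to a "significant" result, with an assumed prior and power.
prior_false = 0.01    # P(null is false) before the experiment (figure from the thread)
power = 0.80          # assumed P(significant result | null is false)
alpha = 0.05          # P(significant result | null is true)

p_significant = prior_false * power + (1 - prior_false) * alpha
posterior_null_true = (1 - prior_false) * alpha / p_significant

print(f"P(null still true | 'significant' result) ~ {posterior_null_true:.2f}")  # ~0.86
```

Under those assumptions, a "significant" result lowers the probability of the null hypothesis from 99% to roughly 86%; it is still far more likely true than not, and nowhere near 5%.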
 
The BMJ article I linked to gives an excellent example of this with numbers.
 
