T'ai Chi said:
So... I'm having a hard time understanding why you believe the standard ratings to be flawed I guess.
The problems with the standardness criteria, as I see them, are...
First, it means that the Honorton meta-analysis of 1985 can no longer be compared to the 1999 “standardised” m-a, since the second uses different criteria for choosing which experiments to include. Thus, when Radin lumps them together for his incomplete overview of ganzfeld work since 1974, he is making a mistake.
Second, “standardness” consists of the autoganzfeld protocol as laid down in 1987, plus some additional criteria drawn from the PRL results. For example, they suggest counting meditators or artists as “standard” participants because those groups got good results in the PRL experiments. But these suggestions seem to have been rather arbitrary. If good results in the PRL database are supposed to indicate “standardness”, then dynamic targets should be standard and static targets non-standard. Yet there’s an experiment using static targets at the highest end of the range, so obviously dynamic/static was not used as a criterion for “standardness”, which, considering the large gap in results between the two in the PRL work, I find odd.
Plus, certain experiments that used the autoganzfeld quite strictly and considered themselves replications of the PRL work (such as Bierman’s, see one of my posts above, or maybe the last page, I don’t know!) have been given very low ratings. This is especially so with Willin’s work. The differences seem to be the nature of the targets (music) and the randomisation process (manual shuffling). Other than that it’s a pretty standard ganzfeld set-up. Yet it’s rated at a lowly 1.33. That seems somewhat harsh.
Third, having re-ordered the list of ganzfeld experiments by standardness, they set the mid-point of the scale at the median: 4. Now this is, in fact, the only place along the range where the hypothesised split between standard and non-standard results holds up. The mean (5.33) and the mode (6.67) are no good. In fact, if you take the mean as the cut-off and use the full results of Wezelman and Bierman’s series 6 experiments, the standard and non-standard results differ by less than one per cent. Amherst has insisted that the median is the only worthy average to take, but this hasn’t convinced me. It implies that the standardness rating is a genuine numerical quantity; that standardness 2 is half as standard as standardness 4. Nor are experiments with the same standardness rating homogeneous.
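To make concrete what I mean about the cut-point mattering, here's a rough sketch of the comparison in Python. The study figures below are made-up stand-ins, not the real database (which I don't have to hand), and I'm not claiming this is exactly how the m-a pools its results; the point is just that moving the cut from the median to the mean changes which studies land in each group.

```python
from statistics import mean, median

# Hypothetical (standardness rating, hits, trials) tuples.
# These are stand-ins for illustration, NOT the real meta-analysis studies.
studies = [
    (1.33, 8, 32), (2.67, 10, 40), (4.00, 12, 40),
    (4.00, 9, 36), (5.33, 11, 40), (6.67, 14, 40), (6.67, 10, 32),
]

def pooled_hit_rates(studies, cut):
    """Pool hits over trials for studies at/above the cut ('standard') and below it."""
    standard = [(h, t) for r, h, t in studies if r >= cut]
    non_standard = [(h, t) for r, h, t in studies if r < cut]
    rate = lambda group: sum(h for h, _ in group) / sum(t for _, t in group)
    return rate(standard), rate(non_standard)

ratings = [r for r, _, _ in studies]
for label, cut in [("median", median(ratings)), ("mean", mean(ratings))]:
    std_rate, non_rate = pooled_hit_rates(studies, cut)
    print(f"cut at {label} ({cut:.2f}): standard {std_rate:.1%} vs non-standard {non_rate:.1%}")
```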
I have a dilemma just now that illustrates the homogeneity point neatly. On my hard drive I have a collection of stuff about the ganzfeld, but not all of it. I am missing an experiment by Symmonds and Morris from ‘93. Looking at the m-a I can see it scored 4. What does that tell me? Nothing. I cannot even guess what the protocol of Symmonds’ experiment was by looking at the other experiment that scored 4 (Bierman’s series 5).
Fourthly, I don’t think there’s much of a linear regression of results against standardness. See my graph on page 10... or 11 [edit: page 12!].
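For anyone who wants to check that for themselves, the regression itself is a one-liner. A minimal sketch, again with placeholder numbers rather than the actual study values:

```python
from scipy import stats

# Placeholder values only, standing in for (standardness rating, effect size) pairs;
# substitute the real figures from the meta-analysis to reproduce the graph.
standardness = [1.33, 2.67, 4.00, 4.00, 5.33, 6.67, 6.67]
effect_size = [0.25, -0.10, 0.05, 0.30, 0.10, 0.20, 0.15]

fit = stats.linregress(standardness, effect_size)
print(f"slope = {fit.slope:.3f}, r = {fit.rvalue:.3f}, p = {fit.pvalue:.3f}")
```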
But may I ask you a question, T'ai Chi? You know more about statistics than me, so maybe you can help. One of the effect sizes quoted in the m-a seems peculiar to me. How do you calculate the effect size? In particular, an experiment that lasted 10 sessions, with a 25% hit rate expected by chance, yet got only one hit (10%). What would the effect size of that experiment be?
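For what it's worth, here is my own rough attempt, so you can see where I'm stuck. As far as I understand it, one common convention in these meta-analyses is ES = z/√n, with z taken from comparing the observed hit rate to chance; others quote Cohen's h instead. Using the numbers above (1 hit in 10 trials, 25% expected by chance), I get roughly this, but please correct me if that's not how it's actually done:

```python
from math import asin, sqrt

# Figures from the experiment in question: 10 sessions, 1 hit, 25% chance expectation.
n = 10
hits = 1
p0 = 0.25          # hit rate expected by chance
p_hat = hits / n   # observed hit rate = 0.10

# Convention 1: z from the normal approximation to the binomial, then ES = z / sqrt(n).
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
es = z / sqrt(n)

# Convention 2: Cohen's h, the difference of arcsine-transformed proportions.
h = 2 * asin(sqrt(p_hat)) - 2 * asin(sqrt(p0))

print(f"z = {z:.3f}")           # about -1.10
print(f"z/sqrt(n) = {es:.3f}")  # about -0.35
print(f"Cohen's h = {h:.3f}")   # about -0.40
```

What puzzles me is whether, with only 10 sessions, they take z from an exact binomial calculation or from the normal approximation, because with so few trials the two can come apart.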