The Ganzfeld Experiments

Ersby said:

If we are to draw a line in the sand and say that the autoganzfeld work of PRL is the new year zero, then the data looks good, I admit. But not great. The strictest replication of the PRL work came out with chance results.
Which replication are you referring to? In the Bem/Palmer/Broughton paper the Cornell students rated eight studies at 7 (maximum standardness), and of these, half were well above chance (the lowest of the four being 37.6%). And of the nine studies with the second-highest rating, 6.67, all but two were well above chance (the lowest of the seven being 36.0%). So this would indicate that the strictest replications get the most highly significant results. Disagree?

amherst
 
Amherst said:
I'll go back to my original question: what more do you need? Even more studies?
Yes! Now researchers need to do studies with variant protocols, in an attempt to figure out the conditions under which ganzfeld works and under which it does not. This is supposed to be science, man, not just some kind of "rah, rah, psi" lovefest.

Why do we spend so much time trying to convince ourselves that it does or doesn't work? Who cares? If this is where we leave it, it'll be dead and buried in 25 years and parapsychologists will have moved on to some other psi du jour, just as they have before. If it works, grab it! Wring it out for everything it's worth. Try to figure out what makes it work and what breaks it. Try to figure out how to get bigger effects, just like other sciences do. Try to come up with a model experiment that any college class can replicate. Try to understand it well enough to make some technology out of it.

~~ Paul
 
I still don't understand why there is so little focus on individuals in the Ganzfeld research. Do some people *always* do better than chance, or don't they? Do some people *always* do better than other receivers, or don't they? The silence on this issue seems to suggest that such 'exceptional/consistent' individuals do not exist. So what sort of effect are we studying if it isn't 'person' based? Why does the Ganzfeld effect exist only in a statistical sense?
 
I think the parapsychology community has an unwritten agreement that they won't focus on individuals. Did a lot of that in the '60s and '70s and just got burned (think Geller).

~~ Paul
 
Loki said:
I still don't understand why there is so little focus on individuals in the Ganzfeld research. Do some people *always* do better than chance, or don't they? Do some people *always* do better than other receivers, or don't they? The silence on this issue seems to suggest that such 'exceptional/consistent' individuals do not exist. So what sort of effect are we studying if it isn't 'person' based? Why does the Ganzfeld effect exist only in a statistical sense?

You have to have a large enough study to detect tiny effects.

Think of the gigantic pools used to try and detect neutrinos.
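To put rough numbers on that, here is a minimal sketch of the sample-size arithmetic, using the normal approximation to the binomial; the "true" hit rates tried below (33%, 40%, 50%) are illustrative assumptions, not claims about any particular study:

```python
# Rough sample-size sketch: how many ganzfeld sessions are needed to detect
# a true hit rate of p1 against the 25% chance rate, using the normal
# approximation to the binomial (alpha = .05 one-tailed, 80% power).
# The hit rates tried below are illustrative assumptions only.
from math import sqrt, ceil

def sessions_needed(p1, p0=0.25, z_alpha=1.645, z_beta=0.842):
    num = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return ceil((num / (p1 - p0)) ** 2)

for p1 in (0.33, 0.40, 0.50):
    print(f"true hit rate {p1:.0%}: ~{sessions_needed(p1)} sessions")
```

At a hit rate only a few points above the 25% chance level, this sketch suggests a study needs on the order of a couple of hundred sessions, which is roughly the point of the neutrino-pool analogy.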
 
Loki,

I still don't understand why there is so little focus on individuals in the Ganzfeld research. Do some people *always* do better than chance, or don't they? Do some people *always* do better than other receivers, or don't they? The silence on this issue seems to suggest that such 'exceptional/consistent' individuals do not exist. So what sort of effect are we studying if it isn't 'person' based? Why does the Ganzfeld effect exist only in a statistical sense?

Loki you make a great point .. again..

No one ever addresses this because…

1. NO individuals show consistent “high” hits
2. Because they don't, it implies there is just a problem with the process

We all know if these tests TRULY showed some psi effect then it should manifest itself consistently in particular individuals.

Paul,

I think the parapsychology community has an unwritten agreement that they won't focus on individuals. Did a lot of that in the '60s and '70s and just got burned (think Geller).

Which adds to the “don’t focus on THAT.. it clearly shows there is NO effect” myopia of the parapsychology community!
 
Aussie Thinker said:
Loki,



Loki you make a great point .. again..

No one ever addresses this because…

1. NO individuals show consistent “high” hits
2. Because they don't, it implies there is just a problem with the process

We all know if these tests TRULY showed some psi effect then it should manifest itself consistently in particular individuals.

Paul,



Which adds to the “don’t focus on THAT.. it clearly shows there is NO effect” myopia of the parapsychology community!
If you had read the original Psychological Bulletin paper you'd have known that there have been some highly successful and very intriguing ganzfeld studies of people who research indicates are more likely to be "psychic" than others:

"The Juilliard sample. There are several reports in the literature of a relationship between creativity or artistic ability and psi performance (Schmeidler, 1988). To explore this possibility in the ganzfeld setting, 10 male and 10 female undergraduates were recruited from the Juilliard School. Of these, 8 were music students, 10 were drama students, and 2 were dance students. Each served as the receiver in a single session in Study 104 or 105. As shown in Table 1, these students achieved a hit rate of 50% (p = .014), one of the five highest hit rates ever reported for a single sample in a ganzfeld study. The musicians were particularly successful: 6 of the 8 (75%) successfully identified their targets (p = .004; further details about this sample and their ganzfeld performance were reported in Schlitz & Honorton, 1992)."

amherst
 
amherst said:

Which replication are you referring to? In the Bem/Palmer/Broughton paper the Cornell students rated eight studies at 7 (maximum standardness), and of these, half were well above chance (the lowest of the four being 37.6%). And of the nine studies with the second-highest rating, 6.67, all but two were well above chance (the lowest of the seven being 36.0%). So this would indicate that the strictest replications get the most highly significant results. Disagree?

amherst

I was talking about Broughton and Alexander's "Autoganzfeld II: An attempted replication of the PRL ganzfeld research", which ran 209 sessions and got chance results.

The problem I have with the Bem m-a is that it took an existing set of data, added more data, applied a new criterion (new inasmuch as it hadn't been used in previous meta-analyses) and then tried again. This seems to me like having a second roll of the dice.

As for the scores of the strictest replications being the highest, look at the figures again. The group of experiments with the best scores is the one rated from 4 to 5. Together they get a hit rate of 41.8%, so no, I can't necessarily agree that the strictest adherence to "standardness" gives the best results.

As for your point about the experiments that scored 7, there were still half of those that scored at or below chance. The hit rate for all the experiments that scored 7 was 34%, but this is due almost entirely to the presence of Dalton's work (without this one experiment, the average hit rate for these is 28%). In fact, without Dalton's large-scale experiment, the effect of the entire meta-analysis is weakened. And to think you said that including this experiment wouldn't make a difference! I can't put much trust in a replicable effect that relies on just one experiment.

I'll say it again: I think a replication is where someone does the same experiment and gets the same results. A meta-analysis is not a replication. It is used to look for patterns in a large database. It is from a meta-analysis that one proceeds to make replications, not the other way round.
 
amherst,

I'm aware of the "Juilliard Sample" - and it's precisely what I'm referring to. Notice the figures given in your quote :

"10 male and 10 female undergraduates...these students achieved a hit rate of 50% (p = .014)"

"8 were music students, .... The musicians were particularly successful: 6 of the 8 (75%) successfully identified their targets"

Perhaps the most significant term in the quote you provided was (and it's what I'm referring to):

"Each served as the receiver in a single session ..."

Hit rates FAR beyond the normal, and yet these people were not tested again? Either as a group, or individually? Why?

Again, since each person was only tested once, we have a simple split: they either succeeded or failed, 0% or 100%. We look at the group (across individuals) to determine a 'hit rate'. Why?
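For what it's worth, here is a minimal sketch of the exact binomial test that would be applied to a single receiver run over repeated sessions; the session and hit counts are invented for illustration, and the 25% chance rate assumes the usual one-target-among-four judging procedure:

```python
# Sketch: what it would take for ONE receiver to show an individually
# significant result. A single session gives only hit/miss (0% or 100%);
# an exact binomial test needs repeated sessions. The numbers below are
# illustrative only.
from scipy.stats import binomtest

chance = 0.25  # one target among four possible clips

for sessions, hits in [(1, 1), (10, 5), (20, 10), (40, 20)]:
    result = binomtest(hits, sessions, chance, alternative="greater")
    print(f"{hits}/{sessions} hits (50% rate): p = {result.pvalue:.4f}")
```

The point is simply that one session per receiver cannot separate a genuinely gifted individual from a lucky one; even a receiver hitting at double the chance rate needs a fair number of repeated sessions before the result becomes individually significant.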
 
Ersby said:
The problem I have with the Bem m-a is that it took an existing set of data, added more data, applied a new criterion (new inasmuch as it hadn't been used in previous meta-analyses) and then tried again. This seems to me like having a second roll of the dice.
I don't understand why you have a problem with the meta-analysis having a standardness criterion. As the authors write in the paper, in order to understand psi, experimenters must be willing to risk replication failures by changing the procedure. And so if you want to do a meta-analysis to see if the original experiments (done with the express purpose of demonstrating psi) have been replicated, you need to be sure that the experiments you are grouping together adhere to the standards of that original work.

And I also don't know why you have a problem with the 10 additional studies the authors added to the Milton/Wiseman dataset:

"In addition to the 30 studies analyzed by Milton and Wiseman (1999), an additional 10 studies were located by examining the six major publication outlets for parapsychological research. Many of these studies had been completed but not yet published prior to the cutoff date set by Milton and Wiseman for their meta-analysis."

All this does is make the meta-analysis more inclusive. Unless you're suggesting that they somehow cherry-picked successful studies and excluded unsuccessful ones, I don't understand the problem you have.
As for the scores of the strictest replications being the highest, look at the figures again. The group of experiments with the best scores is the one rated from 4 to 5. Together they get a hit rate of 41.8%, so no, I can't necessarily agree that the strictest adherence to "standardness" gives the best results.
Unless you're seeing something I'm not, the question as to whether standard replications achieved better results than the non-standard experiments really isn't disputable:

"This same outcome can be observed by defining as standard the 29 studies whose ratings fell above the midpoint of the scale (4) and defining as non-standard the 9 studies that fell below the midpoint (2 studies fell at the midpoint): The standard studies obtain an overall hit rate of 31.2%, ES = .096, Stouffer Z = 3.49, p = .0002, one-tailed. In contrast, the non-standard studies obtain an overall hit rate of only 24.0%, ES = -.10, Stouffer Z = -1.30, ns. The difference between the standard and non-standard studies is itself significant, U = 190.5, p = .020, one-tailed. Most importantly, the mean effect size of the standard studies falls within the 95% confidence intervals of both the 39 pre-autoganzfeld studies and the 10 autoganzfeld studies summarized by Bem and Honorton (1994). In other words, ganzfeld studies that adhere to the standard ganzfeld protocol continue to replicate with effect sizes comparable to those of previous studies."
I'll say it again: I think a replication is where someone does the same experiment and gets the same results. A meta-analysis is not a replication. It is used to look for patterns in a large database. It is from a meta-analysis that one proceeds to make replications, not the other way round.

Of course a meta-analysis isn't a replication. In this case it is a grouping of replications. And this grouping reveals that most experiments which adhere to the PRL procedure will get results which are significant.

amherst
 
amherst said:

Unless you're seeing something I'm not, the question as to whether standard replications achieved better results than the non-standard experiments really isn't disputable:

"This same outcome can be observed by defining as standard the 29 studies whose ratings fell above the midpoint of the scale (4) and defining as non-standard the 9 studies that fell below the midpoint (2 studies fell at the midpoint): The standard studies obtain an overall hit rate of 31.2%, ES = .096, Stouffer Z = 3.49, p = .0002, one-tailed. In contrast, the non-standard studies obtain an overall hit rate of only 24.0%, ES = -.10, Stouffer Z = -1.30, ns. The difference between the standard and non-standard studies is itself significant, U = 190.5, p = .020, one-tailed. Most importantly, the mean effect size of the standard studies falls within the 95% confidence intervals of both the 39 pre-autoganzfeld studies and the 10 autoganzfeld studies summarized by Bem and Honorton (1994). In other words, ganzfeld studies that adhere to the standard ganzfeld protocol continue to replicate with effect sizes comparable to those of previous studies."


Of course a meta-analysis isn't a replication. In this case it is a grouping of replications. And this grouping reveals that most experiments which adhere to the PRL procedure will get results which are significant.

amherst

So you have no comment about the fact that studies that scored 4-5 get the best results?

Take another look at the paper you quoted, in particular the paragraph just before your quote.

"The 'standardness' ratings of the three raters achieved a Cronbach's α of .78. The mean of the three sets of ratings on the 7-point scale was 5.33, where higher ratings correspond to greater adherence to the standard ganzfeld protocol. As hypothesized, the degree to which a replication adheres to the standard ganzfeld protocol is positively and significantly correlated with ES, rs(38) = .31, p = .024, one-tailed."

This makes no sense. Doing the sums myself, if you use 5.33 as the point where standard becomes non-standard, then the hit rate is larger in the non-standard half (31.1% compared to 30.4%). In fact, as you demonstrate, the paper goes on to talk in more detail about the difference of using 4 as the point where standardness becomes non-standard. They even talk about the difference in hit rates, which they don’t do for their calculations with 5.33 as the mean.

Which brings me back to that minor point I raised earlier. Why choose 4 as the average? "Standardness" has no numerical value in itself. And why choose 5.33 for that matter? Neither seems better than the other, except they give quite different results. It all goes to demonstrate that standardness is just a construct that has no intrinsic meaning other than an attempt to reorder the data to get the hit rate back up towards thirtysomething percent.
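(To make the cutoff-sensitivity point concrete, here is a toy sketch; every rating, hit count and trial count below is invented, so it only illustrates the mechanic of how moving the dividing line reshuffles the pooled hit rates, not what the real dataset shows.)

```python
# Toy illustration of cutoff sensitivity: the same set of studies, split at
# two different "standardness" cutoffs, gives different pooled hit rates for
# the "standard" and "non-standard" groups. All numbers here are invented.
def pooled_rate(studies):
    hits = sum(h for _, h, n in studies)
    trials = sum(n for _, h, n in studies)
    return hits / trials if trials else float("nan")

# (standardness rating, hits, trials) -- hypothetical studies
studies = [
    (6.7, 12, 40), (6.3, 9, 36), (5.7, 20, 50),
    (5.0, 11, 40), (4.3, 17, 40), (3.0, 8, 36),
]

for cutoff in (4.0, 5.33):
    standard = [s for s in studies if s[0] > cutoff]
    nonstandard = [s for s in studies if s[0] < cutoff]
    print(f"cutoff {cutoff}: standard {pooled_rate(standard):.1%}, "
          f"non-standard {pooled_rate(nonstandard):.1%}")
```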

As for adding more studies to the m-a, I've no problem with that. It's the "standardness" criterion which bothers me. It doesn't exist in the first Honorton m-a, so the two can't be compared. But then again, now that we've demonstrated that additional studies reduce the effect size of this dataset to zero, I guess that's by the by.
 
Ersby said:
So you have no comment about the fact that studies that scored 4-5 get the best results?
There were two studies which were rated at 4.00. Since 4.00 is the midpoint, these two studies (one with a hit rate of 15.6%, the other 45.1%) were not counted as standard or non-standard and therefore had no effect on either group's combined hit rate. Studies rated over 4.00 are standard, so what's the problem?

Take another look at the paper you quoted, in particular the paragraph just before your quote.

This makes no sense. Doing the sums myself, if you use 5.33 as the point where standard becomes non-standard, then the hit rate is larger in the non-standard half (31.1% compared to 30.4%). In fact, as you demonstrate, the paper goes on to talk in more detail about the difference of using 4 as the point where standardness becomes non-standard. They even talk about the difference in hit rates, which they don't do for their calculations with 5.33 as the mean.
Which brings me back to that minor point I raised earlier. Why choose 4 as the average? "Standardness" has no numerical value in itself. And why choose 5.33 for that matter? Neither seems better than the other, except they give quite different results. It all goes to demonstrate that standardness is just a construct that has no intrinsic meaning other than an attempt to reorder the data to get the hit rate back up towards thirtysomething percent.
Below 4 is the point where standard becomes non-standard, and above 4 is the point where non-standard becomes standard. This is because there are as many non-standard places on the scale (1, 2, 3) as standard ones (5, 6, 7). There is nothing arbitrary about this. 5.33 is the average rating the three raters gave to the 40 experiments. I have no clue as to why you think it should be used as the midpoint for rating standardness.
As for adding more studies to the m-a, I've no problem with that. It's the "standardness" criterion which bothers me. It doesn't exist in the first Honorton m-a, so the two can't be compared. But then again, now that we've demonstrated that additional studies reduce the effect size of this dataset to zero, I guess that's by the by.
First off, I don't know why you think the experiments Bierman mentions in his paper negate the entire original ganzfeld database. Bierman says nothing of the sort. He actually writes that "As argued in the results-section the direct scoring rates however do not invalidate previous meta-analysis."

Second, the original ganzfeld work is not very relevant to the matter at hand. The experimenters are trying to see if the PRL work has been replicated. According to the meta-analysis, it has.

amherst
 
Loki said:
amherst,

I'm aware of the "Juilliard Sample" - and it's precisely what I'm referring to. Notice the figures given in your quote :

"10 male and 10 female undergraduates...these students achieved a hit rate of 50% (p = .014)"

"8 were music students, .... The musicians were particularly successful: 6 of the 8 (75%) successfully identified their targets"

Perhaps the most significant term in the quote you provided was (and it's what I'm referring to):

"Each served as the receiver in a single session ..."

Hit rates FAR beyond the normal, and yet these people were not tested again? Either as a group, or individually? Why?
The laboratory was forced to close (due to lack of funding) even before all the original sessions scheduled to be conducted with the students had been completed.

But aside from the ganzfeld, in the field of parapsychology there has been quite a bit of work done with gifted subjects. For instance, during J.B. Rhine's card tests of the 1930s, the dramatic success of a Duke University divinity student named Hubert Pearce led to his being tested many times. Example:

"The classic example of these later experiments is the Pearce-Pratt series, which took place between different buildings on Duke's West Campus. Pratt, the agent, was located in what was then the Physics Building. Once a minute he picked up a card from a precut and preshuffled pack. Without turning it up or looking at it, Pratt moved the card facedown on a book. (Since this experiment was meant to test clairvoyance, it was not necessary for Pratt to see the card.) At that very minute Pearce, located with a sycronized watch in the libary one hundred yards away, tried to perceive the card on the book. Without meeting, both men deposited sealed records with Rhine-Pratt of the targets (which he recorded after the run) and Pearce of his calls-and then met to check results. Although Pearce started off with only chance scores, as was typical of him when confronted with a new situation, he quickly resumed his high scoring level and averaged 9.9 hits per run of 25 (where chance predicts 5 hits) over the 300 trials. Pearce was then moved to the medical school, over 250 yards away, and, after the customary adjustment period, continued his high scoring. Ultimately four separate experiments were done with a total of 558 hits out of 1,850 trials (where 370 would be expected by chance). The odds against chance for the series were literally astronomical, 22 billion-to-one." (Broughton, 91)

All the government ESP work was done with gifted subjects. A man by the name of Patrick Price was probably the most talented. Here's a link to one of the targets Price was asked to view, and the drawing he made when given only the geographical coordinates:
http://www.lfr.org/csl/practical/ops_3.html

But anyway, I do agree that there should be a concerted effort at finding gifted subjects and then running them through the ganzfeld. But in the end it's like anything else: though millions of people can play an instrument, run a mile, whatever, only a few are innately brilliant at these things. The same probably applies to psychic ability.

amherst
 
Just a comment on the Rhine experiments.

Clairvoyants claim that distance is no object to their viewing. In fact, it's one of their major claims to fame. And yet Rhine moved from just 100 yards to 250 yards away. As if that would make any difference? The test should really have been conducted at a distance of, say, 100 miles or more. At least PEAR did that with theirs.
 
I made a mistake! Two mistakes, actually. The first is that Bierman's series 4b IS in the meta-analysis. Silly me. Second is that the hit rate for those below 5.33 is actually 29-point-something. So they DO have a lower hit rate. By one percent!

amherst said:
Below 4 is the point where standard becomes non-standard, and above 4 is the point where non-standard becomes standard. This is because there are as many non-standard places on the scale (1, 2, 3) as standard ones (5, 6, 7). There is nothing arbitrary about this. 5.33 is the average rating the three raters gave to the 40 experiments. I have no clue as to why you think it should be used as the midpoint for rating standardness.

amherst

If 5.33 didn't mean anything, then why did they mention it?

Putting 4 as the average implies that there is a value attached to the "standardness": that an experiment with standardness 3 is somehow "half as standard" as one with standardness 6, and that it is impossible to have an experiment less standard than 1. This is nonsense.
 
amherst said:

All the government ESP work was done with gifted subjects. A man by the name of Patrick Price was probably the most talented. Here's a link to one of the targets Price was asked to view, and the drawing he made when given only the geographical coordinates:
http://www.lfr.org/csl/practical/ops_3.html

amherst

Oh dear.

And you were doing so well.

The thing about Pat Price being given "only the geographical coordinates" is untrue. As you can read here

http://www.gwu.edu/~nsarchiv/NSAEBB/NSAEBB54/st36.pdf

He was given the coordinates, shown the position on a map, and told it was a Russian military base. From the pdf you'll read just how inaccurate Pat Price was. The fact that this is still being touted as some kind of "evidence" depresses me.

Better to stick to the ganzfeld.
 
amherst said:

There were two studies which were rated at 4.00. Since 4.00 is the midpoint, these two studies (one with a hit rate of 15.6%, the other 45.1%) were not counted as standard or non-standard and therefore had no effect on either group's combined hit rate. Studies rated over 4.00 are standard, so what's the problem?

amherst

You made a claim that the best results came from the most standard experiments, and you spoke about experiments scoring 7 and 6.67. But this is not the case. That's my problem.
 
amherst said:

First off, I don't know why you think the experiments Bierman mentions in his paper negate the entire original ganzfeld database. Bierman says nothing of the sort. He actually writes that "As argued in the results-section the direct scoring rates however do not invalidate previous meta-analysis."

amherst

*sigh*

And in the very next paragraph he says "However, the point remains that the 17 Ganzfeld experiments reported since the first metaanalysis in 1985 and for which we could infer the effect size that we were able to locate, do conflict with the outcomes reported in that 1985-analysis which incorporated 28 studies (table II). In fact the effect-sizes do regress to chance expectation as can be seen from the linear regression analysis (figure 1)."
 
