The Ganzfeld Experiments

Dancing David said:

Amhearst, we have all discussed with you why we don't find the ganzfeld studies to be proof of anything.


Neither does anyone else find them to be proof of anything, but some do find them to be evidence.
 
Ersby said:
Let me try again. Imagine you have seven communities and you want to know the average population. The usual thing to do is to add up all the populations and divide by seven. But if you numbered the communities 1 to 7 according to size, would it be correct to take the population of community 4 and say that is the average population?

You see what I'm getting at? The numbers for standardness are just place-holders in a sequence. Like the numbers in the top ten of the charts. They don't mean anything. They have no value of their own. So saying 4 is the average may look sensible, but it doesn't mean anything.
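To put numbers on it, here's a toy version of the same arithmetic (the population figures are invented purely for illustration):

```python
# Toy example: seven communities, numbered 1 to 7 by size (made-up figures).
populations = [500, 800, 1200, 2000, 6000, 15000, 40000]

mean_population = sum(populations) / len(populations)  # the actual average
middle_ranked = populations[3]                         # community numbered 4 of 7

print(mean_population)  # ~9357 -- the real average population
print(middle_ranked)    # 2000  -- just whichever community happened to be ranked 4th
```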
You are completely clueless, Ersby. Why on earth are you talking about 4 as if the authors had claimed it was the average rating? 4 is the MIDPOINT between 1 and 7, MIDPOINT! The only average rating which was mentioned was 5.33.
They wouldn't have known 5.33 would be the mean, no. But they could have decided beforehand how to judge where the average fell. It's interesting to note that the only time "hypothesized" appears is in the paragraph discussing 5.33.

"Decided beforehand how to judge where the average fell"? What decision would need to take place? You find out what the average is by dividing the combined ratings by the number of experiments.
My point is that neither is better than the other. In fact, you could also justifiably take the top half of the standardness scale and take the average from those. There are any number of "averages" to use. I happen to think the one they've focused on is (a) meaningless and (b) just so happens to give the best results!
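Just to illustrate what I mean (with ratings I've made up, not the paper's actual figures), here are three different "averages" you could pull out of the same set of standardness ratings:

```python
# Invented standardness ratings, purely for illustration.
ratings = [1.33, 3.00, 4.00, 4.33, 5.33, 6.33, 6.67, 7.00]

mean_rating = sum(ratings) / len(ratings)         # arithmetic mean of the ratings
scale_midpoint = (1 + 7) / 2                      # midpoint of the 1-7 scale
top_half = sorted(ratings)[len(ratings) // 2:]    # upper half of the ratings
mean_of_top_half = sum(top_half) / len(top_half)

print(mean_rating, scale_midpoint, mean_of_top_half)
# Three different numbers, three different places to draw the standard/non-standard line.
```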
Ersby, I really do want to commend you for actually reading and trying to understand articles which supply evidence that contradicts your belief system. It is much more than most so-called skeptics do. But when you misunderstand the material this severely, and it pains me to say this, I think maybe it would have been best if you had never even tried in the first place.

amherst
 
Paul C. Anagnostopoulos said:
But what astounds me is that the paper stops right there. Where is the list of nonstandard protocol aspects? Where is the analysis of the ten most common nonstandard aspects, to see which ones contribute to the failure of the studies? Why did they ignore the standardness of the statistical analysis and the artifact-producing aspects of the protocol? It's enough to make me crazy.

I have emailed Bem to see if he will give me the raters' scoring sheets.

~~ Paul
I don't know why it astounds you that the paper doesn't list the specific nonstandard aspects. As you know, the students were instructed to blindly rate each experiment according to how well its procedure mirrored the standard PRL one. Now unless you're suggesting that the three different raters, all blind to the experimental outcomes, without interacting with each other, and basing their decisions solely on the information they were given by the authors (references to which are given in the article), all somehow falsely rated the standardness of the experiments, you really don't need to know the specifics. All you need to know is that experiments rated as nonstandard had procedures which went against the information the raters used (which again, you can look at) detailing standardness.

amherst
 
posted by Amhearst
I don't know why it astounds you that the paper doesn't list the specific nonstandard aspects.
If your previous rantings and posturings had not revealed you as a poseur, this statement surely would.
1. Because that is a standard research protocol.
2. Because then it allows anyone else to judge the ratings and whether they are 'blind and objective'.
3. Just to piss off some woo-woo who dearly clings to mistaken beliefs.
4. Because then you can rescale the standards for other measures.

Amhearst, if you have any advanced degrees in science I suggest that you go back to grad school and take a semester of statistics and research methodology. Then I suggest you read up on research journals in the social and hard sciences, to catch up on what you pretend to be talking about.
Even an undergraduate should be able to understand these questions if they have been exposed to research methods.
 
Amherst said:
I don't know why it astounds you that the paper doesn't list the specific nonstandard aspects. As you know, the students were instructed to blindly rate each experiment according to how well its procedure mirrored the standard PRL one. Now unless you're suggesting that the three different raters, all blind to the experimental outcomes, without interacting with each other, and basing their decisions solely on the information they were given by the authors (references to which are given in the article), all somehow falsely rated the standardness of the experiments, you really don't need to know the specifics.
I don't need to know them if all I care about is the simple conclusion of the paper. But since the specifics were recorded, it seems almost silly not to go into them in some detail, no?

All you need to know is that experiments rated as nonstandard had procedures which went against the information the raters used (which again, you can look at) detailing standardness.
Aren't you interested to take this further, in order to find out which specific alterations of the protocol are the ones that ruin the results? That has the potential to uncover what is really going on in these experiments.

"Decided beforehand how to judge where the average fell"? What decision would need to take place? You find out what the average is by dividing the combined ratings by the number of experiments.
But the average is most likely meaningless. Why did the authors even mention the average standardness?

~~ Paul
 
Dancing David said:

Evidence of how much flawed methodology can change results?

That is certainly a possibility. Another possibility is that something 'psi' related took place. We know that something went on, so we have to experiment further.
 
amherst said:

You are completely clueless, Ersby. Why on earth are you talking about 4 as if the authors had claimed it was the average rating? 4 is the MIDPOINT between 1 and 7, MIDPOINT! The only average rating which was mentioned was 5.33.

Ersby, I really do want to commend you for actually reading and trying to understand articles which supply evidence that contradicts your belief system. It is much more than most so-called skeptics do. But when you misunderstand the material this severely, and it pains me to say this, I think maybe it would have been best if you had never even tried in the first place.

amherst

Where do I say 4 is the average rating?

The point about "standardness" is that it is a construct. In fact, it's not even a scale. There's no homogenity amongst the experiments towards the end of the scale, nor is there any information in the "standardness" number itself. If I told you an experiment scored 5.67 as a rating, what could you tell me about it, other than it fell between 6 and 5.33?

Simply repeating the rather obvious statement that 4 falls between 1 and 7 is just demonstrating that you've not given my arguments a fair crack of the whip. I've tried to explain myself with analogies to the Beaufort Scale and what have you, and you've simply got more annoyed! If you could explain why the median should be the point of reference in a little more detail, that'd be good.

As for that last paragraph, may I remind you that you've not done so well yourself in this discussion. What is your opinion of Radin's meta-analysis now?
 
Paul C. Anagnostopoulos said:

Otherwise, they only talk about the midpoint of the scale and define standard and nonstandard accordingly. I think that's okay, since it's just an arbitrary definition. It would be interesting to place the boundary at other points and see what happens.

~~ Paul

As far as I can tell, put it anywhere else and the effect diminishes.
 
Ersby said:
As far as I can tell, put it anywhere else and the effect diminishes.
Now that's interesting. If I raise it to 5, does that push enough standard studies into the nonstandard category to even out the two categories? Likewise, if I lower it to 3, does that drag enough nonstandard studies into the standard category to even them out?

So I fooled around with the cut-off point, calculating average hit rates. If you put it at 3 or 5, it's a wash. If you put it at 3.5, there is still a difference. At 4.5, it's a wash. To get a difference, you have to put it somewhere between Symmons & Morris (1997) and Wezelman & Bierman (1997) series V.

The fact that it abruptly changes at those boundaries makes me suspicious that the entire exercise has no meaning, but is just an artifact of the ordering process. What do other people think?
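For anyone who wants to replicate the fiddling, this is roughly the shape of the sweep. The (standardness, hit rate) pairs below are placeholders, not the actual per-study figures from the meta-analysis:

```python
# Sketch of a cut-off sweep; the (standardness, hit_rate) pairs are placeholders.
studies = [(1.33, 0.24), (3.00, 0.28), (4.00, 0.16), (4.33, 0.38),
           (5.33, 0.21), (6.33, 0.30), (6.67, 0.33), (7.00, 0.35)]

def split_hit_rates(cutoff):
    """Unweighted average hit rate above and below a standardness cut-off."""
    standard = [hr for s, hr in studies if s >= cutoff]
    nonstandard = [hr for s, hr in studies if s < cutoff]
    avg = lambda xs: sum(xs) / len(xs) if xs else float("nan")
    return avg(standard), avg(nonstandard)

for cutoff in (3.0, 3.5, 4.0, 4.5, 5.0):
    std, nonstd = split_hit_rates(cutoff)
    print(f"cutoff {cutoff}: standard {std:.1%} vs nonstandard {nonstd:.1%}")
```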

~~ Paul
 
Taking this quickly (back in the internet café)

If you put the average at the median (4), you get the required effect.

If you put the average at the mean (5.33), the effect is greatly diminished. In fact, the difference between standard and non-standard is 1%.

If you put the average at the mode (6.67), the effect is good, but it relies too much on one experiment: Dalton's investigation into creativity and psi. Without that, the average hit rate for 6.67 and above is 30%, which I don't think is enough to put it within the confidence ratings of previous experiments. Indeed, some statisticians would remove Dalton's work as an outlier, to see how robust the effect is without the most extreme results.

If you split the meta-analysis in two and say the upper half is standard and the lower half is non-standard (which seems most sensible to me) the average hit rate is still at 32% or so, but it still relies on Dalton's experiments too much. (30% without)
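If anyone wants to check the arithmetic behind the "with Dalton" and "without Dalton" figures, this is all I'm doing; the session and hit counts below are placeholders rather than the published numbers:

```python
# Pooled hit rate with and without one large study (placeholder counts).
standard_studies = {
    "Dalton 1997": (128, 60),   # (sessions, hits) -- placeholder values
    "Study A":     (100, 30),   # hypothetical study, for illustration only
    "Study B":     (50, 14),    # hypothetical study, for illustration only
}

def pooled_hit_rate(studies):
    sessions = sum(n for n, _ in studies.values())
    hits = sum(h for _, h in studies.values())
    return hits / sessions

with_dalton = pooled_hit_rate(standard_studies)
without_dalton = pooled_hit_rate({name: counts for name, counts in standard_studies.items()
                                  if name != "Dalton 1997"})
print(f"with Dalton: {with_dalton:.1%}, without: {without_dalton:.1%}")
```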

But I'm interested in your question as to what makes an experiment fall into one category rather than another. Many experiments were split up according to the protocol they used and it seems to me that quite small differences in protocol make quite a difference (well, one point of standardness) in where it falls in the meta-analysis.

I should point out that I don't think that there are any experiments that straddle the 4 midpoint. But I'm going to try to list what aspects of which experiments were instrumental in their placing. This'll take a while: maybe tomorrow or Wednesday, and it won't be authoritative, but it may be interesting. For me, if no one else!
 
Does the standardness scale represent a smooth variation among the protocols, with each deviation from standard being more or less as important as the others? Or is it a bumpy scale with a few important deviations at the top and a bunch of trivial ones at the bottom? The answer to these questions should determine the cut-off point between standard and nonstandard protocols, not some arbitrary midpoint.

Of course, the authors say that the standardness is well correlated with the effect size, so maybe that's enough right there.

Let's move on! Where are the lists of deviations?

~~ Paul
 
This is the graph of standardness according to hit rate. Bear in mind I lumped together 3, 2 and 1 because they were the smallest of the groupings, and I wanted to bring them up to the same size as the rest of the ratings.

...

(there's supposed to be an image here!)
 
Okay, while I try to work out how to get this machine to do what I want, I'll cut and paste what I came up with earlier.

Let’s call a truce on the whole standardness thing. Either I’ve got it all wrong and it’s so obvious it’s eluding me, or I’m simply not explaining myself correctly, but it seems we’re not getting anywhere. Suffice to say, I find it interesting that if you place the cut-off point of standard to non-standard anywhere else, then the effect diminishes and I wish the paper had explained more fully why 4 was chosen (see my previous point re. Beaufort Scale and why midpoints of scale don’t always correspond to midpoints of effects).

Paul got me thinking about the rating procedure and the advice given to people judging the standardness of the database.

I went back through my database to piece together what I could with regard to which aspects of the various experiments would accord or disagree with the standard protocol. If I didn't know the details about a certain aspect (e.g., whether static or dynamic targets were used) I've not written anything. And if I had no information on them at all (e.g., Kanthamani's work), I've left them out completely.


7: standard

Bierman, series I and II, ‘93, novice receivers, mostly meditators, static targets, 25% hit rate
Broughton and Alexander, first timers series 1 and 2, novices, dynamic targets, 21% hit rate
Broughton and Alexander, emotionally close series, sender/receiver friends, dynamic targets, 37.3%
Dalton, ’94, static targets, 41%
Dalton, ’97, creative subjects, 47%
Morris et al, 1993, Study 2, subjects chosen were pro-psi, creative or outgoing, dynamic targets, 40.6%

(It’s interesting to note that Bierman’s lack of success with static targets was considered in agreement with PRL’s results with static targets, yet Dalton’s 1994 results were not considered in conflict with them!)

(Another note, re Broughton and Alexander’s work. The abstract reads “Two series of 50 first-time or novice participants and one series of 51 first-timers defined as emotionally close comprised the replication data set.” Which means that this replication scored a hit rate of 26% as opposed to PRL’s results of 32%. The clairvoyant and pilot series (in 6.67) were considered as adjunct to the main work.)


6.67

Broughton & Alexander, “clairvoyant” series, no sender, dynamic targets, 22%
Broughton & Alexander, pilot series, as PRL, only 8 sessions, 37.5%
Parker, studies 2, 3, 4 and 5, dynamic targets, sender could hear receiver speak during session, 120 sessions, 40%


6.33

Morris et al, 1993, novice subjects, half static half dynamic targets, sender/receiver friends


5.33

McDonough et al. 1994, artistic subjects, 20 sessions, 30%
Parker, series 1, as for series 2, 3, 4, 5 above except sender could not hear receiver speak, 30 sessions, 20%
Williams, mix of sender/no sender/two sender, 11.9%


4.33

Bierman, series 3, dynamic targets, pseudo-random choice (all targets appeared equally), 40%
Bierman, series 4a, dynamic targets, 36.1%

(NB: Bierman describes series 3 and 4a and b (below, in 4.00) thusly: “The procedure in both series III and IV was nearly identical and conformed globally to the procedure as followed in the PRL Auto ganzfeld series”, yet these innocuous experiments have been relegated to 4.33 and 4 in the standardness ratings!)


4.00

Wezelman & Bierman, series 4b, as series 4a (above), no sender, half as “precognition” (no target chosen until after session ended), 15.6%


3.00

Wezelman & Bierman, series 5, subjects as cannabis users, 25%
Wezelman & Bierman, series 6, subjects as cannabis/psilocybine users or meditators, 30%

(Note: Bierman describes series 5 and 6 thusly: “The procedure in V and VI was nearly identical to the one in IV B, with some adaptations. First, white noise and relaxation tape were computerised. Furthermore, though the experimental set-up in series VI was analogeous to that of series IV B and V, series VI was not run in the cellar of the psychology building”, so the main difference was the location! This is enough for it to lose one point in standardness!)

(Another note: in the meta-analysis the hit rate for series 6 is given as 20%, but this is only possible if they left off the sessions with psilocybine users and meditators. Although meditators were a big failure (1 hit out of 7, despite their inclusion in the raters’ instructions for standardness), psilocybine users did well, although the judging system is a little peculiar. 12 viewers attempted to see 6 targets, and the judges were instructed to give emphasis to those descriptors that overlapped in both viewers’ notes. I’ve used the most pro-psi figures as a conservative result.)

1.33

Willin, music series, sender, 24.1%

(Note: as far as I can tell, the only difference here between this and the most standard is the target nature.)
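For what it's worth, here's the sort of grouping I'm doing with the list above. These are unweighted means of the quoted hit rates (so they won't exactly match pooled figures, which weight by session count), and 6.33 is left out because no hit rate was given for it:

```python
# Unweighted mean hit rate per standardness grouping, using the rates quoted above.
bins = {
    7.00: [25, 21, 37.3, 41, 47, 40.6],
    6.67: [22, 37.5, 40],
    5.33: [30, 20, 11.9],
    4.33: [40, 36.1],
    4.00: [15.6],
    3.00: [25, 30],
    1.33: [24.1],
}

for rating, rates in sorted(bins.items(), reverse=True):
    print(f"standardness {rating}: mean hit rate {sum(rates) / len(rates):.1f}%")
```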
 
Okay, I can't get this damned machine to do what I want. Apparently it has no applications that save as jpegs!

I'll try again tomorrow when I'm at home.

Suffice to say, it's not a smooth decline from standard to non-standard.
 
Cannabis and psilocybine users? What, they were trying to find out if drugs make you more psychic? I guess so.

So what happens if you move studies around a bit? For example, move the two Willin studies up to standardness 5. Or move the clairvoyant series down to 5.

Man, I really want to see the rating sheets.

And people have done meta-analyses on these studies, as if they are similar? "Oh sure, we lumped telepathy and clairvoyance together. No one can tell the difference anyway."

~~ Paul
 
Ersby said:
Paul got me thinking about the rating procedure and the advice given to people judging the standardness of the database.


I have been following this thread only intermittently but this caught my eye. Am I to understand that a variety of raters were used? If so, how was reliability established? Were the same stimuli rated by all the raters? If so, what was the computed reliability? Weirder and weirder.
 
This is the last time I will be addressing this issue. As I suspected, there is no more reasoning to be had with the skeptics in here.

1. On a seven point scale 4 is the midpoint. 4 is the midpoint because there are 3 equal spaces on each side of it. So nonstandard = 1 to 2, 2 to 3, 3 to 4 and standard = 4 to 5, 5 to 6, 6 to 7. If you were to move the midpoint somewhere else, such as to 5 (as Paul and Ersby have suggested), there would be one extra space available for nonstandard ratings and one less space available for standard. You wouldn't be able to get a correct rating and the MIDPOINT would no longer be MID-POINT.

2. Contrary to what Ersby thinks, each of the places on the seven point scale has a meaning. They'd have to have one in order for the raters to rate the studies in the first place:

1-2 = highly non-standard
2-3 = largely more non-standard than standard
3-4 = slightly more non-standard than standard
4.00 = neither standard nor non-standard
4-5 = slightly more standard than non-standard
5-6 = largely more standard than non-standard
6-7 = highly standard

This is the scale the raters used to rate the studies. For someone to suggest that we should change the meaning of the scale points after the ratings have already been completed is completely ludicrous.

Basically, this isn't complicated. I don't expect Ersby or any of the other so-called skeptics in here to understand it though. They have beliefs that they need to uphold.

amherst
 
amherst,

I don't know why it astounds you that the paper doesn't list the specific nonstandard aspects. ... you really don't need to know the specifics. All you need to know is that experiments rated as nonstandard had procedures which went against the information the raters used (which again, you can look at) detailing standardness.
And since the result of this process doesn't produce a simple graph that directly maps 'high hit rates' to 'high standardness' and 'low hit rates' to 'low standardness', then the decisions on what constitutes 'standardness' tell us what?
 
