
Check my methodology - prayer study

saizai

See http://www.prayermatch.org/. It should be a complete description - methodology, goals, my intent / opinion, etc.

The backend programming isn't ready yet but the basics (i.e. user accounts and the public pages) are there. I intend to begin once a sufficient number of participants are signed up; the backend will be ready by then.

If you have a critique, please make sure that:
* you've read all the pages linked from the main page
* you can explain why your perceived flaw in my design would cause a false positive result, i.e. a statistically significant difference between the active and control groups of Recipients in the second and/or third round

I am aware that I have put limits on it that may cause false negatives, and am quite okay with that; my problem not yours. ;)

If I have left out anything it is probably by mistake (I only just finished writing the content); point it out and I'll correct it.

BTW, I have previously suggested this as a bona fide MDC, but the understanding I reached with Randi's representative was that they are only interested in things that can be proven in a small-scale, one-person fashion. I do not claim any such power or effect; if there is an effect, I only expect a small but statistically significant difference between the active and control groups.

Thanks!

P.S. Yes, I've read the rules and FAQ.
 
snip
I do not claim any such power or effect; if there is an effect, I only expect a small but statistically significant difference between the active and control groups.
This is just a little unclear to me.

Surely, by definition, an 'effect' is a 'statistically significant difference between the active and control groups'.

What is a little unclear is just how 'small' the effect can be, whilst still being considered 'statistically significant'.
 
FD - It's intended in contrast to the more standard "I can heal anyone I want whenever I want!" sort of claim.

As for size of effect: plot scores from both groups on bell curves. If they're different (p<.05) then it's positive. Very small effects would obviously be harder to find with that amount of certainty (vs if it's a dramatic difference), but again that's my problem and bounded mainly by how many participants I have and need not affect the criteria.
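To make that "plot scores from both groups and check p<.05" comparison concrete, here is a minimal sketch using a two-sample permutation test. Everything here is illustrative: the scores are invented, the group sizes are arbitrary, and the permutation test is just one of several standard tests for a difference in means - the study's actual test is to be chosen later, per the discussion in this thread.

```python
import random
import statistics

def permutation_test(active, control, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns an approximate p-value: the fraction of random relabelings
    whose absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(active) - statistics.mean(control))
    pooled = list(active) + list(control)
    n_active = len(active)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_active]) -
                   statistics.mean(pooled[n_active:]))
        if diff >= observed:
            hits += 1
    return hits / n_perm

# Invented example scores (e.g. average reported pain on a 1-10 scale):
active = [4, 5, 3, 4, 6, 5, 4, 3, 5, 4]
control = [6, 5, 7, 6, 5, 6, 7, 5, 6, 6]
p = permutation_test(active, control)
print(f"p = {p:.3f}")  # reject the null at the 5% level if p < 0.05
```

The permutation approach has the advantage of making no distributional assumptions about the scores, at the cost of being approximate for small permutation counts.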
 
Saizai, what will happen once you have finished this "study"?
An application seems unlikely because you have already pointed out that the JREF is "only interested in things that can be proven in a small-scale, one-person fashion". With which of "Randi's representatives" did you communicate and when?

My point being: How does your "study" relate to the Challenge since an application from your part seems very unlikely?
 
Saizai, from your "Methodology" page: "As the Participants are treated exactly alike by completely automated processes, and there is no direct contact between them and the Experimenters, there is no potential for a violation of blind through that means either."

Since you logically have an interest in a positive result ("a statistically significant difference in the average Scores, in the second and/or third round, between the active and control groups of Recipients") - you do, do you? - you could "disautomate" the "processes" any time you want, right? I do not imply that this is your motive.
The assignment process before each "round" needs to be done by someone other than you. You can't know any data about who is selected to which group at any time during the "study".
 
All Participants will, on a monthly basis, update us on their status. Healers will tell us about their experience, their continuing willingness to participate in the study, etc. Recipients will tell us about their mental and physical health, any unusual events they believe to be related to their participation, etc. This will be submitted online, but a duplicate will also be signed and mailed to us to keep everything above board.

At the end of the first round, to last at least one year, we will analyze the data we have. Through this, we will try to determine what difference (if any) there is between the two sets of Recipients. On that basis, we will publicly choose a numeric equation directly and unambiguously derivable from the data collected, and that equation will serve as the "score" for the second round.

As a simple example, the "score" could consist solely of:

S = Recipient's average reported pain [scale of 1-10]

What are you saying above? It sounds like the recipients will give narrative accounts of their condition and you (someone?) will reduce those accounts to a numerical score. Surely you mean that the recipients themselves will give numerical grades to various things like pain, fatigue, etc. which will be combined into an overall score.

Even if the recipients themselves give the numbers along with anecdotes, it will be tempting to report that the numbers weren't significantly different, but that every person who was prayed for told about having a better attitude, or feeling more loved, or having their medical bills miraculously paid, or some other thing produced by data mining that wasn't anticipated in the numerical scores.

It seems that having the recipients report anything other than the numbers is unnecessary. Though you could collect all the positive anecdotes only from those being prayed for and fund the whole experiment by selling them in a book showing what the power of prayer can do. :)
 
GK - You are correct that, were I to use the data I have access to and treat participants differently based on their status, that would be a flaw. However, I do not intend to do any such thing. Any questions would be addressed in a general FAQ rather than in personal communication with me - or answered by someone who does *not* know the participant's status but acts as my delegate for other purposes.

I expect that this should be sufficient for a first/second round version. If this were a MDC, then I also expect that it would need to be hosted through a neutral third party or some other mechanism (such as you suggest) for the third round (equivalent to the "final test"; the second round would be the "preliminary test"; the first round is merely investigative) to completely ensure an inability on my part to try to cheat.

I would like it to be a MDC; my understanding is simply that they won't accept this type of challenge though. This is from a discussion by email with Kramer, about a year or two ago.

As for what after - well, that'd depend on the results of the study eh? ;)

What are you saying above? It sounds like the recipients will give narrative accounts of their condition and you (someone?) will reduce those accounts to a numerical score. Surely you mean that the recipients themselves will give numerical grades to various things like pain, fatigue, etc. which will be combined into an overall score.

Of course. All the things you listed would be given in numerical terms using the usual scales for such things (e.g. pain: 1-10; total $ medical bills; total $ paid by you; etc). I intend to use narrative accounts only for annotative purposes, not for analysis proper.

Though you could collect all the positive anecdotes only from those being prayed for and fund the whole experiment by selling them in a book showing what the power of prayer can do. :)

That might also be an interesting use. ;)
 
...
I would like it to be a MDC; my understanding is simply that they won't accept this type of challenge though. This is from a discussion by email with Kramer, about a year or two ago.
...

Have you asked the current Challenge Facilitator, Mr. Jeff Wagg challenge@randi.org about it?

Perhaps a shorter "study" (one round) with a more rigid protocol will meet JREF criteria. Since you seem to value the JREF and its mission, you could make the necessary adjustments to have your application accepted.

Religious claims and tests for deities do not qualify for the Challenge. Your "study" suggests the existence of a deity (or deities) since it involves praying, right?
Perhaps you can call it something else. Instead of "praying" you say e.g. "talking". Could you eliminate the alleged involvement of an alleged being and still conduct the "study"?
 
GK - I make no claim whatsoever about the mode of effect, nor that I am testing the existence of any deity. I am not interested in even trying to test that at this point. Just because it is potentially religious in connotation, does not make it a religious claim.

I specifically say that "prayer" is intended to be a generic word of convenience, and that it may take on multiple forms for different people, including ones that are not Judeo-Christian or theological (e.g. Buddhist meditation). So no, I won't change that to "talking" (which assumes a very specific view of prayer on your part, btw - that of "talking" to the Judeo-Christian God).

A shorter study could not be sufficiently rigorous for my taste. The first round vs. second/third round split is absolutely necessary for the design as well.

I have already emailed Jeff about this; he has yet to respond to my question about the acceptability of this general type of application. (I.e., one involving multiple people over a long period of time, rather than just me over a short period.)
 
P.S. As for why I post here: Even if I'm not allowed to do the MDC, if there's a flaw in my methodology (though I don't think there is), I'd like to know. Hopefully a bunch of skeptics should be able to find any.
 
Cool! I always wondered about that.

A murder of crows.
A pride of lions.
A herd of sheep.

And now... finally...

A bunch of skeptics.

My mind is totally at ease due to this, although I'm not quite sure what we have in common with bananas... :)
 
The real problem with such tests is the possibility of "cheating" - unintentional or intentional. I'll leave that to those with some expertise in the area (skeptics/magicians/the Amazing Randi), and just comment on the statistics.

First, since you say a number will be assigned to each experimental subject, it would be good to announce in advance what statistical test you're going to use. There are lots of standard tests for the difference between two means. Sometimes they give different results.

Second, while a significance test at the five percent level is certainly standard in science, the informal rule for the Challenge preliminary is 1/1000. Perhaps that would be better?

Third, I don't understand why three rounds are necessary nor the reference to selection bias or data mining. So long as assignment and coding is truly random and all submitted data is used in the same way, one round ought to be sufficient.
 
Startz - Can you suggest how anyone involved could cheat? What would be the (mundane) mechanism thereof?

re #1: I will need to consult statistician friends of mine for that, but since we have until the end of the first round (>1 yr away), I don't think that's a problem. We can at least agree on the basic idea.

re #2: I think 5% should be adequate for the preliminary test. A stricter standard would be acceptable for the final test.

re #3: One round is not sufficient, because the method of Score calculation is done after the first round is complete (to best capture the effects - I don't claim in advance what exactly will be affected). The second and third rounds are identical; both are run after the Score calculation is made final, so they are not affected by the possibility of selection bias.

If you're not familiar with the term, Wikipedia should have a good article on the subject.
 
I'm no expert in this stuff, but this looks pretty well designed to me. The only thing I'd say is to second the point that you must specify all the statistical tests you intend to do on the data and exactly what forms the data collection will take before you run the experiment. One can always mine any data set for something out of the ordinary ("we found that if we looked only at those who reported liver problems, a whopping 80% showed marked improvement as compared to the control group!").

The only other thing I will say is that when you get a negative result (yes, I'll take that bet), proponents of healing-through-prayer will simply say, "but how can prayers to someone identified only via first name and a number on a website possibly get through?" Prayer is one of those things that is probably impossible to test in a rigorous double-blind experiment without doing things that no ethics review board would ever allow (lying to patients about whether or not they were being prayed for, lying to "pray teams" about whether or not the people they're praying for are really sick, etc.).
 
re 1: I haven't the vaguest idea of how one would cheat. And I certainly didn't mean to imply that you would. I only meant that it's an area where *I* don't know a huge amount.

re 2: I guess that so long as you're not applying for the Challenge, you're not bound by its guidelines. But the JREF standard does seem to be 1/1000.

Here's one reason that's connected to "cheating" (there should be a less pejorative term). With a five percent standard, someone could set up 14 different web sites, run the tests independently, and expect to find a significant result. Or 14 different people could do the same thing independently. Many people would like to see a tougher standard.
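The arithmetic behind the 14-site scenario can be sketched in a few lines: if each of n independent tests of a true null hypothesis is run at significance level alpha, the chance that at least one comes out "significant" by luck alone is 1 - (1 - alpha)^n. The scenario and the 1/1000 figure come from this thread; the code is just an illustration.

```python
def family_wise_error(n, alpha=0.05):
    """Probability of at least one false positive among n independent
    tests of a true null hypothesis, each run at significance level alpha."""
    return 1 - (1 - alpha) ** n

print(family_wise_error(14))           # ~0.512: better than a coin flip
print(family_wise_error(1, alpha=0.001))  # the informal 1/1000 standard
```

So at the 5% level, 14 independent runs give a better-than-even chance of a spurious "positive" result somewhere.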

re 3: You're right. Wikipedia has quite a nice entry. It begins "Selection bias, sometimes referred to as the selection effect, is the error of distorting a statistical analysis due to the methodology of how the samples are collected." So long as your data is collected randomly, there isn't any selection bias by the classic definition.

It sounds like you're trying to guard against bias from choosing the method of analysis. That's probably a good thing to do. But it can be done more simply by letting you see the data and choose a method of analysis *before* you find out which observation is in which group.

If you want to choose a method that finds the largest possible effect, then your suggestion of doing it in the first round and then applying the same method in a subsequent round is sensible. (I'm still not sure why a third round is needed.)
 
Yoink - re p1, that's the reason for the first round. Any such subpopulation choices or measure choices would be made then, before the second/third rounds.

I agree with p2; one can't prove the negative. I try to make it as open as possible though - my ideal would be first name, recent headshot, basic description of the illness, and state/country of residence. I think that should be sufficient for most faiths while still being unidentifiable. (One can also have participants swear that they haven't had any contact with other participants during the course of the study.)

Startz - Agreed re tougher standards; I would leave that to the third round though. Getting p<.001 would probably require a very large sample size - one that could easily be acquired with publicity from a success in the p<.05 second round.

Another possible way to do the analysis is to simply say that any significant effect on any measure tracked (e.g. pain alone, $ spent alone, etc) would count, but that no subpopulation sampling would be allowed. This prevents you from doing what Yoink suggested. After all, *any* significant difference on *any* (numeric) response value would be necessarily paranormal.

Especially if you require a low p-value (enough so that you don't get a false positive simply by virtue of a large number of measures), I think this would be fair.
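The "low enough p-value so that many measures don't produce a false positive by chance" requirement is, in effect, a multiple-comparisons correction. A minimal sketch using the Bonferroni adjustment, the simplest such correction; the count of 10 tracked measures is an invented example, not part of the study design:

```python
def bonferroni_threshold(family_alpha, n_measures):
    """Per-measure significance threshold such that the chance of any
    false positive across all n_measures stays at or below family_alpha."""
    return family_alpha / n_measures

# Hypothetical example: tracking 10 measures (pain, bills, fatigue, ...)
# while keeping the overall false-positive rate at 5%:
print(bonferroni_threshold(0.05, 10))  # each measure must reach p < 0.005
```

Bonferroni is conservative (it lowers power), but it guarantees the family-wise error rate regardless of how the measures are correlated.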
 
From your faq:

"It is possible - probable even - that recipients will be prayed for by people other than their assigned healers. Couldn't this affect the results?

No. This variable - and all other otherwise uncontrolled variables - is controlled by the process of randomization; there is no reason why it would be different between the control and test groups. It will, however, be tracked - by polling the recipients as to whether they, or people they know of, will be praying for their betterment. It is expected that this measure will be statistically equal between groups, as with other statistics such as average age, gender distribution, etc."

You're wrong: randomization will not always take care of this, because you will not know whether someone prayed until you test for it. You are going to have to test for this rather than dismissing it outright.

What about their own prayers?

What about their current medical treatment and its effectiveness? Certain cancers have better treatments than others.

What about severity of illness? Different types of cancer matter in outcome. Someone with localized skin cancer will always respond better than those in stage 4 pancreatic cancer regardless of whether someone prays for them or not.

Are there any other confounding factors you're missing such as culture, geography, and family history? Any possible interactions?

You really need to sit down with a Ph.D. biostatistician to design a better study, because it seems you're going to need a much more complicated randomization procedure and statistical analysis (e.g. a mixed-effects GLM) than you originally intended if you are going to adjust for all the possible covariates, interactions, and covariance structures...
 

I think that so long as prayer is assigned randomly, one can expect the covariates to be distributed randomly across the two groups. In this case, the covariates can be safely ignored.
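The claim that random assignment balances covariates across groups in expectation is easy to check by simulation. In this sketch the illness-severity values are invented purely for illustration, and the 1-5 scale and sample size of 200 are assumptions, not figures from the study:

```python
import random

def average_group_gap(severities, n_trials=2000, seed=1):
    """Randomly split subjects into two equal groups many times and
    return the mean absolute difference in average covariate value."""
    rng = random.Random(seed)
    half = len(severities) // 2
    total_gap = 0.0
    for _ in range(n_trials):
        shuffled = severities[:]
        rng.shuffle(shuffled)
        a, b = shuffled[:half], shuffled[half:]
        total_gap += abs(sum(a) / len(a) - sum(b) / len(b))
    return total_gap / n_trials

# Invented covariate: illness severity on a 1-5 scale for 200 recipients.
gen = random.Random(42)
severities = [gen.randint(1, 5) for _ in range(200)]
print(average_group_gap(severities))  # small relative to the 1-5 scale
```

With reasonable sample sizes, the typical between-group gap in any pre-existing covariate is small, which is exactly why randomized assignment lets the uncontrolled variables be ignored on average.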
 
Digithead - What Startz said.

Or to quote myself in the OP:
"If you have a critique, please make sure that:
* you can explain why your perceived flaw in my design would cause a false positive result, i.e. a statistically significant difference between the active and control groups of Recipients in the second and/or third round"

Can you give that explanation?
 
