
Check my methodology - prayer study

I have some concerns about the idea of trying to maximize the difference in the first round. Is this a common practice in statistical studies? I'm no statistician, but it seems wrong to me. It assumes that an effect will be seen in the first round, and presumably that the effect will be positive. But what if there is no effect? You would merely be amplifying noise instead of signal. Using the same formula, the second round might indicate a huge positive effect, and the third might indicate a very negative effect, simply because your formula is amplifying random variations. How would you interpret that result?
 
I have some concerns about the idea of trying to maximize the difference in the first round. Is this a common practice in statistical studies? I'm no statistician, but it seems wrong to me. It assumes that an effect will be seen in the first round, and presumably that the effect will be positive. But what if there is no effect? You would merely be amplifying noise instead of signal. Using the same formula, the second round might indicate a huge positive effect, and the third might indicate a very negative effect, simply because your formula is amplifying random variations. How would you interpret that result?

It's not common practice, at least not this explicitly.
But it's not unfair in this application. Suppose you think that prayer might have an effect, but maybe you suspect that it's not a general effect. It might work if you pray to Zeus. Or maybe to Odin. So you do a first round and find evidence that prayer to Zeus works and that prayer to Odin doesn't.

Now, in a completely separate and completely random second round, you only check for the efficacy of prayers to Zeus. Folks who believe that neither Zeus nor Odin intervene on request shouldn't object to this procedure.
 
This is obviously spurious, as there is no way to "load up" the groups in such a manner when the assignment is done programmatically at random.

It is not spurious...

First, you have to determine what the scientifically important difference is (which is different from a statistically significant difference), something your FAQ does not specify.

Then you have to figure out what confounders you might encounter. I've listed some for you, but you somehow think simple randomization will miraculously make them disappear; it won't.

Then you have to "guesstimate" a population variance, because you don't have pilot data to help you. Assume a larger one, as this will help prevent false positives; if there really is a difference, it will still show up.

With this variance and your scientifically important difference, you then estimate the necessary sample size for whichever statistical test you will use, based on a chosen power and significance level (the FDA uses 80% and 5%, respectively).

Then, based on that sample size estimate, you have to recruit enough people for your study.

Then you have to determine the proportion of each confounder in the total sample and, based on those proportions, draw a stratified random sample.

Then you have to verify that your randomization "worked" and that each group contains the same proportions of disease severity, demographics, etc., so that you aren't "spuriously" loading one group in favor of your hypothesis...

Is that clear enough for you?
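
To make the sample-size step concrete, here is a minimal sketch in Python of the usual normal-approximation formula for a two-sided, two-sample comparison of means at 80% power and 5% significance. The difference and standard deviation plugged in are invented placeholders, not values from this study.

```python
import math
from statistics import NormalDist

def n_per_group(min_difference, sigma, alpha=0.05, power=0.80):
    """Rough sample size per group for a two-sided, two-sample comparison of
    means: detect a true difference of `min_difference` with the given power,
    assuming a guesstimated per-group standard deviation `sigma`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # ~0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_power) * sigma / min_difference) ** 2)

# Hypothetical numbers: to detect a 5-point difference in some outcome score
# with a guessed SD of 20, you would need roughly 250 people per group.
print(n_per_group(min_difference=5, sigma=20))
```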
 
It is not spurious...

First, you have to determine what the scientifically important difference is (which is different from a statistically significant difference), something your FAQ does not specify.

Then you have to figure out what confounders you might encounter. I've listed some for you, but you somehow think simple randomization will miraculously make them disappear; it won't.

Then you have to "guesstimate" a population variance, because you don't have pilot data to help you. Assume a larger one, as this will help prevent false positives; if there really is a difference, it will still show up.

With this variance and your scientifically important difference, you then estimate the necessary sample size for whichever statistical test you will use, based on a chosen power and significance level (the FDA uses 80% and 5%, respectively).

Then, based on that sample size estimate, you have to recruit enough people for your study.

Then you have to determine the proportion of each confounder in the total sample and, based on those proportions, draw a stratified random sample.

Then you have to verify that your randomization "worked" and that each group contains the same proportions of disease severity, demographics, etc., so that you aren't "spuriously" loading one group in favor of your hypothesis...

Is that clear enough for you?

Digithead:

The OP has a perfectly well specified null hypothesis (in the second round). He doesn't need to worry about power. All he needs to do is test the difference in sample means. If he can reject the null at a specified size, then he has significant evidence that prayer works.

Of course if prayer has a very small effect, he's going to need a large sample to detect it. So he might be well advised to use the information from the first round along the lines you suggest to decide on a sample size for later rounds.

I would worry much more about how to verify that sampling is truly random than about the difficulties of a statistical test.
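
For concreteness, here is a minimal sketch of that second-round test: compare the mean outcome scores of the Active and Control groups. Welch's t-test (via SciPy) is assumed here as one reasonable choice; nothing in the thread commits to this particular test, and the data passed in are hypothetical.

```python
from scipy import stats

def difference_in_means_test(active_scores, control_scores, alpha=0.05):
    # Welch's t-test: does not assume equal variances in the two groups.
    t_stat, p_value = stats.ttest_ind(active_scores, control_scores,
                                      equal_var=False)
    return {"t": t_stat, "p": p_value, "reject_null": p_value < alpha}
```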
 
It's not common practice, at least not this explicitly.
But it's not unfair in this application. Suppose you think that prayer might have an effect, but maybe you suspect that it's not a general effect. It might work if you pray to Zeus. Or maybe to Odin. So you do a first round and find evidence that prayer to Zeus works and that prayer to Odin doesn't.

Now, in a completely separate and completely random second round, you only check for the efficacy of prayers to Zeus. Folks who believe that neither Zeus nor Odin intervene on request shouldn't object to this procedure.

That doesn't really address my concern though. What happens if there is actually zero effect, and any variance between the two groups in the first round is therefore just random noise? You'll have no way of knowing that it's just noise, and you will amplify it as if it were evidence of a positive effect. If it is just random noise, though, then won't that amplification itself be random, and therefore meaningless when applied to a second set of data?
 
Digithead:

The OP has a perfectly well specified null hypothesis (in the second round). He doesn't need to worry about power. All he needs to do is test the difference in sample means. If he can reject the null at a specified size, then he has significant evidence that prayer works.

Of course if prayer has a very small effect, he's going to need a large sample to detect it. So he might be well advised to use the information from the first round along the lines you suggest to decide on a sample size for later rounds.

I would worry much more about how to verify that sampling is truly random than about the difficulties of a statistical test.

His hypothesis is that they're different; he doesn't say how different. So he absolutely needs to tell us what the scientifically important difference is, because statistical significance is a function of sample size. Let's say the scientifically important difference is 5% and he observes a 1% difference. With a large enough sample, this will be statistically significant rather than practically significant, and he'll declare success.

Also, specifying the scientifically important difference, variance estimate, and sample size required for his test will allow others to determine whether his results are reliable, and will also make it less likely for him to claim that he didn't achieve sufficient power to find the true difference.

It will also force him into thinking about what really constitutes a successful outcome from prayer rather than relying simply on statistical formulas.

And your last statement is what I've been trying to get him to address: you need to verify that your randomization is truly random and that you've tried to account for all of the possible confounders that might skew your results.

I'm also assuming that he will be doing a one-sided test, as it's only important that prayer improves the treatment over control. What happens if the prayer group does worse? Will he still claim they're statistically significant?
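
A quick illustration of that point, with invented numbers: the same one-percentage-point difference in a hypothetical recovery rate goes from unremarkable to "statistically significant" purely by increasing the per-group sample size.

```python
from statistics import NormalDist

def two_proportion_p(p1, p2, n_per_group):
    # Two-sided p-value for a difference in two proportions (pooled,
    # normal approximation).
    pooled = (p1 + p2) / 2
    se = (2 * pooled * (1 - pooled) / n_per_group) ** 0.5
    z = abs(p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(z))

# A fixed 51% vs 50% "recovery rate" at ever larger sample sizes.
for n in (500, 5_000, 50_000, 500_000):
    print(n, round(two_proportion_p(0.51, 0.50, n), 5))
```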
 
His hypothesis is that they're different; he doesn't say how different. So he absolutely needs to tell us what the scientifically important difference is, because statistical significance is a function of sample size. Let's say the scientifically important difference is 5% and he observes a 1% difference. With a large enough sample, this will be statistically significant rather than practically significant, and he'll declare success.

>some stuff snipped
I completely agree that "scientifically important" is what matters. But I think that in this case 1e-308 is "scientifically important." [Of course, detecting an effect of that size would take a *really* large sample. :)]
 
That doesn't really address my concern though. What happens if there is actually zero effect, and any variance between the two groups in the first round is therefore just random noise? You'll have no way of knowing that it's just noise, and you will amplify it as if it were evidence of a positive effect. If it is just random noise, though, then won't that amplification itself be random, and therefore meaningless when applied to a second set of data?

You're quite right. The effect will be that nothing significant is found in the second set of data. That's why it's harmless to allow it.
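
A rough simulation of that claim, under purely hypothetical settings: when there is no true effect, choosing whichever of several candidate measures looked best in round one does not inflate the error rate of an independent round two, which stays near its nominal level. SciPy's t-test is assumed for the round-two comparison.

```python
import random
from scipy import stats

random.seed(0)
trials, n, n_measures, hits = 2_000, 100, 5, 0
for _ in range(trials):
    # Round 1: several candidate measures, no true effect anywhere; note the
    # direction of the largest apparent Active-vs-Control gap.
    best_gap = 0.0
    for _ in range(n_measures):
        gap = (sum(random.gauss(0, 1) for _ in range(n)) -
               sum(random.gauss(0, 1) for _ in range(n))) / n
        if abs(gap) > abs(best_gap):
            best_gap = gap

    # Round 2: fresh data, still no effect; test only in the chosen direction.
    active = [random.gauss(0, 1) for _ in range(n)]
    control = [random.gauss(0, 1) for _ in range(n)]
    t, p = stats.ttest_ind(active, control, equal_var=False)
    if p < 0.05 and t * best_gap > 0:
        hits += 1

print(hits / trials)  # hovers near the nominal one-sided 2.5%, not inflated
```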
 
Plasmadog - Amplifying noise into a perceived signal only works once. If it is indeed noise, then the next time it will be just noise, and any pattern you thought was there ought to evaporate. You agree with this in your second post about it.

Startz - I agree with pretty much everything you've said.

I admire your optimism about the frequency of "data dredging" in the wider scientific literature. ;)

I reserve the right to add additional selection criteria for eligible Healers *and* Recipients before the commencement of the second and third rounds. If I find in the first round that, e.g., God hates atheists when it comes to whether they get better when prayed for, then I won't allow atheists as Recipients. :p Or if as you suggest, Odin is deaf as well as blind, then no Odin-worshippers as Healers.

As you say, all the data mining / cross-reference techniques that digithead has suggested would be useful for me in determining the optimum measure for the second and third rounds. And I fully agree that the smallest effect size I can detect is inversely related to the sample size.

All that says is that I will not be able to detect effects less than some particular size to a sufficient significance. That is acceptable to me, and inherent in the design.

Digithead - What I objected to about your statement was "loading". That is an active, volitional verb that implies that I would somehow be skewing the population of the active vs control groups.

The assignment would be done by a simple mechanism, e.g. the result of (rand(time()) % 2).
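
For concreteness, the same coin flip sketched in Python rather than the C-style pseudocode above; using the secrets module rather than a clock-seeded rand() avoids any worry about predictable seeding. The recipient IDs are hypothetical.

```python
import secrets

def assign_group():
    # 50-50 coin flip from a cryptographic source.
    return "active" if secrets.randbelow(2) == 0 else "control"

assignments = {rid: assign_group() for rid in ("R001", "R002", "R003")}
```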
 
Also:
What happens if the prayer group does worse? Will he still claim they're statistically significant?

If they do worse by a sufficiently statistically significant amount: Yes. Though that would have rather different social consequences. ;)

Also, I make absolutely no a priori claim as to the magnitude of the effect. An effect of any magnitude, detected at the desired significance level (p < .001), would count as a positive result. Naturally, detecting a minute difference would require an impracticable number of participants, but that is the only constraint.
 
I think that so long as prayer is assigned randomly, one can expect the covariates to be distributed randomly across the two groups. In this case, the covariates can be safely ignored.

I agree with digithead. You have a poor control group here, and the sampling bias is going to negate the validity of your results.

In order to eliminate sample bias you have to have a more random group of cancer victims. In this study the subjects are selecting themselves. You don't have a random sample, period. "Loading" has nothing to do with it.

The people who select themselves are very likely going to include more believers than non-believers. Unless you have a study showing that isn't the case, then I'll stick with that assumption. Show me otherwise and I'll change my mind.

Believers are more likely to have other people praying for them. So your control group will also likely be prayed for. Again, show me some evidence that isn't true and I'll change my mind.

But, all that aside, other studies have failed to show any effect of prayer on illness, especially cancer outcomes, so I doubt you'll see any significant results anyway.
 
skeptigirl: I fully concede the point that participants are a self-selected, non-random subset of the general population of cancer victims.

However, I need to ask you to explain how that could possibly create a false positive difference between the active and control groups, since both are drawn from the same (admittedly self-selected) pool and assigned randomly (from that pool).

I also entirely agree that the control group will likely be prayed for, and challenge you to the same question on this point as well.

Your doubt is not an argument. :)
 
Draft JREF M$C protocol for PrayerMatch experiment

Protocol:
SETUP
1. Two pools of people will be recruited by Claimant to participate in the study. These people will be unrelated to Claimant and will be of one of two kinds: Recipients or Healers. Both are termed Participants.

2. All Participants will have agreed, in writing and in electronic form, to the study design, affirmed their ability to give consent, agreed to provide signed copies of all data they submit online, and sworn not to attempt to contact any other Participant until the conclusion of the round(s) in which they participate. All Participants will also fill out a brief survey. Recipients will sign a form as part of their initial response, a copy to be kept by their doctor, authorizing and requesting the doctor to notify us in the event of their death. **

3. Healers will be people of various faiths who have agreed to their duties as described here. Recipients will be people currently diagnosed with cancer as of the beginning of the round in which they participate. No Recipient may participate in more than one round. Healers may do so at their discretion.

4. Claimant reserves the right to add additional criteria for participation by Healers or Recipients or both, at Claimant's sole discretion, prior to the commencement of the second round. Such criteria may include, for example, inclusion or exclusion of particular faiths, ages, genders, or other tracked data. The criteria, once set, will be identical for rounds two and three. *

5. The study is divided into three rounds, each lasting approximately one year, and in no case less than one year.

ROUNDS
6. Each round of the study will commence as soon as there are a sufficient number of Participants, N(R) and N(H) respectively. *

7. Upon commencement of a round, all participating Recipients will be randomly assigned by computer to either the Active or the Control group, with a 50-50 chance of each. No Participant, nor any Experimenter who has potential contact with any Participant, will be allowed access to this data until after the Round is complete. This ensures that the experiment is double-blind. *

8. Once a month, each Healer will be randomly assigned a Recipient from the Active group to pray for. They will be reminded to do so once a week, and will commit to doing so for at least five minutes per week, every week, for the duration of their participation. They may pray, and interpret the meaning of the word 'pray', in whatever manner their faith indicates is appropriate.

9. Healers will be told the first name, state, country, primary cancer location, and cancer type of their assigned Recipient and only their assigned Recipient. They will also be shown a digital photo of the Recipient, uploaded by the same, if available.

10. Each Recipient may have more than one Healer assigned to them at a time; indeed, that is the intent. However, each Healer will only be assigned one Recipient at a time. That is, there is a one-to-many relationship between Recipients and Healers. The number of Healers per Recipient will be, on average, N(H)/N(R) (see the assignment sketch following item 12).

11. Once per month, each Participant will provide an update in the form of a brief survey composed of multiple choice or numeric questions and one freeform text response. The text will be used for annotative and illustrative purposes only. **

12. Once per quarter, each Recipient will provide an update from their doctor, composed likewise. **
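
As referenced in item 10, here is an illustrative sketch of the monthly Healer-to-Recipient assignment, with hypothetical IDs: each Healer draws exactly one Active Recipient, so each Active Recipient ends up with roughly N(H)/N(R) Healers on average.

```python
import random

def monthly_assignments(healer_ids, active_recipient_ids, seed=None):
    # Each Healer is independently assigned one Recipient from the Active group.
    rng = random.Random(seed)
    return {healer: rng.choice(active_recipient_ids) for healer in healer_ids}

pairs = monthly_assignments(["H1", "H2", "H3", "H4"], ["R1", "R2"])
```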

ANALYSIS AFTER FIRST ROUND

13. Extant data will be analyzed after the first round and used to create a Score Equation. The Score Equation will be an unambiguously determinable equation resulting from the data collected, and provided in the form of a computer function that, with access to any particular Recipient's records and without access to information regarding which group (active or control) they were assigned to, outputs a real number between 0 and 100 inclusive.

14. This Score Equation will be set at the Claimant's sole discretion, in consultation with statisticians of Claimant's choosing. Once set, it will not be changed for either the second or third rounds.
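
Purely as an illustration of the interface described in items 13 and 14, here is a sketch of what such a Score Equation function could look like. The fields and weights are invented placeholders; the actual equation is to be fixed only after the first round.

```python
def score(record):
    """Map one Recipient's collected data to a real number in [0, 100].
    The record passed in deliberately contains no group assignment."""
    raw = (
        50.0 * record["still_alive"]                # 0 or 1
        + 2.0 * record["quality_of_life"]           # self-reported, 1-10
        - 2.0 * record["avg_pain"]                  # self-reported, 0-10
        + 0.1 * record["pct_remission_likelihood"]  # doctor-reported, 0-100
    )
    return max(0.0, min(100.0, raw))
```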

COMPLETION OF STUDY

15. The second round will be used as the "preliminary test" for JREF purposes. The third round will be the "final test". No difference in the Score Equation, participation criteria, or significance test will be permitted between second and third rounds.

16. The second and third rounds will proceed identically to the first as stated above, except for the addition of the predetermined Score Equation and participation criteria (if any).

POSITIVE AND NEGATIVE RESULTS

17. A positive result is obtained when the Scores, as determined by the Score Equation, differ significantly between the Active and Control groups of Recipients.

18. "Differs significantly" is defined as any magnitude difference, whether positive or negative, with a statistical significance of p < .001. This shall not be changed between the second and third rounds.

19. A negative result is anything else.

20. Any Recipient who can be clearly demonstrated to have communicated with any other Participant, or who refuses to sign a statement saying that they have not, will have their data removed from the analysis for all purposes above.
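
As referenced in item 18, one mechanical, "no judgment" way to implement the significance test is a two-sided permutation test on the difference in mean Scores at the p < .001 threshold. This is only one possible choice of test, sketched here with hypothetical inputs.

```python
import random

def permutation_p_value(active, control, n_perm=100_000, seed=0):
    # Two-sided permutation test on the difference in mean Scores.
    rng = random.Random(seed)
    observed = abs(sum(active) / len(active) - sum(control) / len(control))
    pooled = list(active) + list(control)
    k = len(active)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)

def positive_result(active_scores, control_scores):
    return permutation_p_value(active_scores, control_scores) < 0.001
```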

WHAT CLAIMANT DOES NOT CLAIM AND WILL NOT ATTEMPT TO PROVE OR DISPROVE
* the existence of any deity
* what the mechanism of prayer is, if it works, or anything based on any particular theory of how prayer works
* whether any particular Recipient has been prayed for or not, or how much
* the ability to heal any particular Recipient
* any personal paranormal ability
* the effectiveness of any particular treatment other than the prayer received through being in the Active group


* Items to be specified later:

1. N(R) and N(H): at conclusion of the first round, at Claimant's sole discretion
2. Additional criteria for participation: ditto
3. Specific computer security protocols: ditto, by mutual agreement between Claimant and a computer security professional of JREF's choosing. Security protocols for the first round will be at Claimant's sole discretion.
4. Score Equation: Ditto

** Data collected (may be revised at Claimant's sole discretion prior to second round):

General:
Name
Gender
Date of birth
Religion (eg Christian)
# days per month engaged in religious activity
# years practicing said religion
Ethnicity
Income
Country
State
Picture, self-submitted .jpg or convertible to .jpg
Belief in the efficacy of prayer, self-rated scale 1-10 (10 = complete faith)
# times per month praying for self
# people known to participant to be praying for participant
# times per month on average said people do so
# minutes on average participant is engaged in any one "prayer"
Preferred praying style, eg directed/undirected, alone/group, etc (choice from list)
Introversion-extroversion, self-rated scale 1-7 (1 = introvert 7 = extrovert)
# years practicing as professional faith healer, remote healer, reiki user, psychic, etc., if any

For Recipients monthly:
# days in medical treatment
Self-reported average pain (0-10, 0 = none)
Self-reported average quality of life (1-10, 10 = excellent)
# days taking cancer medication
$ cost of treatment
$ cost of treatment paid by recipient
Still alive? (t/f)

From doctor once:
Name
Institution name
License #
Phone #
Address
# years practicing medicine
# years practicing with cancer specifically

From doctor quarterly, patient's:
Current 95%-threshold life expectancy in months
Degree of metastasis, 1-10
Primary location of cancer
Type of cancer
% likelihood to recover from cancer into long-term stable situation, but cancer still present
% likelihood to go into full remission, i.e. cancer no longer detectably present
 
One thing I forgot: add a clause for having Recipients sign a statement asking their doctor to notify us in the event of their death, and leaving a copy of it with their doctor.

Also, doctor's name, institution, license #, and contact info will be asked, and the participant's physical address as well.
 
And one more:

20. Any Recipient who can be clearly demonstrated to have communicated with any other Participant, or who refuses to sign a statement saying that they have not, will have their data removed from the analysis for all purposes above.
 
I agree with digithead and skeptigirl here. Randomly selecting people from a small sample does not give an even distribution. For example, say you have 50 people with mild cancer and 50 people with severe cancer. You are very unlikely to get exactly 25 of each in each group if they are randomly assigned. If there is no effect from prayer, or only a small effect, this could still give a strong positive result. In this example, if 30 people with only mild cancer were assigned to the prayer group, then this group would have much better results.

You definitely need to show that the control and test groups have similar demographics before analysing the results, and preferably before the trial starts.
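
One minimal sketch of the balance check being asked for here: before looking at outcomes, tabulate how a covariate (say, cancer severity) splits across the two groups. The field names and categories are hypothetical.

```python
from collections import Counter

def balance_table(assignments, severities):
    """assignments: {recipient_id: 'active' | 'control'}
    severities:  {recipient_id: 'mild' | 'severe'}"""
    counts = Counter((assignments[r], severities[r]) for r in assignments)
    for group in ("active", "control"):
        total = sum(v for (g, _), v in counts.items() if g == group)
        mild = counts.get((group, "mild"), 0)
        print(f"{group}: {mild} of {total} mild "
              f"({100 * mild / max(total, 1):.0f}%)")
```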
 
He has been invited to apply. The protocol will need a lot of work, but I'll not be discussing that until after an application is received.
 
Digithead - What I objected to about your statement was "loading". That is an active, volitional verb that implies that I would somehow be skewing the population of the active vs control groups.
The assignment would be done by a simple mechanism, e.g. the result of (rand(time()) % 2).

But that is exactly what you are doing by not accounting for the confounders of disease level, demographics, frailty, etc...

And it does not have to be volitional; it can be merely unintentional, if you are not aware that you are doing it. For instance, if I load fruit onto a plane for distribution and, unknowingly, an insect happens to be in some of the fruit, I am introducing something that potentially might ruin the fruit or harm the consumer...

However, if I am aware of these insects and inspect the fruit before loading (placing) it on the plane, I stand a better chance of preventing the harm from occurring...

But if I've been made aware of this problem and dismiss it as inconsequential (which it seems you are doing by all your rhetoric) then it's willful blindness...

As for your practical significance = statistical significance = 0.001: that's just silly, because then your results are based solely on sample size and not on a real scientific hypothesis placed within a theoretical construct. Irrespective of your hypothesis, if I were a reviewer and you were looking for funding for this type of study design, I'd place you on the denied pile, because you're not even close to accounting for all of the sampling problems and confounders that will occur...

Seriously, seek the assistance of a Ph.D. biostatistician with clinical trial experience; they can help you if you really want a decent study design...
 
Saizai:

Let me suggest three modifications to your protocol, inspired by comments of others on the board (notably DigitHead).

1. Recipients will be assigned two random code numbers, A and B. Recipients will be told code A and use it to report in. Healers will be told code B. In the files maintained in the database, pictures, names, etc. will be kept together with the code numbers in a form believed to be accessible by Higher Powers. (A sketch of this bookkeeping appears after this list.)

(This brings the experiment closer to double-blind, reducing the possibility of intentional or unintentional collusion.)

2. The scoring method used will be defined in a mechanical way. In other words, it will meet JREF's "no judgment" criteria.

3. Rather than have Round 1 be part of the Challenge, use it as a pre-test. Otherwise, JREF is agreeing to use of a test in the second and third rounds that you get to define unilaterally.

Would something like these be acceptable?
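
As referenced in modification 1, here is a sketch of the dual-code bookkeeping under hypothetical naming: each Recipient gets code A (used to report in) and code B (shown to Healers), and only the database links the two.

```python
import secrets

def register_recipient(name, registry):
    code_a = secrets.token_hex(4)   # Recipient reports in with this
    code_b = secrets.token_hex(4)   # Healers see only this
    registry[code_b] = {"code_a": code_a, "name": name}
    return code_a, code_b

registry = {}
code_a, code_b = register_recipient("Jane D.", registry)
```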
 
digithead: I claim no a priori theory about the mechanism or effect of prayer. The first round is intended to find out (or tune) the latter; I will not attempt to look at the former. We are not going to discuss theory, just as you would not discuss it with a dowser.

Your insects analogy is inaccurate.

Startz: 1. This is unnecessary; a more traditional login-and-password scheme is much simpler and just as effective. They will also be assigned a random code number for the purposes of their printouts, to make hand-tallying those easier. Possibly a similar mechanism to allow doctors to report deaths.

2. Of course. I elect as the form of my definition a chunk of code to be run on the server.

3. It is a pre-test; you can consider it part of the negotiation. I consider the terms that I reserve for myself to define unilaterally before commencement to be completely acceptable to a skeptic's needs for the protocol no matter what I define them to be.
 
