I'll try to clarify.
If the self-evident results have a set number of choices and/or answers, then you can pretty much calculate The Odds. In a perfect world these worst-case odds indicate how likely a person flipping a coin or rolling dice is to pass the test. Let's call this scenario Blind Luck.
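Just to make the Blind Luck calculation concrete (the numbers here are made up, not from any real protocol): say a test has 10 trials, each a 1-in-2 guess, and passing means 8 or more hits. The worst-case odds are just the binomial tail, something like:

    from math import comb

    def blind_luck_odds(trials, p_correct, passes_needed):
        # Probability of at least `passes_needed` hits in `trials` independent
        # guesses, each correct with probability `p_correct`.
        return sum(comb(trials, k) * p_correct**k * (1 - p_correct)**(trials - k)
                   for k in range(passes_needed, trials + 1))

    # Hypothetical protocol: 10 coin-flip-style trials, pass mark of 8 hits.
    print(blind_luck_odds(10, 0.5, 8))   # ~0.055, roughly 1 in 18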
Unfortunately, the world's not perfect. There are going to be occasions where, due to time, space, money, personalities, or whatever, the protocol is not going to whittle it down to Success by Ability vs. Success by Blind Luck. Our odds calculation remains the same, but our confidence level is reduced.
Unless the subject is making their guesses based on a random-like process, such as rolling a die, it has to be assumed that their guesses will not be random, but will follow some sort of pattern. So unless the placement of the target is itself patternless, the distribution of guesses that follows from the guessing pattern will be different from the distribution of guesses based on random sampling.
We typically see two kinds of guessing games: one where the placement of the target is random, and one where it is not. Connie Sonne's test is an example of the former, and the VFF test is an example of the latter. Just think how easy it would have been to test VFF if the presence or absence of a kidney had been determined randomly. The distribution of blind guesses in the former, even though those guesses are not likely to be random, will still correspond to the distribution of guesses based on random sampling. The distribution of guesses in the latter would have to be determined empirically. And that is the stumbling block in the Challenge, because it is not set up to empirically measure the distribution (and thereby provide us with a way to calculate the odds), preferring instead to depend upon a theoretical distribution.
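Here's a toy simulation of that point (none of this is the actual Sonne or VFF setup; the left/right trials and the alternating guesser are invented purely for illustration). When the target is re-randomized on every trial, even a rigidly patterned guesser ends up with the same hit distribution as a coin flipper; when the target placement is fixed and non-random, the hit distribution depends entirely on how the pattern happens to line up with the placement.

    import random
    from collections import Counter

    TRIALS = 10        # guesses per test run
    RUNS = 100_000     # simulated test runs

    def patterned_guess(i):
        # A deliberately non-random guesser: alternates left/right every trial.
        return "left" if i % 2 == 0 else "right"

    def run_test(random_placement):
        hits = 0
        for i in range(TRIALS):
            if random_placement:
                target = random.choice(["left", "right"])  # re-randomized each trial
            else:
                target = "left" if i < 5 else "right"      # a fixed, non-random arrangement
            hits += (patterned_guess(i) == target)
        return hits

    for random_placement in (True, False):
        counts = Counter(run_test(random_placement) for _ in range(RUNS))
        label = "random placement" if random_placement else "fixed placement"
        print(label, sorted(counts.items()))

With random placement the hit counts come out binomial, which is exactly what the random-sampling odds assume; with the fixed placement this particular guesser scores 6 out of 10 every single time, and the theoretical odds tell you nothing.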
So what we can try to do, instead, is to make the subject as blind as possible when it comes to applying any sort of pattern to their guessing. For VFF this means that the subjects are covered in clothing that hides anything that could be used as a clue, such as age. The subjects are presented one at a time, chosen randomly with replacement (which means that some subjects may be read more than once, and some not at all). And she is not told beforehand how many missing kidneys there are. This is what you referred to earlier as Blind Luck vs. Ability.
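For what it's worth, the "randomly with replacement" part is as simple as it sounds; a quick sketch (the subject labels are hypothetical):

    import random

    subjects = ["S1", "S2", "S3", "S4", "S5", "S6"]   # hypothetical subject pool

    # Ten presentations drawn randomly WITH replacement: a subject can
    # come up more than once, and some may never come up at all.
    presentation_order = random.choices(subjects, k=10)
    print(presentation_order)

Presumably the point is that the claimant can't narrow anything down by keeping track of who has already been read.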
It sounds to me like you're saying that if we don't have complete confidence that we have made it Ability vs. Blind Luck, then we have no right to be discussing odds. Is that your stance?
My stance is that you are using the wrong distribution in order to calculate odds simply because to do otherwise is inconvenient.
I say that this is a challenge and not scientific research. We're perfectly entitled to say, "We acknowledge that in this protocol an ordinary person without a special ability will outperform an ordinary person flipping a coin."
And that is where empirical measurement would be necessary if we want to have some way of quantifying this for comparison to a person with a special ability. Right now, we can guess that both a person without a special ability and a person with a special ability will outperform random sampling, but we don't know by how much. This works against us if our guesses about how much aren't very good, and against the claimant if we way over-compensate. I think my main complaint is that the odds based on random sampling are used as though they are meaningful.
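A sketch of what I mean by empirical measurement (the control scores below are placeholders, not real data): run a batch of ordinary, no-ability people through the identical protocol, record their hit counts, and then judge the claimant against that measured distribution instead of the theoretical random-sampling one.

    def empirical_p_value(control_scores, claimant_score):
        # Fraction of no-ability control runs that scored at least as well as
        # the claimant: an empirical stand-in for the random-sampling odds.
        at_least = sum(1 for s in control_scores if s >= claimant_score)
        return at_least / len(control_scores)

    # Placeholder data: hit counts from 200 imaginary control subjects,
    # plus a claimant who scored 9.
    controls = [6, 7, 5, 8, 6, 7, 6, 9, 7, 6] * 20
    print(empirical_p_value(controls, 9))   # 0.10 with these made-up numbers

If ordinary people routinely score well above chance under the protocol, that shows up in the controls, and the odds we quote would finally mean something.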
We can do that because the more important statement we're making is, "We're so confident that only a person who has an ability could pass this test that we're gonna put up our money."
Yeah, I'd just like to see some indication that our confidence does not reflect the odds based on random sampling, but rather our certainty that we've made a good guess as to how a person without special abilities would perform.
Linda