
Chronic Myeloid Leukemia: a statistics question

Dave Rogers

This is a problem in statistics that cropped up for me yesterday. I have a feel for the numbers, but I'd like some other opinions just to satisfy my curiosity. This is one for the maths and statistics geeks.

As I've mentioned before, I was diagnosed with chronic myeloid leukemia (CML) in December 2023. It's under control with drugs, though there are some issues with side effects at the moment, and the likelihood is that I'll die with it, not of it. In cancer treatment that's a win.

Very soon after diagnosis, I found that my wife knows someone, also in Ipswich, who was diagnosed with CML a few months before me. I also found out yesterday that a co-worker has a relative who was recently diagnosed with CML. The question that occurred to me is, how likely is it, given the incidence rate, that I should have two people in the same town, with the same disease, diagnosed in the same three-year period, within two degrees of separation of me?

There are some useful numbers available. Incidence of CML is about 850 new cases per year in the UK (numbers from 2017-19 NHS records). UK population is about 66 million, and Ipswich population about 133,000, suggesting an average of about 1.7 new cases per year in Ipswich. That means we should expect about 5 people in Ipswich to be diagnosed with CML across 2023, 2024 and 2025 combined. There are no known causes or associations for CML, only a correlation with age, so it seems to behave like random chance. Degrees of separation are trickier; I've heard it asserted that most people have about 100 friends and associates, and overlaps mean that about 1000 people are within 2 degrees.
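The arithmetic in the paragraph above can be checked with a few lines of Python. This is only a sketch using the rounded figures quoted in the post; the variable names are made up for illustration, and it assumes Ipswich has the same per-person incidence as the UK as a whole.

```python
# Rounded figures quoted in the post (not independently verified here).
UK_CASES_PER_YEAR = 850        # new CML diagnoses per year, UK
UK_POPULATION = 66_000_000
IPSWICH_POPULATION = 133_000
YEARS = 3                      # 2023, 2024 and 2025

# Assumption: Ipswich has the same per-person incidence as the UK overall.
per_person_rate = UK_CASES_PER_YEAR / UK_POPULATION
ipswich_per_year = per_person_rate * IPSWICH_POPULATION
ipswich_three_years = ipswich_per_year * YEARS

print(f"Expected new cases per year in Ipswich: {ipswich_per_year:.2f}")
print(f"Expected new cases over {YEARS} years:   {ipswich_three_years:.2f}")
```

This gives roughly 1.7 per year and roughly 5 over the three years, matching the figures quoted.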

So we have the situation that a sample of order 1000 people, in a town of 133,000 people, contains three out of five randomly selected individuals from that population. We can remove one of those (me) from the statistics, avoiding the Texas Sharpshooter fallacy, but the associations with the other two have no causal connection with CML (these people were within 2 degrees of separation before any of us were diagnosed). The expectation, I think, would be a few per cent chance that I'm within 2 degrees of one other CML sufferer, but less than 1% chance of two. So the question is, how unlikely is this situation? And, if it is unlikely, does that suggest the CML incidence rates are higher than expected?

Sorry this came out so lengthy, but statistical questions are rarely simple. Thoughts, anyone?

Dave
 
Very soon after diagnosis, I found that my wife knows someone, also in Ipswich, who was diagnosed with CML a few months before me. I also found out yesterday that a co-worker has a relative who was recently diagnosed with CML.
Do you know the identities of these other people? (any chance they could be the same person?)

In any case, my intuition is that it isn't as unlikely as it may seem. It's just a coincidence.
 
Do you know the identities of these other people? (any chance they could be the same person?)
Yes, I know who they are, and they're definitely not the same person.
In any case, my intuition is that it isn't as unlikely as it may seem. It's just a coincidence.
That seems the simplest explanation. I'm just interested in how much of a coincidence.

Dave
 
I think this reduces to estimating the probability that a random sample of 1000 will include two or more cases. Assuming the cases are statistically independent, the probability of having one case within that 1000 is about 1.3%, and the probability of having two cases within that 1000 is about 0.0165%. Which means that, dividing the UK's population at random into 1000-person groups, about 11 of those groups are likely to have two cases. The probability that you would find yourself in one of those 11 groups is about 0.017%. Which is, to within my rounding errors, the same as the probability that any randomly selected group of 1000 people would have two cases, and is of course not just a coincidence.

Events with a probability of 0.017% are rare, obviously, but not at all unheard of. Absent reasons to suspect correlations incompatible with assuming statistical independence, my tentative conclusion would be: it was bad luck. That conclusion is, of course, just a guess.
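For anyone who wants to reproduce or vary these numbers, here is a minimal sketch of the same independence model as a binomial calculation. The function name and the 1000-person network size are illustrative assumptions, and the three-year variant is an addition based on the window mentioned in the opening post, not part of the calculation above.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(at least k cases among n independent people, each with probability p)."""
    # Complement of the binomial probabilities for 0 .. k-1 cases.
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

# Rounded figures from the thread; the network size is a rough guess.
p_year = 850 / 66_000_000      # per-person chance of diagnosis in one year
p_three = 3 * p_year           # approximate per-person chance over three years
n = 1000                       # people within two degrees (assumed)

p1_year = prob_at_least(1, n, p_year)
p2_year = prob_at_least(2, n, p_year)
p2_three = prob_at_least(2, n, p_three)

print(f"P(>=1 case, one year):    {p1_year:.4%}")
print(f"P(>=2 cases, one year):   {p2_year:.4%}")
print(f"P(>=2 cases, three years): {p2_three:.4%}")
```

Under these assumptions the one-case figure comes out at roughly 1.3%, in line with the post. The binomial two-case figure for a single year is roughly 0.008%, about half of 0.0165%, which appears to be the one-case probability squared. Over a three-year window the two-or-more figure rises to roughly 0.07%. All of these stay well under 1%, so the qualitative conclusion is unchanged.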
 
One thing we learn from epidemiology is that diseases cluster along various factors, lots of which we don't really understand. I'll probably have drinks or food this weekend with a friend who is an epidemiologist and knows these statistical models. I'll ask him.
 
I think this reduces to estimating the probability that a random sample of 1000 will include two or more cases. Assuming the cases are statistically independent, the probability of having one case within that 1000 is about 1.3%, and the probability of having two cases within that 1000 is about 0.0165%. Which means that, dividing the UK's population at random into 1000-person groups, about 11 of those groups are likely to have two cases. The probability that you would find yourself in one of those 11 groups is about 0.017%. Which is, to within my rounding errors, the same as the probability that any randomly selected group of 1000 people would have two cases, and is of course not just a coincidence.

Events with a probability of 0.017% are rare, obviously, but not at all unheard of. Absent reasons to suspect correlations incompatible with assuming statistical independence, my tentative conclusion would be: it was bad luck. That conclusion is, of course, just a guess.
Isn't this the very definition of a coincidence? Two or more things coinciding?

There are also lots of low-probability events that don't occur, and of course we don't notice things that don't happen. What I'm trying to say is that if we were to make a complete list of all the low-probability events that may or may not happen in the next year (an impossible task, but for the sake of argument) in advance, it would be a very long list, and almost certainly a few of the things on the list would actually happen. (Do you see what I'm trying to say?)
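The "very long list" argument can be put in numbers with a short sketch. The per-event probability is the 0.017% figure from earlier in the thread; the list length of 10,000 is purely hypothetical, chosen only to illustrate the effect, and the events are assumed to be independent.

```python
# Chance that at least one of many independent rare events happens.
p_single = 0.00017      # 0.017%, the figure quoted earlier in the thread
n_events = 10_000       # hypothetical length of the "list" (assumption)

# Complement of "none of the events happens".
p_any = 1 - (1 - p_single) ** n_events

print(f"P(at least one of {n_events} such events occurs): {p_any:.1%}")
```

With these illustrative numbers, it is more likely than not that at least one of the listed events happens, even though each one individually is very unlikely.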
 
Isn't this the very definition of a coincidence? Two or more things coinciding?
I was referring to the mathematical necessity that two different ways of calculating the same probability will give the same results, to within rounding error. My reason for calculating that probability two different ways was to increase the probability of detecting any mistakes I might have made in the calculation.

For example: the result of multiplying 17 by 2 coincides with the result of adding 17 to 17. In my opinion, it would be misleading to say that's just a coincidence.
 
