In 1925, the British statistician Ronald Aylmer Fisher published the book Statistical Methods for Research Workers. It became the bible for scientific study at the time and included the proposal that if a difference between 2 numbers as large as the one observed would arise by chance less than 1 time in 20 (P < .05), then the difference can be called statistically significant.
More than 90 years later, that P < .05 concept is considered the gold standard for a clinical trial. But is it really a good benchmark?
In clinical trials, P < .05 is often cited as the threshold at which the difference between 2 results is statistically significant. As a treatment moves from phase 1 to phase 2 and phase 3 trials, researchers are always cognizant of whether that P < .05 threshold is being met. If a phase 2 study does not show a P < .05 treatment effect on efficacy, the odds of a phase 3 study beginning are small. And if a phase 3 study does not show a P < .05 effect on efficacy, the company’s stock price is likely to plummet, since the odds of that drug gaining approval by Health Canada, the FDA, the EU, and elsewhere are extremely small.
A P < .05 value is actually a fairly low bar to clear. A more desirable result in a clinical trial would be P < .01 or P < .005, for several reasons. First, P < .05 still means that, if the treatment had no real effect, a difference this large would turn up by luck alone about 1 time in 20. That is fairly high. A recent editorial in JAMA proposed that a P < .05 result should be labeled ‘statistically suggestive’ rather than ‘statistically significant’. Second, given that most new treatments are going to be more expensive than current treatment options, regulatory bodies want strong evidence that a new treatment really is better. A result that chance alone would produce less than 1 time in 200 (P < .005) is far more persuasive than one that chance alone would produce up to 1 time in 20 (P < .05).
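To make the "1 in 20" figure concrete, here is a minimal simulation sketch in Python. It is an illustration, not a description of how any real trial is analyzed. It assumes a normally distributed outcome with a known standard deviation, 30 patients per arm, and a simple two-sided z-test. All of those choices, along with the function names, are assumptions made for this example. Both arms are drawn from the same distribution, so any "significant" result is a false positive.

```python
import math
import random
from statistics import NormalDist, fmean


def two_sided_p_value(group_a, group_b, sd=1.0):
    """Two-sided z-test P value for a difference in means.

    Assumes both groups share a known standard deviation `sd`
    (a simplification chosen for this illustration).
    """
    n_a, n_b = len(group_a), len(group_b)
    standard_error = sd * math.sqrt(1.0 / n_a + 1.0 / n_b)
    z = (fmean(group_a) - fmean(group_b)) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def false_positive_rates(num_trials=10_000, patients_per_arm=30,
                         thresholds=(0.05, 0.005), seed=2024):
    """Simulate trials of a treatment with no effect and count how often
    the P value falls below each threshold."""
    rng = random.Random(seed)
    hits = {t: 0 for t in thresholds}
    for _ in range(num_trials):
        # Both arms come from the same distribution: the treatment does nothing.
        control = [rng.gauss(0.0, 1.0) for _ in range(patients_per_arm)]
        treated = [rng.gauss(0.0, 1.0) for _ in range(patients_per_arm)]
        p = two_sided_p_value(control, treated)
        for t in thresholds:
            if p < t:
                hits[t] += 1
    return {t: hits[t] / num_trials for t in thresholds}


if __name__ == "__main__":
    for threshold, rate in false_positive_rates().items():
        print(f"P < {threshold}: {rate:.2%} of no-effect trials looked 'significant'")
```

Under these assumptions, the share of no-effect trials that cross each threshold should land close to the threshold itself: roughly 1 in 20 for P < .05 and roughly 1 in 200 for P < .005.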
Finally, it should be pointed out that the P value is used not only to assess the efficacy of the drug but also to compare the groups of patients in the study. In smaller studies, such as those used for rare diseases, it is not uncommon to find some of the baseline characteristics differing significantly between the treatment groups (at least at P < .05). Again, given the difficult decisions that regulatory bodies must make about whether a treatment should be approved (largely at the expense of the government and the health insurance industry), giving one regulator grounds to question the results of a study because of bad luck in dividing the patient groups can only make things more difficult later on with other regulatory bodies.
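A short calculation shows why chance imbalances at baseline are not rare. The sketch below assumes the baseline characteristics are compared independently of one another, which is a simplification, since real characteristics such as age and weight are often correlated. The function name and the numbers of characteristics are hypothetical choices for illustration.

```python
def chance_of_any_imbalance(num_characteristics: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_characteristics` independent
    baseline comparisons falls below `alpha` purely by chance, even though
    randomization was done properly."""
    return 1.0 - (1.0 - alpha) ** num_characteristics


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        at_05 = chance_of_any_imbalance(k, alpha=0.05)
        at_005 = chance_of_any_imbalance(k, alpha=0.005)
        print(f"{k:>2} characteristics: "
              f"{at_05:.1%} chance of a P < .05 imbalance, "
              f"{at_005:.1%} chance of a P < .005 imbalance")
```

With 10 independent characteristics, the chance of at least one P < .05 imbalance is about 40%, compared with about 5% at P < .005. That gap is one reason a stray P < .05 difference at baseline is common, while a P < .005 difference deserves a closer look.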
So, if you see a phase 3 confirmatory clinical trial in which the differences in baseline characteristics between the treatment groups are ‘statistically significant’, or in which the P value for the primary outcome measure is closer to .05 than to .005, you should be concerned. That treatment, even if approved by Health Canada, may have problems getting past CADTH and the provincial ministries.