Much of the evidence base for pharmaceutical interventions used in fracture prevention is relatively flimsy, with the results of several key studies barely scraping into statistical significance, researchers argue.
The case was made in a provocative presentation at the recent Australian & New Zealand Bone & Mineral Society ASM, which called for “fragility scoring” to be adopted across the field.
A relatively simple idea, it involves assessing evidence against a ‘Fragility Index’ (FI), defined as the minimum number of patients whose condition would need to change from non-fracture to fracture to invalidate a statistically significant outcome.
In other words, how many additional fractures would have to occur in the intervention arm of a study population before a result’s P-value went above 0.05?
This number is then divided by sample size to provide a “very easy to interpret assessment of robustness” that can be applied across any RCT, said University of Technology, Sydney, researcher Nick Tran.
“Essentially, the bigger the number, the more robust the study,” he said.
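The calculation described above can be sketched in a few lines of Python. This is an illustration only, not the researchers' code: it assumes a two-sided Fisher's exact test as the significance test (the presentation's choice of test is not stated here), it assumes the intervention arm is the one with fewer fractures, and the trial numbers are hypothetical. The function names `fisher_exact_p` and `fragility_index` are invented for this example.

```python
from math import comb


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def fragility_index(events_trt, n_trt, events_ctl, n_ctl, alpha=0.05):
    """Count how many intervention-arm patients must switch from
    non-fracture to fracture before the p-value reaches alpha.

    Returns 0 if the result is not significant to begin with, and None
    if significance is never lost (assumed not to happen in practice).
    """
    if fisher_exact_p(events_trt, n_trt - events_trt,
                      events_ctl, n_ctl - events_ctl) >= alpha:
        return 0
    events, flipped = events_trt, 0
    while events < n_trt:
        events += 1
        flipped += 1
        p = fisher_exact_p(events, n_trt - events,
                           events_ctl, n_ctl - events_ctl)
        if p >= alpha:
            return flipped
    return None


# Hypothetical trial: 30/1000 fractures on treatment vs 55/1000 on placebo.
fi = fragility_index(30, 1000, 55, 1000)
# Dividing by the total sample size gives the proportion Mr Tran describes.
print(fi, fi / 2000)
```

Passing `alpha=0.01` instead of the default shows how the same trial fares against a stricter significance threshold.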
He and his collaborators used the index to examine 22 major RCTs published in high-impact journals, which together had generated 110 statistically significant results on pharmaceutical interventions for fracture prevention, including bisphosphonates, denosumab and others.
The overall median FI was nine (IQR: 3, 11), indicating that adding as few as nine fracture patients (~0.4% of the study size) to the intervention group would eliminate the previously documented evidence of fracture prevention efficacy.
“Much more alarmingly, we found that for 25% of analyses, there’s an index of three or lower, indicating that by only three events, or 0.2% of the sample size, significance will be lost as well,” he said.
Notably, in 65% of these analyses, the number of participants lost to follow-up exceeded the corresponding FI, casting something of a shadow over the significance of the results, the researcher said.
Specifically, the evidence of anti-fracture efficacy for denosumab and calcium/vitamin D supplementation would be lost if only an additional four (3, 17) and six (2, 16) patients, respectively, sustained a fracture during the follow-up period.
Among the 37 positive results (~34%) that used fracture as the primary outcome measure, the evidence for anti-fracture efficacy remained fragile, with a median FI of only 15 (8, 25). Moreover, in approximately 90% of these positive results, the number of participants lost to follow-up was higher than the number required to render the results statistically non-significant, the researchers found.
Mr Tran stressed he was not arguing for the research to be abandoned or declared invalid, particularly given clinical experience showed many of the interventions “do in fact work”.
Nevertheless, much of the evidence “was fragile”, and should be evaluated in that light, he said.
“This can have drastic implications as the robustness of the intervention’s efficacy is critically important and an intervention based on fragile evidence could lead to harm,” he said.
“Specifically within our field, we know that early fractures are associated with an increased risk of morbidity and mortality.”
He suggested the field should demand a stricter threshold of statistical significance, possibly to around P<0.01, to enhance the reliability of RCT evidence, as well as better study design and longer durations of follow-up.