This past week we at the Centre for Trials Research and the Department of Primary Care and Public Health at Cardiff University held a journal club to discuss Daniel Lakens' excellent blog post "One-sided tests: Efficient and Underused".
I will try to reproduce the discussions as faithfully as possible; they were not always centred on the blog post itself. We explored one-sided tests with reference to our areas of interest: a combination of clinical trials, public health and epidemiology.
There was broad agreement that when you genuinely have a directional hypothesis you should plan for and use a one-sided test, so we had few objections to the blog itself.
Much more discussion was had about whether we ever truly achieve this in medicine. In trials, where we are intervening in patient care, we are usually also interested in whether the effect goes in the opposite direction to that expected, broadly agreeing with Bland's assertion:
"In medicine, things do not always work out as expected, and researchers may be surprised by their results. … Two-sided tests should be used unless there is a very good reason for doing otherwise."
We talked about Phase I and II studies, where we already use one-sided tests to check whether there is enough signal to warrant progression to a later stage.
There was general discussion around whether statisticians should focus more on capturing and presenting uncertainty rather than becoming embroiled in decision making. This would represent a move away from tests of all kinds: ideally we might present the distribution of plausible effects rather than a p-value, and concentrate our efforts on communicating that uncertainty well.
Some agreed with the blog post that one-sided tests are underused in our field, and shared examples where, in practice, a difference in the direction of harm would have led to the same decision as no evidence of benefit. Something that we definitely dispelled was the impression that one-sided tests are in some way unsavoury or not good practice, provided they are pre-specified.
We discussed an important limitation of the one-sided 95% confidence interval. Since one of the limits of a one-sided CI will be plus or minus infinity, it will include values that are far less consistent with the data than values that lie just outside the interval. This was regarded as an unfortunate property of a one-sided interval.
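This property is easy to see numerically. The sketch below, using a normal approximation and made-up numbers (the mean difference and standard error are illustrative, not from the blog post or our discussion), constructs a one-sided 95% interval for a "greater than" alternative: the lower limit is finite, but the upper limit is infinite, so the interval contains values far less consistent with the data than values lying just below the lower limit.

```python
# Illustrative one-sided 95% CI under a normal approximation.
# diff and se are hypothetical numbers chosen for the example.
from statistics import NormalDist

diff = 1.2  # hypothetical observed difference in means
se = 0.8    # hypothetical standard error of that difference

z = NormalDist().inv_cdf(0.95)  # one-sided 95% critical value, ~1.645

# One-sided 95% CI for the alternative H1: difference > 0
lower = diff - z * se
upper = float("inf")

print(f"one-sided 95% CI: ({lower:.2f}, {upper})")
# The interval runs to +infinity, so a value like diff + 10*se sits inside it
# even though it is far less consistent with the data than a value just below
# the lower limit, which sits outside it.
```

The two-sided analogue would bound the interval on both sides, which is exactly the property the one-sided interval gives up.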
Another point concerned the one-sided null hypothesis provided in the blog post.
H0: Mean 1 – Mean 2 ≤ 0
This is not a point null hypothesis; the more common representation of the null hypothesis in one-sided tests is the same as the null hypothesis Lakens quotes for the two-sided test, namely
H0: Mean 1 – Mean 2 = 0
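Under this point null, the familiar relationship between the two tests can be checked directly. The following sketch (an assumed z-test with a hypothetical test statistic, not an example from the blog post) shows that when the observed effect lies in the predicted direction, the one-sided p-value is exactly half the two-sided p-value:

```python
# One- vs two-sided p-values for a z-test of H0: Mean1 - Mean2 = 0.
# The test statistic z is a made-up number for illustration.
from statistics import NormalDist

z = 1.80  # hypothetical standardised test statistic

phi = NormalDist().cdf
p_one_sided = 1 - phi(z)             # alternative H1: Mean1 - Mean2 > 0
p_two_sided = 2 * (1 - phi(abs(z)))  # alternative H1: Mean1 - Mean2 != 0

print(round(p_one_sided, 4), round(p_two_sided, 4))
```

Here the one-sided test is "significant" at the 5% level while the two-sided test is not, which is precisely the efficiency gain (and the pre-specification obligation) that Lakens' post discusses.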
There was discussion around whether equipoise (a requirement for a clinical trial) naturally leads to two-sided testing. If we have genuine uncertainty about which treatment will be more beneficial, then it might seem reasonable to translate this into a two-sided test. Others disagreed, feeling that equipoise could mean genuine uncertainty about whether the novel treatment is no worse than usual care, and so naturally lead to a one-sided test.
A practical conclusion we reached was that blog posts are much better for journal club meetings than lengthy manuscripts because you can be sure that everyone will have read it!
Speaking personally, I fully expect that I will continue designing studies with planned two-sided tests, but not without considering and enumerating my reasons for that decision each time (in fact, I have spent half an hour today exploring with a chief investigator whether a one-sided test would answer her questions). In the past I was guilty of a degree of intellectual laziness when deciding whether to use a one- or two-sided test. Lakens' post provided a useful reminder to question everything.
Present at this discussion were:
Lajos Katona, David Gillespie, Ulugbek Nurmatov, Aideen Naught, Rebecca Playle, Lisa Hurt, Chris Hurt, Jamie White, Daniel Farewell.