In this study, we address the problem of calculating the power and sample size for the simultaneous assessment of the consistency of treatment effects. The method is based on a general formulation of inconsistency as a treatment-by-subset interaction, where the interaction term is defined as the ratio of the treatment effects. This approach allows inconsistency to be interpreted as a relative change in the treatment effects. In addition, conclusions are drawn based on an appropriately defined consistency margin. The methodology is applicable to trials with continuous and binary endpoints. Two power definitions arising in multiple testing, namely the all-pair (complete) power and the any-pair (minimal) power, are considered. The focus of this study is the assessment of consistency in multi-regional clinical trials, but the proposed methodology is applicable to the assessment of any treatment-by-subset interaction, including the detection of qualitative interactions. Several examples from clinical trials are provided to illustrate the application of the proposed procedure. An R add-on package, poco, was developed for the analysis and provides the functionality presented in this study.
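The ratio-based formulation of inconsistency described above can be illustrated with a minimal sketch: each subset's treatment effect is divided by the overall effect, and a subset is flagged as potentially inconsistent when this ratio falls below a pre-specified consistency margin. This is an illustrative point-estimate computation only, not the poco package API; the region names, effect values, and margin are all hypothetical.

```python
# Illustrative sketch (not the poco API): inconsistency as a
# treatment-by-subset interaction, expressed as the ratio of each
# subset's treatment effect to the overall treatment effect.

def consistency_ratios(regional_effects, overall_effect):
    """Ratio of each regional treatment effect to the overall effect."""
    return {region: eff / overall_effect
            for region, eff in regional_effects.items()}

def consistent(regional_effects, overall_effect, margin):
    """Flag each region as 'consistent' if its effect retains at least
    a fraction `margin` of the overall effect (hypothetical rule)."""
    ratios = consistency_ratios(regional_effects, overall_effect)
    return {region: r >= margin for region, r in ratios.items()}

# Hypothetical mean differences per region and overall:
effects = {"Asia": 0.20, "Europe": 0.55, "US": 0.60}
overall = 0.50
print(consistency_ratios(effects, overall))
print(consistent(effects, overall, margin=0.5))
```

With these hypothetical numbers, the Asia effect retains only 40% of the overall effect and so falls below the 50% margin, while the other regions do not. A formal assessment would of course replace this point-estimate comparison with the hypothesis tests and power calculations developed in the study.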
When the aim is to compare the means of two (normally distributed) samples, it is common practice to perform a two-sample t-test and report the corresponding p-value. Nevertheless, the p-value has been widely criticized for not providing a measure of the magnitude of the mean effect (e.g., Browne (2010)). This report provides an overview of alternatives recently published in the scientific literature that provide a more meaningful measure of the effect size. Browne (2010) introduced closed-form equations to translate a significant t-test p-value and sample size into the probability of one treatment being more successful than another on a per-individual basis. This quantity was later termed the win probability by Hayter (2013), who interpreted it as describing 'what would happen if a single future observation were to be taken from either of the two treatments, with attention being directed towards which treatment would win by providing the better value.' In addition, Hayter (2013) introduced the corresponding confidence interval as well as the odds of X being greater than Y. He further introduced the transformation into Cohen's effect size and the corresponding confidence intervals.
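For independent normal samples, the win probability has a well-known closed form: with X ~ N(mu_x, sd_x^2) and Y ~ N(mu_y, sd_y^2), P(X > Y) = Phi((mu_x - mu_y) / sqrt(sd_x^2 + sd_y^2)), where Phi is the standard normal CDF. The sketch below computes this quantity and the corresponding odds; the function names are illustrative and not taken from Browne (2010) or Hayter (2013).

```python
# Hedged sketch of the win probability P(X > Y) for two independent
# normal samples, and the associated odds of X beating Y.
import math

def norm_cdf(z):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def win_probability(mu_x, sd_x, mu_y, sd_y):
    """P(X > Y) for X ~ N(mu_x, sd_x^2), Y ~ N(mu_y, sd_y^2):
    the chance that one future draw from X beats one from Y."""
    z = (mu_x - mu_y) / math.sqrt(sd_x**2 + sd_y**2)
    return norm_cdf(z)

def win_odds(p):
    """Odds of X being greater than Y, given the win probability p."""
    return p / (1.0 - p)

# With equal unit variances, a mean difference of 1 (Cohen's d = 1)
# gives a win probability of Phi(1 / sqrt(2)), roughly 0.76.
p = win_probability(1.0, 1.0, 0.0, 1.0)
print(round(p, 3), round(win_odds(p), 2))
```

Note that with equal variances the win probability reduces to Phi(d / sqrt(2)), where d is Cohen's effect size, which is the transformation between the two scales mentioned above.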