Volume 72, Issue 1
Original Article

Evidence for non‐random sampling in randomised, controlled trials by Yuhji Saitoh

J. B. Carlisle

Corresponding Author

Consultant

E-mail address: john.carlisle@nhs.net

Department of Anaesthesia, Peri‐operative Medicine and Intensive Care, Torbay Hospital, Torquay, UK

J. A. Loadsman

Senior Staff Specialist, Conjoint Associate Professor

Royal Prince Alfred Hospital, Camperdown, New South Wales, Australia

Sydney Medical School, University of Sydney, Sydney, New South Wales, Australia

First published: 18 December 2016

You can respond to this article at http://www.anaesthesiacorrespondence.com

Summary

A large number of randomised trials authored by Yoshitaka Fujii have been retracted, in part as a consequence of a previous analysis finding a very low probability of random sampling. Dr Yuhji Saitoh co‐authored 34 of those trials and he was corresponding author for eight of them. We found a number of additional randomised, controlled trials that included baseline data, with Saitoh as corresponding author, that Fujii did not co‐author. We used Monte Carlo simulations to analyse the baseline data from 32 relevant trials in total as well as an outcome (muscle twitch recovery ratios) reported in several. We also compared a series of muscle twitch recovery graphs appearing in a number of Saitoh's publications. The baseline data in 14/32 randomised, controlled trials had p < 0.01, of which seven p values were < 0.001. Eight trials reported four ratios of the time for the return of muscle activity after neuromuscular blockade, the distributions of which were homogeneous: the p values for the observed Q statistics were 0.0055, 0.031, 0.016 and 0.0071. Comparison of graphs revealed multiple coincident or near‐coincident curves across a large number of publications, a finding also inconsistent with random sampling. Combining the continuous and categorical probabilities of the 32 included trials, we found a very low likelihood of random sampling: p = 1.27 × 10⁻⁸ (1 in 100,000,000). The high probability of non‐random sampling and the repetition of lines in multiple graphs suggest that further scrutiny of Saitoh's work is warranted.

Introduction

In 2006, an analysis of homogeneity in meta‐analyses identified an extreme degree of between‐study homogeneity in five studies published by Joachim Boldt 1. Suspicions raised by readers of a 2009 publication subsequently led to an institutional investigation and ultimately the retraction of more than 90 of Boldt's published studies for lack of ethics approval and fabrication of data 2.

In 2012, a similar analysis of the baseline variables in a large number of studies published by Yoshitaka Fujii found a very low probability of random sampling 3. This evidence formed an important part of the request that prompted a multi‐institutional investigation of Fujii's publications, ultimately leading to the recommendation that over 180 papers should be retracted, again for lack of ethics approval and fabrication 4. The methods used for the analysis have subsequently been refined 5. One of Fujii's co‐authors on 34 of those retracted papers was Dr Yuhji Saitoh, who was first and corresponding author on eight of these trials.

Following concerns raised over a new submission to the journal Anaesthesia and Intensive Care, we undertook a more focused analysis of data in randomised, controlled trials with Dr Yuhji Saitoh as an author.

Methods

In 2013, randomised, controlled trials published in six anaesthesia journals (2002–2012) were surveyed (unpublished). The distributions of mean (SD) for baseline variables were analysed using a published method 3. For any author with at least two trials for which p < 0.05, additional studies were retrieved. The analyses were then repeated using Monte Carlo simulations, a refinement of the method used for Fujii 5. Monte Carlo simulations were also used for baseline categorical variables.

The method used to analyse baseline continuous variables has been described in detail 3, 5. In summary, Monte Carlo simulations were used instead of an independent t‐test or ANOVA to generate a p value for differences between means. The aspect of interest is the probability that the difference in means would be less than reported (the left‐hand tail of the distribution), which is equal to 1 − p, where ‘p’ is the p value generated by a two‐sided t‐test or ANOVA. However, parametric tests of summary data generate p = 1 when the reported means are the same, as if they were identical to an infinite number of decimal places. Monte Carlo simulations are therefore needed when the precision of the reported means is insufficient to discriminate their differences. Stouffer's method was used to combine p values for continuous variables, for categorical variables and for all baseline variables. The Kolmogorov–Smirnov test was used to compare the p values of variables, and of randomised, controlled trials, against a uniform distribution.

The homogeneity of the standardised mean differences in the twitch recovery times (in a train‐of‐four) in the relevant studies was also analysed using Monte Carlo simulations for the Q statistic, as well as for the tau statistic and effect size probability. This type of analysis was used to identify the unusual homogeneity in the results of Boldt et al. in 2006 1. The code used to program the Monte Carlo simulations is available as an online appendix (Appendix S1).
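The idea described above can be illustrated for one continuous baseline variable in a two‐group trial. The following is a minimal Python sketch only; it is not the authors' code, which is provided in Appendix S1. It assumes that each group's sample mean is normally distributed around a common population mean with standard error SD/√n, and that reported means are rounded to a fixed number of decimal places. The function name `left_tail_p`, its parameters and the example values are hypothetical.

```python
import random


def left_tail_p(mean_a, sd_a, n_a, mean_b, sd_b, n_b,
                decimals=1, sims=20000, seed=1):
    """Estimate P(simulated |difference in rounded means| <= reported one)."""
    rng = random.Random(seed)
    observed = abs(mean_a - mean_b)
    # Assumption: random allocation, so both groups share one population mean.
    pooled_mean = (mean_a * n_a + mean_b * n_b) / (n_a + n_b)
    se_a = sd_a / n_a ** 0.5
    se_b = sd_b / n_b ** 0.5
    hits = 0
    for _ in range(sims):
        # Simulate each group's mean, then round it as a published table would.
        sim_a = round(rng.gauss(pooled_mean, se_a), decimals)
        sim_b = round(rng.gauss(pooled_mean, se_b), decimals)
        # Small tolerance guards against floating-point noise after rounding.
        if abs(sim_a - sim_b) <= observed + 1e-9:
            hits += 1
    return hits / sims


# Illustrative (made-up) values: identical reported means vs. very different means.
same = left_tail_p(62.3, 10.0, 20, 62.3, 10.0, 20)
far = left_tail_p(50.0, 10.0, 20, 70.0, 10.0, 20)
print(same, far)
```

With identical reported means, a parametric test of the summary data would give p = 1 and hence a left‐hand probability of zero, whereas the simulation returns a small non‐zero probability that reflects the rounding of the means.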

By December 2013, 11 trials with baseline data published by Saitoh and co‐authors had been detected and analysed in this way: 6/11 had unlikely distributions of baseline data. It was also noticed that at least two others shared graph lines in common, which seemed unlikely, so a wider comparison of graphs in Saitoh's publications was undertaken. Graphs were copied and pasted, with transparency, on top of other graphs.

An additional six trials with baseline data that Saitoh had co‐authored with Fujii had previously been analysed 3, but the association was not recognised at the time. After the submission to Anaesthesia and Intensive Care raised concerns, the two Y. Saitohs were identified as the same individual. A number of additional published trials and one unpublished trial (the paper submitted to Anaesthesia and Intensive Care) could then be added to the analysis. All analyses were conducted in R 6.
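Stouffer's method, named in the Methods, can be sketched as follows. This is a minimal Python illustration, not the authors' R code: it assumes unweighted combination of one‐sided (left‐tail) p values, which may differ from the weighting actually used. The function name `stouffer` and the `eps` clamp are hypothetical.

```python
from statistics import NormalDist


def stouffer(p_values, eps=1e-12):
    """Combine one-sided (left-tail) p values with unweighted Stouffer's method."""
    nd = NormalDist()
    # Clamp so inv_cdf never receives exactly 0 or 1 (it rejects both).
    z_scores = [nd.inv_cdf(min(max(p, eps), 1 - eps)) for p in p_values]
    # The sum of k independent standard normal z scores has variance k.
    return nd.cdf(sum(z_scores) / len(z_scores) ** 0.5)


# Illustrative (made-up) inputs.
print(stouffer([0.05, 0.05]))
print(stouffer([0.5, 0.5]))
```

Two modestly small p values combine to a smaller one, while p values of 0.5 carry no evidence in either direction and leave the combined value at 0.5.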

Results

In addition to the unpublished trial submitted to Anaesthesia and Intensive Care, we retrieved 40 studies with Yuhji Saitoh as an author and for which Yoshitaka Fujii was not corresponding author (Appendix S2 [1–40]); in all we analysed baseline continuous data in 32 randomised, controlled trials (Appendix S2 [1–32]). Dr Saitoh was corresponding author for 26 of these trials (Appendix S2 [1–17, 19–21, 23–25, 30–32]), six of which have been retracted (Appendix S2 [12–17]) and one rejected before publication (Appendix S2 [32]). A further two randomised, controlled trials with Dr Saitoh as corresponding author that did not present baseline data have been retracted (Appendix S2 [33, 34]).

The baseline variables of 14/32 trials had combined p < 0.01 (including one right‐hand p value), of which seven were < 0.001 (Table 1 and Fig. 1). These p values are for the distribution of baseline means and rates and are less extreme than those calculated for 158 randomised, controlled trials with Yoshitaka Fujii as author (Fig. 2). The probability for distributions of standard deviations and their associated means can also be calculated. For example, both means and standard deviations are proximate in Fig. 3 of reference (Appendix S2 [24]), reproduced (with permission) in Fig. 3. The probability that a similar table would contain mean (SD) combinations as similar as, or more similar than, those reported was 0.0000089, determined in 100 million Monte Carlo simulations.
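Under simple random sampling, the p values for baseline variables would be expected to follow a uniform distribution, which is what the Kolmogorov–Smirnov test named in the Methods checks. The following Python sketch shows one way to carry out such a test. It is not the authors' code; the function name `ks_uniform` is hypothetical, the p value uses an asymptotic series with a small‐sample correction (an approximation), and the input values are made up for illustration.

```python
import math


def ks_uniform(p_values):
    """One-sample Kolmogorov-Smirnov test of p values against Uniform(0, 1)."""
    xs = sorted(p_values)
    n = len(xs)
    # D is the largest gap between the empirical CDF and the uniform CDF F(x) = x.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    # Asymptotic Kolmogorov tail probability with a small-sample correction.
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    tail = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                     for k in range(1, 101))
    return d, min(max(tail, 0.0), 1.0)


# Made-up inputs: evenly spread p values vs. p values bunched near zero.
even = [(i + 0.5) / 20 for i in range(20)]
bunched = [0.01 * (i + 1) for i in range(20)]
print(ks_uniform(even))
print(ks_uniform(bunched))
```

Evenly spread p values give a small D statistic and a large p value, whereas p values bunched towards zero give a large D statistic and a very small p value.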

Table 1. The probabilities that simple random sampling would result in groups as similar as reported for: means (continuous variables); rates (categorical variables); continuous and categorical probabilities combined. Reference numbers are as listed in online Appendix S2.

| Reference (Appendix S2) | Year | Journal | Volume | 1st page | Corresponding author | p value, continuous | p value, categorical | p value, combined |
|---|---|---|---|---|---|---|---|---|
| 1 | 1993 | BJA | 70 | 402 | Saitoh | 0.046 | 0.013 | 0.0037 |
| 2 | 1995 | BJA | 74 | 293 | Saitoh | 0.18 | 0.072 | 0.055 |
| 3 | 1995 | CJA | 42 | 992 | Saitoh | 0.40 | 0.00037 | 0.026 |
| 4 | 1995 | CJA | 42 | 1096 (a) | Saitoh | 0.0096 | 0.0021 | 0.00027 |
| 5 | 1996 | CJA | 43 | 362 | Saitoh | 0.15 | 0.0021 | 0.0086 |
| 6 | 1997 | AA | 84 | 1354 | Saitoh | 0.45 | 0.0055 | 0.080 |
| 7 | 1997 | AAS | 41 | 741 | Saitoh | 0.14 | 0.013 | 0.019 |
| 8 | 1997 | CJA | 44 | 390 | Saitoh | 0.16 | 0.0011 | 0.0089 |
| 9 | 1997 | EJA | 14 | 327 | Saitoh | 0.98 | 0.00082 | 0.59 |
| 10 | 1998 | AAS | 42 | 851 | Saitoh | 0.22 | 0.0045 | 0.051 |
| 11 | 1998 | An | 53 | 244 (a) | Saitoh | 0.0072 | 0.0023 | 0.0015 |
| 12 | 1998 | EJA | 15 | 524 | Saitoh | 0.013 | 0.0000015 | 0.0000097 |
| 13 | 1998 | EJA | 15 | 649 | Saitoh | 0.68 | 0.084 | 0.39 |
| 14 | 1999 | BJA | 82 | 329 (b) | Saitoh | 0.015 | 0.012 | 0.0011 |
| 15 | 1999 | BJA | 83 | 275 (b) | Saitoh | 0.13 | 0.0060 | 0.0093 |
| 16 | 1999 | AA | 89 | 1565 (b) | Saitoh | 0.11 | 0.0022 | 0.0068 |
| 17 | 2001 | CJA | 48 | 28 (b) | Saitoh | 0.028 | 0.0021 | 0.00090 |
| 18 | 2001 | AA | 93 | 1214 | Oshima | 0.57 | 0.049 | 0.34 |
| 19 | 2001 | BJA | 86 | 814 | Saitoh | 0.70 | 0.0022 | 0.046 |
| 20 | 2002 | JA | 16 | 102 | Saitoh | 0.61 | 0.20 | 0.47 |
| 21 | 2002 | An | 57 | 218 | Saitoh | 0.78 | 0.0045 | 0.34 |
| 22 | 2003 | An | 58 | 643 | Nakajima | 0.68 | | |
| 23 | 2003 | BJA | 90 | 480 | Saitoh | 0.019 | 0.073 | 0.00028 |
| 24 | 2003 | CJA | 50 | 342 | Saitoh | 0.0027 | 0.021 | 0.00012 |
| 25 | 2005 | CJA | 52 | 467 | Saitoh | 0.0.095 | 0.067 | 0.29 |
| 26 | 2005 | JCA | 17 | 276 | Hattori | 0.94 | 0.39 | 0.89 |
| 27 | 2005 | EJA | 22 | 20 | Hattori | 0.16 | 0.073 | 0.048 |
| 28 | 2007 | FJMS | 53 | 61 | Katayama | 0.81 | | |
| 29 | 2010 | JA | 24 | 168 | Oshima | 0.9999986 | 0.22 | 0.999961 |
| 30 | 2010 | JCA | 22 | 318 | Saitoh | 0.95 | | |
| 31 | 2012 | JA | 26 | 28 | Saitoh | 0.57 | 0.66 | 0.64 |
| 32 | 2015 | AIC | Unpublished | | Saitoh | 0.0018 | 0.060 | 0.0010 |

  • AA, Anesthesia and Analgesia; AAS, Acta Anaesthesiologica Scandinavica; AIC, Anaesthesia and Intensive Care; An, Anaesthesia; BJA, British Journal of Anaesthesia; CJA, Canadian Journal of Anesthesia; EJA, European Journal of Anaesthesiology; FJMS, Fukushima Journal of Medical Sciences; JA, Journal of Anesthesia; JCA, Journal of Clinical Anesthesia.
  • (a) Investigated, not retracted.
  • (b) Investigated, retracted.
Figure 1. The cumulative distribution of p values for the means of 116 continuous variables from 32 randomised, controlled trials with Yuhji Saitoh as first and corresponding author. The distribution of p values was inconsistent with simple random sampling, p = 0.0011.

Figure 2. The cumulative distribution of p values for the combined means of 32 randomised, controlled trials with Yuhji Saitoh as first and corresponding author (black markers). The distribution of p values was inconsistent with simple random sampling, p = 0.0023. The distribution of equivalent p values from 150 randomised, controlled trials with Yoshitaka Fujii as corresponding author (red markers) was less consistent with simple random sampling, p < 2 × 10⁻¹⁶.

Figure 3. The distributions of means were analysed for all papers by Dr Yuhji Saitoh. This particular table was also analysed to determine the probability that random sampling would result in the distribution of standard deviations in association with their means, p = 0.0000089 [Appendix S2, 24]. Reproduced with permission.

Figure 4. In 2006, results in Boldt et al.'s papers were shown to lack the variability expected due to chance 1. This figure illustrates the same technique, applied to ratios of time taken for twitch numbers (1, 2, 3 or 4) to recover in eight randomised, controlled trials with Saitoh as corresponding author. The probabilities for the lack of heterogeneity were 0.0055 (T1), 0.031 (T2), 0.016 (T3) and 0.0071 (T4).

Eight papers (Appendix S2 [19, 21–23, 25, 27, 28, 30]) reported mean (SD) times for train‐of‐four twitches at four time points (T1, T2, T3, T4) in two (or three) groups. The ratio of means for two groups at times T1 to T4 varied little across these trials, ranging from 0.75 to 0.77 for all four time points (Fig. 4). The Monte Carlo p values for the homogeneity (Q statistic) of these results were 0.0055 (T1), 0.031 (T2), 0.016 (T3) and 0.0071 (T4). These and other ratios of muscular function and post‐tetanic count after neuromuscular blockade were presented graphically in 14 papers (Appendix S2 [5, 11, 16, 19, 23–25, 27, 30–32, 36, 37, 40]). The lines of some of these graphs were coincident, or nearly so, and are presented in Fig. 5 (all graphs reproduced with permission).
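A Monte Carlo p value for homogeneity of the kind reported above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: it assumes each study's effect estimate is normally distributed about a common effect with a known variance, and it uses Cochran's Q with inverse‐variance weights. The function names `cochran_q` and `homogeneity_p` and all input values are hypothetical; the tau statistic is not covered.

```python
import random


def cochran_q(effects, variances):
    """Return Cochran's Q and the inverse-variance weighted pooled effect."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, pooled


def homogeneity_p(effects, variances, sims=20000, seed=1):
    """Left-tail Monte Carlo p: P(simulated Q <= observed Q) under one common effect."""
    rng = random.Random(seed)
    q_obs, pooled = cochran_q(effects, variances)
    hits = 0
    for _ in range(sims):
        # Draw each study's effect around the pooled effect with its own variance.
        sim = [rng.gauss(pooled, v ** 0.5) for v in variances]
        if cochran_q(sim, variances)[0] <= q_obs:
            hits += 1
    return hits / sims


# Made-up effects for eight studies, each with variance 0.04.
variances = [0.04] * 8
too_similar = homogeneity_p([0.50, 0.51, 0.49, 0.50, 0.51, 0.50, 0.49, 0.50], variances)
scattered = homogeneity_p([0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0], variances)
print(too_similar, scattered)
```

A small left‐tail p value indicates that the studies agree with each other more closely than sampling variation alone would be expected to allow.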

Figure 5. Reproduced graphs of mean (SD) values plotted as lines in multiple graphs. The numbered references are listed in Appendix S2. The combined graphs (right column) are size‐adjusted overlays of the two graphs to the left, reproduced from different publications. In each case at least one coincident or near‐coincident curve can be identified, not consistent with random sampling. All graphs reproduced with permission.

Discussion

We have found improbable distributions of baseline data (1 in 100,000,000 combined) and improbable homogeneity of results across a substantial number of studies published by Dr Yuhji Saitoh, mirroring similar findings in the previous analysis of the work of Yoshitaka Fujii.

Saitoh has co‐authored 36 papers with Fujii, 11 with Saitoh as corresponding author, of which eight have already been retracted. The investigation into Fujii concluded that three trials authored by Saitoh were probably conducted and reported honestly (Appendix S2 [4, 10, 11]). Analyses of baseline data indicate that it is unlikely that two of these (Appendix S2 [4, 11]) reported the results of simple random allocation of participants into groups.

The possibility of a more widespread problem within a research network suggests that such institutional investigations should not be restricted to single authors. In the case of Boldt, for example, his co‐authors published a paper without him 7 and this paper was also subsequently retracted.

The findings of this analysis support further institutional investigations into research published by Dr Yuhji Saitoh. Until these results can be explained, we think it is important, as was also recommended in the case of Fujii 3, that Dr Saitoh's data are excluded from meta‐analyses or other reviews of the relevant subjects.

Acknowledgements

The authors would like to acknowledge the help of Dr Neville Gibbs and Dr Steve Yentis in the preparation of this paper.

Competing interests

No external funding or competing interests declared. JC is an editor of Anaesthesia and this manuscript has undergone additional external review as a result.
