Evidence for non‐random sampling in randomised, controlled trials by Yuhji Saitoh
Summary
A large number of randomised trials authored by Yoshitaka Fujii have been retracted, in part as a consequence of a previous analysis finding a very low probability of random sampling. Dr Yuhji Saitoh co‐authored 34 of those trials and he was corresponding author for eight of them. We found a number of additional randomised, controlled trials that included baseline data, with Saitoh as corresponding author, that Fujii did not co‐author. We used Monte Carlo simulations to analyse the baseline data from 32 relevant trials in total as well as an outcome (muscle twitch recovery ratios) reported in several. We also compared a series of muscle twitch recovery graphs appearing in a number of Saitoh's publications. The baseline data in 14/32 randomised, controlled trials had p < 0.01, of which seven p values were < 0.001. Eight trials reported four ratios of the time for the return of muscle activity after neuromuscular blockade, the distributions of which were homogeneous: the p values for the observed Q statistics were 0.0055, 0.031, 0.016 and 0.0071. Comparison of graphs revealed multiple coincident or near‐coincident curves across a large number of publications, a finding also inconsistent with random sampling. Combining the continuous and categorical probabilities of the 32 included trials, we found a very low likelihood of random sampling: p = 1.27 × 10⁻⁸ (1 in 100,000,000). The high probability of non‐random sampling and the repetition of lines in multiple graphs suggest that further scrutiny of Saitoh's work is warranted.
Introduction
In 2006, an analysis of homogeneity in meta‐analyses identified a very extreme degree of between‐study homogeneity in five studies published by Joachim Boldt 1. Suspicions raised by readers of a 2009 publication subsequently led to an institutional investigation and ultimately the retraction of more than 90 of Boldt's published studies for lack of ethics approval and fabrication of data 2.
In 2012, a similar analysis of the baseline variables in a large number of studies published by Yoshitaka Fujii found a very low probability of random sampling 3. This evidence formed an important part of the request that prompted a multi‐institutional investigation of Fujii's publications, ultimately leading to the recommendation that over 180 papers should be retracted, again for lack of ethics approval and fabrication 4. The methods used for the analysis have subsequently been refined 5. One of Fujii's co‐authors on 34 of those retracted papers was Dr Yuhji Saitoh, who was first and corresponding author on eight of these trials.
Following concerns raised over a new submission to the journal Anaesthesia and Intensive Care, we undertook a more focused analysis of data in randomised, controlled trials with Dr Yuhji Saitoh as an author.
Methods
In 2013, randomised, controlled trials published in six anaesthesia journals (2002–2012) were surveyed (unpublished). The distributions of mean (SD) for baseline variables were analysed using a published method 3. For authors of at least two trials with p < 0.05, additional studies were retrieved. The analyses were repeated using Monte Carlo simulations, a more reliable method than that used for Fujii (as described in the June 2015 issue of Anaesthesia).
The method used to analyse baseline continuous variables has been described in detail 3, 5. In summary, Monte Carlo simulations were used instead of an independent t‐test or ANOVA to generate a p value for differences between means. The aspect of interest is the probability that the difference in means would be less than reported (the left‐hand tail of the distribution), which is equal to (1 − p)/2 where ‘p’ is the p value generated by a two‐sided t‐test or ANOVA. However, parametric tests of summary data generate p = 1 when the reported means are the same, as if they were identical to an infinite number of decimal places. Monte Carlo simulations are therefore needed when the precision of the reported means is insufficient to discriminate their differences.
Monte Carlo simulations were also used for baseline categorical variables. Stouffer's method was used to combine p values for continuous variables, categorical variables and all baseline variables. The Kolmogorov–Smirnov test was used to compare the p values of variables and of randomised, controlled trials against a uniform distribution. The homogeneity of the standardised mean differences in the twitch recovery times (in a train‐of‐four) in the relevant studies was also analysed using Monte Carlo simulations for the Q statistic, as well as for the tau statistic and effect size probability. This type of analysis was used to identify the unusual homogeneity in the results of Boldt et al. in 2006 1.
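The rounding problem described in the Methods can be illustrated with a minimal Python sketch. It is not the code from Appendix S1 (the analyses were run in R) and it makes simplifying assumptions that are not in the paper: the pooled mean and pooled SD stand in for the common population, group means are drawn directly from a normal sampling distribution, and the spread between the largest and smallest rounded group mean is used as the statistic in place of a t‐test or ANOVA. The function name, its parameters and the example trials are hypothetical.

```python
import random


def left_tail_p(means, sds, ns, decimals=1, n_sims=20000, seed=1):
    """Estimate the chance that group means sampled from one population
    would be at least as close together as the reported (rounded) means."""
    rng = random.Random(seed)
    total_n = sum(ns)
    # Assumption: the pooled mean and pooled SD stand in for the population.
    pop_mean = sum(m * n for m, n in zip(means, ns)) / total_n
    pooled_var = sum((n - 1) * s * s for s, n in zip(sds, ns)) / (total_n - len(ns))
    pop_sd = pooled_var ** 0.5
    # Assumption: the spread (max - min) of the group means is the statistic.
    observed = round(max(means) - min(means), decimals)

    as_close = 0
    for _ in range(n_sims):
        # A mean of n normal values has standard error sd / sqrt(n); round it
        # to the precision at which the trial reported its means.
        sim = [round(rng.gauss(pop_mean, pop_sd / n ** 0.5), decimals) for n in ns]
        if round(max(sim) - min(sim), decimals) <= observed:
            as_close += 1
    return as_close / n_sims


# Hypothetical two-group trial: identical reported means to one decimal place.
p_same = left_tail_p([54.1, 54.1], [10.2, 10.4], [20, 20])
# Hypothetical trial in which the reported means differ by 3.9.
p_diff = left_tail_p([54.1, 58.0], [10.2, 10.4], [20, 20])
print(p_same, p_diff)
```

Because the simulated means are rounded in the same way as the reported ones, two identical reported means yield a small but non‐zero left‐hand probability, whereas a parametric test on the summary data would simply return p = 1.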
The code used to program the Monte Carlo simulations is available as an online appendix (Appendix S1).
By December 2013, 11 trials with baseline data published by Saitoh and co‐authors had been detected in this way and analysed: 6/11 had unlikely distributions of baseline data. It was additionally noticed that at least two others shared graph lines in common, which seemed unlikely, so a wider comparison of graphs in Saitoh's publications was also undertaken. Graphs were copied and pasted transparently on top of other graphs.
An additional six trials with baseline data that Saitoh had co‐authored with Fujii had previously been analysed 3, but the association was not recognised at the time. After the submission to Anaesthesia and Intensive Care raised concerns, the Y Saitohs were identified as the same individual. A number of additional published trials and one unpublished trial (the paper submitted to Anaesthesia and Intensive Care) could then be added to the analysis. All analyses were conducted in R 6.
Results
In addition to the unpublished trial submitted to Anaesthesia and Intensive Care, we retrieved 40 studies with Yuhji Saitoh as an author and for which Yoshitaka Fujii was not corresponding author (Appendix S2 [1–40]); in all we analysed baseline continuous data in 32 randomised, controlled trials (Appendix S2 [1–32]). Dr Saitoh was corresponding author for 26 of these trials (Appendix S2 [1–17, 19–21, 23–25, 30–32]), six of which have been retracted (Appendix S2 [12–17]) and one rejected before publication (Appendix S2 [32]). A further two randomised, controlled trials with Dr Saitoh as corresponding author that did not present baseline data have been retracted (Appendix S2 [33, 34]).
The baseline variables of 14/32 trials had combined p < 0.01 (one right‐hand p value), of which seven were < 0.001 (Table 1 and Fig. 1). These p values are for the distribution of baseline means and rates and are less extreme than those calculated for 158 randomised, controlled trials with Yoshitaka Fujii as author (Fig. 2). The probability for distributions of standard deviations and their associated means can also be calculated. For example, both means and standard deviations are proximate in Fig. 3 of reference (Appendix S2 [24]), reproduced (with permission) in Fig. 3. The probability that a similar table would contain mean (SD) combinations as or more similar than reported was 0.0000089, determined in 100 million Monte Carlo simulations.
| Reference (Appendix S2) | Year | Journal | Volume | 1st page | Corresponding author | Baseline p value: continuous | Baseline p value: categorical | Baseline p value: combined |
|---|---|---|---|---|---|---|---|---|
| 1 | 1993 | BJA | 70 | 402 | Saitoh | 0.046 | 0.013 | 0.0037 |
| 2 | 1995 | BJA | 74 | 293 | Saitoh | 0.18 | 0.072 | 0.055 |
| 3 | 1995 | CJA | 42 | 992 | Saitoh | 0.40 | 0.00037 | 0.026 |
| 4 | 1995 | CJA | 42 | 1096ᵃ | Saitoh | 0.0096 | 0.0021 | 0.00027 |
| 5 | 1996 | CJA | 43 | 362 | Saitoh | 0.15 | 0.0021 | 0.0086 |
| 6 | 1997 | AA | 84 | 1354 | Saitoh | 0.45 | 0.0055 | 0.080 |
| 7 | 1997 | AAS | 41 | 741 | Saitoh | 0.14 | 0.013 | 0.019 |
| 8 | 1997 | CJA | 44 | 390 | Saitoh | 0.16 | 0.0011 | 0.0089 |
| 9 | 1997 | EJA | 14 | 327 | Saitoh | 0.98 | 0.00082 | 0.59 |
| 10 | 1998 | AAS | 42 | 851 | Saitoh | 0.22 | 0.0045 | 0.051 |
| 11 | 1998 | An | 53 | 244ᵃ | Saitoh | 0.0072 | 0.0023 | 0.0015 |
| 12 | 1998 | EJA | 15 | 524 | Saitoh | 0.013 | 0.0000015 | 0.0000097 |
| 13 | 1998 | EJA | 15 | 649 | Saitoh | 0.68 | 0.084 | 0.39 |
| 14 | 1999 | BJA | 82 | 329ᵇ | Saitoh | 0.015 | 0.012 | 0.0011 |
| 15 | 1999 | BJA | 83 | 275ᵇ | Saitoh | 0.13 | 0.0060 | 0.0093 |
| 16 | 1999 | AA | 89 | 1565ᵇ | Saitoh | 0.11 | 0.0022 | 0.0068 |
| 17 | 2001 | CJA | 48 | 28ᵇ | Saitoh | 0.028 | 0.0021 | 0.00090 |
| 18 | 2001 | AA | 93 | 1214 | Oshima | 0.57 | 0.049 | 0.34 |
| 19 | 2001 | BJA | 86 | 814 | Saitoh | 0.70 | 0.0022 | 0.046 |
| 20 | 2002 | JA | 16 | 102 | Saitoh | 0.61 | 0.20 | 0.47 |
| 21 | 2002 | An | 57 | 218 | Saitoh | 0.78 | 0.0045 | 0.34 |
| 22 | 2003 | An | 58 | 643 | Nakajima | 0.68 | – | – |
| 23 | 2003 | BJA | 90 | 480 | Saitoh | 0.019 | 0.073 | 0.00028 |
| 24 | 2003 | CJA | 50 | 342 | Saitoh | 0.0027 | 0.021 | 0.00012 |
| 25 | 2005 | CJA | 52 | 467 | Saitoh | 0.0.095 | 0.067 | 0.29 |
| 26 | 2005 | JCA | 17 | 276 | Hattori | 0.94 | 0.39 | 0.89 |
| 27 | 2005 | EJA | 22 | 20 | Hattori | 0.16 | 0.073 | 0.048 |
| 28 | 2007 | FJMS | 53 | 61 | Katayama | 0.81 | – | – |
| 29 | 2010 | JA | 24 | 168 | Oshima | 0.9999986 | 0.22 | 0.999961 |
| 30 | 2010 | JCA | 22 | 318 | Saitoh | 0.95 | – | – |
| 31 | 2012 | JA | 26 | 28 | Saitoh | 0.57 | 0.66 | 0.64 |
| 32 | 2015 | AIC | Unpublished | – | Saitoh | 0.0018 | 0.060 | 0.0010 |
- AA, Anesthesia and Analgesia; AAS, Acta Anaesthesiologica Scandinavica; AIC, Anaesthesia and Intensive Care; An, Anaesthesia; BJA, British Journal of Anaesthesia; CJA, Canadian Journal of Anesthesia; EJA, European Journal of Anaesthesiology; FJMS, Fukushima Journal of Medical Sciences; JA, Journal of Anesthesia; JCA, Journal of Clinical Anesthesia.
- a Investigated, not retracted.
- b Investigated, retracted.
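For readers unfamiliar with the two summary steps named in the Methods, the following Python sketch shows an unweighted Stouffer combination of one‐sided p values and a one‐sample Kolmogorov–Smirnov comparison against a uniform distribution. It is illustrative only and is not the authors' R code: the function names are hypothetical, the input p values are invented, and the Kolmogorov–Smirnov p value uses a standard asymptotic approximation rather than an exact calculation. The sketch is not expected to reproduce the combined values in the table, which were calculated from the individual baseline variables of each trial.

```python
import math
from statistics import NormalDist

_NORM = NormalDist()


def stouffer(p_values):
    """Combine one-sided p values with the unweighted Stouffer Z method."""
    # Clip so that inv_cdf never receives exactly 0 or 1.
    clipped = [min(max(p, 1e-15), 1 - 1e-15) for p in p_values]
    z = sum(_NORM.inv_cdf(p) for p in clipped) / math.sqrt(len(clipped))
    return _NORM.cdf(z)


def ks_uniform(p_values):
    """One-sample Kolmogorov-Smirnov test against Uniform(0, 1).

    Returns the D statistic and an approximate (asymptotic) p value.
    """
    xs = sorted(p_values)
    n = len(xs)
    # Largest gap between the empirical CDF and the uniform CDF F(x) = x.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    # Asymptotic Kolmogorov series with a common small-sample correction.
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.3:
        return d, 1.0  # the series converges slowly here; p is effectively 1
    series = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(max(2 * series, 0.0), 1.0)


# Hypothetical p values: evenly spread versus clustered near zero.
spread = [0.05 + 0.1 * i for i in range(10)]
clustered = [0.001] * 10
print(stouffer([0.05, 0.05]), ks_uniform(spread), ks_uniform(clustered))
```

Under random allocation the p values should be spread evenly between 0 and 1, so a cluster of small values produces both a small combined Stouffer p value and a large Kolmogorov–Smirnov D statistic.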




Eight papers (Appendix S2 [19, 21–23, 25, 27, 28, 30]) reported mean (SD) times for train‐of‐four twitches at four time points (T1, T2, T3, T4) in two (or three) groups. Across these trials, the ratio of means for two groups varied little, ranging from 0.75 to 0.77 for all four time points (Fig. 4). The Monte Carlo p values for the homogeneity (Q statistic) of these results were 0.0055 (T1), 0.031 (T2), 0.016 (T3) and 0.0071 (T4). These and other ratios of muscular function and post‐tetanic count after neuromuscular blockade were presented graphically in 14 papers (Appendix S2 [5, 11, 16, 19, 23–25, 27, 30–32, 36, 37, 40]). The lines of some of these graphs were coincident, or nearly so, and are presented in Fig. 5 (all graphs reproduced with permission).
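The homogeneity analysis can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' R code: it uses the usual inverse‐variance form of Cochran's Q, a common large‐sample approximation for the variance of a standardised mean difference, and normal simulation of each trial's effect around the pooled estimate. All function names and input values are hypothetical. The quantity of interest is the left‐hand tail, that is, how often independent trials would agree as closely as, or more closely than, the reported ones.

```python
import random


def smd(m1, sd1, n1, m2, sd2, n2):
    """Standardised mean difference and its approximate variance."""
    pooled = (((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)) ** 0.5
    d = (m1 - m2) / pooled
    return d, (n1 + n2) / (n1 * n2) + d * d / (2 * (n1 + n2))


def cochran_q(effects, variances):
    """Inverse-variance weighted Q statistic and the pooled effect."""
    weights = [1 / v for v in variances]
    mean = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - mean) ** 2 for w, y in zip(weights, effects))
    return q, mean


def homogeneity_p(effects, variances, n_sims=20000, seed=1):
    """Monte Carlo probability of a Q statistic as small as that observed."""
    rng = random.Random(seed)
    q_obs, mean = cochran_q(effects, variances)
    sds = [v ** 0.5 for v in variances]
    hits = 0
    for _ in range(n_sims):
        # Each simulated trial estimates the same true effect with its own error.
        sim = [rng.gauss(mean, s) for s in sds]
        if cochran_q(sim, variances)[0] <= q_obs:
            hits += 1
    return hits / n_sims


# Hypothetical effect sizes from eight trials, each with variance 0.1.
tight = [0.50, 0.51, 0.49, 0.50, 0.52, 0.50, 0.49, 0.51]
loose = [0.1, 0.9, 0.3, 0.7, 0.5, 1.1, -0.1, 0.4]
print(homogeneity_p(tight, [0.1] * 8), homogeneity_p(loose, [0.1] * 8))
```

A very small value for the tightly clustered set indicates that trials of this size would rarely agree so closely by chance, which is the sense in which homogeneity is described as improbable.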

Discussion
We have found improbable distributions of baseline data (1 in 100,000,000 combined) and improbable homogeneity of results across a substantial number of studies published by Dr Yuhji Saitoh, mirroring similar findings in the previous analysis of the work of Yoshitaka Fujii.
Saitoh has co‐authored 36 papers with Fujii; he was corresponding author of 11 of these, eight of which have already been retracted. The investigation into Fujii concluded that three trials authored by Saitoh were probably conducted and reported honestly (Appendix S2 [4, 10, 11]). Analyses of baseline data indicate that it is unlikely that two of these (Appendix S2 [4, 11]) reported the results of simple random allocation of participants into groups.
The possibility of a more widespread problem within a research network suggests that such institutional investigations should not be restricted to single authors. In the case of Boldt, for example, his co‐authors published a paper without him 7 and this paper was also subsequently retracted.
The findings of this analysis support further institutional investigations into research published by Dr Yuhji Saitoh. Until such a time that these results can be explained, as was also recommended in the case of Fujii 3, we think it is important that Dr Saitoh's data are excluded from meta‐analyses or other reviews of the relevant subjects.
Acknowledgements
The authors would like to acknowledge the help of Dr Neville Gibbs and Dr Steve Yentis in the preparation of this paper.
Competing interests
No external funding or competing interests declared. JC is an editor of Anaesthesia and this manuscript has undergone additional external review as a result.
References




