Volume 61, Issue 3 p. 222-228

Incidence and significance of errors in a patient ‘track and trigger’ system during an epidemic of Legionnaires' disease: retrospective casenote analysis

A. F. Smith


Consultant, Department of Anaesthesia

R. J. Oakey


Research & Development Fellow, University Hospitals of Morecambe Bay NHS Trust, Royal Lancaster Infirmary, Lancaster LA1 4RR, UK

First published: 14 February 2006


Early warning scoring is designed to be an objective tool to aid identification of hospital patients at risk of deterioration. ‘Track and trigger’ systems using such scores are widely used but many aspects of scoring have not been clarified. We aimed to document how observations and scores are used in practice as part of a typical track and trigger system. We extracted patient observations and early warning scores from the casenotes of 189 patients admitted to Furness General Hospital during a large outbreak of Legionnaires' disease in 2002. We used these 3739 sets of primary observations to recalculate scores, and compared them with those recorded in the casenotes. Recording of patient observations was variable. Early warning scores were derived from 2607 sets of observations (69.7%), of which 571 (21.9%) had been incorrectly calculated. Incorrect scoring meant that 66 of 270 patients (24.4%) whose observations should have reached the trigger value did not. Patients with more abnormal observations were more likely to be misscored. Scoring errors were more likely to lead to underscoring as the degree of physiological abnormality increased. Patients with confirmed Legionnaires' disease were more likely to be incorrectly scored. We conclude that the assignment of early warning scores is prone to error and this may delay referral of at-risk patients for critical care management.

Intensive Care Units (ICUs) have been developed since the 1960s as specialised hospital wards where patients in need of intensive monitoring or treatment can be cared for by specialist staff. Previous surveys have shown that some patients received suboptimal care before admission to ICU [1] and that many are only referred to ICU once cardiopulmonary arrest has occurred, even though there are often signs of slower deterioration beforehand [2]. Clearly, it is preferable for patients in need of intensive care to be recognised as such and treated as promptly as possible.

One way of achieving this attention is to make staff with the necessary expertise available to general hospital wards. The Medical Emergency Team established by Lee and colleagues in Australia in the early 1990s is perhaps the first documented example of this approach; patients at risk are identified by abnormalities in vital signs [3]. Later modifications assigned scores to ward observations, the scores being greater as the observations deviate further from normality [4, 5]. Intensive care expertise is sought when a predetermined ‘trigger’ value is reached. Initially termed ‘early warning scoring’ systems (EWSS), these are now becoming known as ‘track and trigger’ systems (TTS).

Recently, such scoring systems have become widely used in the United Kingdom as part of what is now termed ‘critical care outreach’ [6, 7]. However, the evaluation of such systems has not kept pace with their spread [8]. Although high scores are associated with higher mortality [9, 10] it has been harder to demonstrate an improvement in outcome with early warning scoring [11]. Further, although the effectiveness of any healthcare technology depends critically on how it is applied, this fundamental aspect has not been investigated for such scoring systems. We therefore aimed to describe how a typical scoring system is used in practice and have drawn our data from ward observations and scores made during the outbreak of Legionnaires' disease in Barrow-in-Furness in the summer of 2002 [12–14]. In particular, we aimed to assess the incidence of missing and inaccurate observations and scores and to explore the influences on nurses' scoring behaviour.

Methods



Approval for this research was obtained from the Morecambe Bay Local Research Ethics Committee (LREC). We wrote to patients admitted to the hospital with suspected Legionnaires' disease during the outbreak, to ask if we might include their data. The list of patients was derived from a database compiled during the outbreak to help patient tracking and public health management. The names and addresses were checked by local NHS tracing administrators (Lancashire and South Cumbria Agency), and 498 letters were sent out early in 2003. The next of kin were contacted in the case of the deceased, using contact details provided by the Agency. The LREC had specified that steps should be taken to gain patients' consent, but we were allowed to proceed on the basis that patients had to positively ‘opt out’ of the study. Forty-four replies were received from patients who wished to withhold consent (Fig. 1) and these patients' data were not used.

Figure 1. Flow diagram for patients and sets of observations.

We identified two groups: those in whom Legionnaires' disease had been confirmed microbiologically (Legionnaires'-positive, LP) and those in whom it had not (Legionnaires'-negative, LN) (Fig. 1). Once we had established the number of LP patients available for study, we selected a comparison group of LN patients. To achieve similar numbers in both groups we first excluded patients whose age lay outside the age range of the LP patients. The remaining patients were placed in a random order using a computer-generated random number table. We worked through these in sequence, matching for the sex distribution of the LP group, until we had a number equal to the number of patients in the LP group. To be eligible for analysis, patients in both groups had to have been inpatients in Furness General Hospital between 10:00 h on 3 August and 23:59 h on 22 August 2002. The earlier point represented the time when early warning scoring had been implemented on all wards in the hospital (having been extended from a limited trial earlier that year). We chose the later point because by this date the excess admissions with Legionnaires'-like symptoms had subsided. Some patients from both groups were admitted directly to the Intensive Care Unit and so had no early warning scores performed.
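The selection of the comparison group can be sketched in a few lines of Python. This is only an illustration of the three steps described above (age-range exclusion, random ordering, sequential selection matched on sex). The `Patient` record, the function name and the per-sex quota used to implement "matching for the sex distribution" are assumptions made for the example, not details taken from the study.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical minimal patient record used only for this sketch."""
    patient_id: int
    age: float
    sex: str  # "M" or "F"


def select_comparison_group(lp, ln, seed=0):
    """Pick LN patients mirroring the size and sex distribution of the LP group."""
    ages = [p.age for p in lp]
    youngest, oldest = min(ages), max(ages)

    # Step 1: exclude LN patients whose age lies outside the LP age range.
    eligible = [p for p in ln if youngest <= p.age <= oldest]

    # Step 2: place the remaining patients in random order
    # (seeded here so that the example is repeatable).
    rng = random.Random(seed)
    rng.shuffle(eligible)

    # Step 3: work through the list in sequence, accepting a patient only
    # while the quota for that sex (assumed: the LP group's counts) is unfilled.
    quota = Counter(p.sex for p in lp)
    chosen = []
    for p in eligible:
        if quota[p.sex] > 0:
            chosen.append(p)
            quota[p.sex] -= 1
        if len(chosen) == len(lp):
            break
    return chosen
```

If too few eligible LN patients of one sex exist, the function simply returns a smaller group; how the study handled that case is not stated.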

Data collection and analysis

We obtained patients' casenotes and extracted the primary observations required for the early warning score (EWS) from nursing records and observation charts. We defined a ‘set’ of observations as one or more of the following recorded against a legible time entry: respiratory rate (RR), heart rate (HR), systolic arterial pressure (SBP) and/or temperature (Temp). The data were transposed into a Microsoft Excel® spreadsheet along with recorded scores that had been calculated by the nursing staff at the time the observations had been made, using the scoring instruction in use within the hospital during the outbreak (Table 1). If patient responsiveness or urine output had been recorded, this was noted, but the presence of these observations was not used to define an observation set. We also noted whether the ‘trigger score’ for referral for critical care advice had been reached (set at 3 during the outbreak). Four researchers, trained in consistent data entry, worked in pairs to ensure accuracy of transcription. The principal researcher, aided by Excel® Boolean algebraic logic functions, checked all entries for spurious data.
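The definition of an observation 'set' given above can be expressed as a simple predicate. The sketch below is illustrative only: the `Observation` record and its field names are assumptions, and a missing or illegible time entry is represented by `None`. Responsiveness and urine output are deliberately left out, since they were not used to define a set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """Hypothetical transcription of one chart entry."""
    time: Optional[str]            # None if the time entry is missing or illegible
    rr: Optional[float] = None     # respiratory rate, breaths per minute
    hr: Optional[float] = None     # heart rate, beats per minute
    sbp: Optional[float] = None    # systolic arterial pressure, mmHg
    temp: Optional[float] = None   # temperature, degrees Celsius


def recorded_count(obs: Observation) -> int:
    """Number of the four primary observations that were recorded."""
    return sum(v is not None for v in (obs.rr, obs.hr, obs.sbp, obs.temp))


def is_observation_set(obs: Observation) -> bool:
    """A set needs a legible time and at least one primary observation."""
    return obs.time is not None and recorded_count(obs) >= 1
```

Tallying `recorded_count` over all sets gives the kind of completeness breakdown (four, three, two or one observation present) reported in the results.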

Table 1. Instruction as used during the Legionnaires' outbreak.
3 2 1 0 1 2 3
RR; breaths.min−1   < 8 9–14 15–20 21–29 > 29
HR; beats.min−1  < 41 41–50 51–100 101–130 111–130 > 130
SBP; mmHg < 70 70–79 80–100 101–200 > 201
Temperature; °C  < 35 35–38 > 38.5
Responsiveness * A&C A V P U
Urine rate; ml.h−1# Nil < 30 > 30 200–300 > 300
  • HR, heart rate; RR, respiration rate; SBP, systolic arterial pressure.
  • * A & C = agitated and/or confused; A = awake; V = responds to voice; P = responds to pain; U = unresponsive.
  • # For non-catheterised patients, average over 4 h.

We noted that the EWS instructions used during the Legionnaires' outbreak (Table 1) contained some minor ambiguities. For the purpose of our analysis we required a numerically unambiguous EWS instruction: Table 2 displays the clarified instructions used in this study to recalculate ‘true’ early warning scores. ‘True’ early warning scores were calculated in Excel® using Boolean algebraic functions and the primary observation data transposed from nursing charts. We compared ‘true’ EWS with the recorded EWS to identify possible errors. This yielded figures for the incidence and nature of missing ward observations and the rate of missing and miscalculated scores. We then analysed the relationships between the likelihood of EWS error and degree of physiological abnormality. To allow comparisons between groups, error rates for each individual patient averaged over their length of stay were computed. This adjusts for the positively skewed distribution for length of stay in both groups.

Table 2. Corrected EWS instruction (clarified categories in italics).
3 2 1 0 1 2 3
RR; breaths.min−1 < 9  9–14 15–20 21–29 > 29
HR; beats.min−1 < 41 41–50  51–100 101–110 111–130 > 130
SBP; mmHg < 70 70–79 80–100  101–200 > 200
Temperature; °C < 35.0 35.0–38.5 > 38.5
Responsiveness A&C A V P U
Urine rate; ml.h−1 < 1 1–30  31–200 201–300 > 300
  • See Table 1 for explanation of abbreviations.
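
The recalculation of a 'true' score from the four primary observations can be sketched as below. The study used Excel functions; this Python version is only an illustration. The band boundaries are those of Table 2, but the score attached to each band is an assumption: the column alignment of the table is not recoverable from this text, so the mapping follows the layout commonly used for early warning scores. Summing only the observations that are present is also an assumption.

```python
from typing import Optional


def score_rr(rr: float) -> int:
    """Respiratory rate (breaths per minute); band scores are assumed."""
    if rr < 9:
        return 2
    if rr <= 14:
        return 0
    if rr <= 20:
        return 1
    if rr <= 29:
        return 2
    return 3


def score_hr(hr: float) -> int:
    """Heart rate (beats per minute); band scores are assumed."""
    if hr < 41:
        return 2
    if hr <= 50:
        return 1
    if hr <= 100:
        return 0
    if hr <= 110:
        return 1
    if hr <= 130:
        return 2
    return 3


def score_sbp(sbp: float) -> int:
    """Systolic arterial pressure (mmHg); band scores are assumed."""
    if sbp < 70:
        return 3
    if sbp <= 79:
        return 2
    if sbp <= 100:
        return 1
    if sbp <= 200:
        return 0
    return 2


def score_temp(temp: float) -> int:
    """Temperature (degrees Celsius); band scores are assumed."""
    if temp < 35.0:
        return 2
    if temp <= 38.5:
        return 0
    return 2


def true_ews(rr: Optional[float] = None, hr: Optional[float] = None,
             sbp: Optional[float] = None, temp: Optional[float] = None) -> int:
    """Sum the component scores of whichever observations were recorded."""
    total = 0
    if rr is not None:
        total += score_rr(rr)
    if hr is not None:
        total += score_hr(hr)
    if sbp is not None:
        total += score_sbp(sbp)
    if temp is not None:
        total += score_temp(temp)
    return total
```

Writing each band as an explicit comparison leaves no value unassigned, which is the property the clarified instruction was meant to provide.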


Results

Patient profile

There was a sharp increase in admissions to Furness General Hospital of patients with respiratory or influenza-like symptoms towards the end of July 2002. Those subsequently found to be Legionnaires'-positive (LP) peaked on 3 August 2002, the day after the outbreak was made public. Numbers of Legionnaires'-negative (LN) patients peaked 4 days later (Fig. 2). Patients in the LP group were significantly older than those in the LN group, their median (IQR) age being 64.7 (55.3–75.9) years vs. 61.0 (45.7–71.7) years (p < 0.02).

Figure 2. Admission profile during outbreak. Hatched lines denote cut-off points within which patient observation and score data were extracted.

Recording of ward observations and conversion to scores

In all, 203 patients were eligible for analysis: 95 LP patients and 108 LN patients. Fourteen patients went directly to the Intensive Care Unit without scores being recorded (Fig. 1).

The remaining 189 patients (89 LP and 100 LN) yielded a total of 3739 sets of primary ward observations (2039 LP and 1700 LN). Their median (IQR) length of stay was 3.6 (2.4–5.5) days. Overall, a median (IQR) of 4.9 (3.7–5.7) observation sets patient−1.day−1 were recorded. Respiratory rate (RR) was recorded in 2757 sets (73.7%), heart rate (HR) in 3452 sets (92.3%), systolic arterial pressure (SBP) in 3560 sets (95.2%) and temperature (Temp) in 3507 sets (93.8%) (Table 3). In 65% of sets all four observations were present, in 27% three were present, in 4% two and in 4% one.

Table 3. Frequency of recording of primary ward observations and conversion into early warning scores (EWS), by observation (numbers with percentages).
Denominator 3753 Primary observations EWS scores EWS without primary observations Distribution of scores
0 1 2 ≥3
RR 2757 (73.5%) 2643 (70.4%) 109 288 1608 669 78
HR 3452 (92.0%) 2548 (67.9%) 135 2322 190 32 4
SBP 3560 (94.9%) 2535 (67.5%) 42 2403 103 23 6
Temp 3507 (93.4%) 2546 (67.8%) 145 2432 43 68 3
Urine rate  – 2074 (55.3%) 1891 1976 42 18 38
Response  – 2471 (65.8%) 2458 2449 21 1 0
  • See Table 1 for explanation of abbreviations.

Heart rate, arterial pressure and temperature had been entered graphically in the casenotes, leading to a tendency to round to multiples of five for HR and SBP and to 0.2 or 0.5 °C in the case of temperature. However, the observation chart in use at the time of the outbreak required nurses to enter respiratory rate numerically rather than graphically, so ‘rounding’ was not as apparent in this observation. Recording of urine output was erratic. It was not always recorded at the same time as the other components of the observation set and it was always recorded as a volume rather than a rate. Patients' responsiveness was also frequently not recorded. Consequently, urine output and responsiveness were excluded from detailed analysis.

EWS totals had been derived for 2607 observation sets (69.5%), giving a median (IQR) of 3.6 (1.9–4.7) EWS patient−1.day−1. Comparing recorded EWS totals with ‘true’ EWS, we found that 571 of the 2607 score totals (21.9%) had been incorrectly calculated by nursing staff. (In some cases, incorrect EWS totals included errors in two or more contributing primary observations). Thus, in total, only 54.4% (2036 of 3739) of observation sets contained a correct EWS.
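The three proportions in this paragraph follow from a direct comparison of recorded and recalculated totals. The sketch below shows one way to derive them; the function name and the representation of a missing score as `None` are assumptions for illustration.

```python
def summarise_scoring(pairs):
    """Summarise scoring completeness and accuracy.

    pairs: list of (recorded_total, true_total) tuples, one per observation
    set; recorded_total is None when no score was written down.
    """
    if not pairs:
        raise ValueError("no observation sets supplied")
    scored = [(r, t) for r, t in pairs if r is not None]
    if not scored:
        raise ValueError("no recorded scores to compare")

    incorrect = sum(1 for r, t in scored if r != t)
    correct = len(scored) - incorrect
    return {
        "scored_of_all": len(scored) / len(pairs),       # sets with any score
        "incorrect_of_scored": incorrect / len(scored),  # miscalculated scores
        "correct_of_all": correct / len(pairs),          # sets with a correct score
    }
```

With the counts reported here, the same arithmetic gives 571/2607, about 0.2190, for miscalculated scores and (2607 − 571)/3739 = 2036/3739, about 0.5445, for observation sets carrying a correct score.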

The proportions of individual observations scored varied between the four primary observations studied. Respiratory rate was least commonly recorded in observation sets but was the most often scored when present (Table 3). However, it had the highest error rate in assigning scores, being incorrectly scored in 264 of 2757 observations (9.6%). The HR score was incorrect in 188 of 3452 observations (5.4%), SBP score in 153 of 3560 (4.3%) and temperature score in 136 of 3507 (3.9%). In addition, for 431 observation sets (11.5%), scores were assigned but some of the primary observations presumably used to calculate these scores were not recorded. Urine output and responsiveness scores were often assigned even when primary observations had not been recorded. These scores, however, were much more likely to be zero than for the other four observations.

Direction of error – effect of misscoring

For all patients, errors in scoring were more likely at a ‘true’ score of three or above than at a ‘true’ score of one or two. In general, the more abnormal the primary observation, the more likely it was to be misscored. At a ‘true’ score of 1, overscoring was more likely, but underscoring was more likely at ‘true’ scores of two (65.7% underscored), three (76.7% underscored) and four (84.6% underscored). We also established the direction of error (under- or overscored) in the 270 of the 2607 sets of scores where the primary observations should have been scored as three (the trigger level) or above: of these, 122 were incorrect, 97 were underscored, and 66 did not reach the trigger score when they should have done. Hence, 66 of 270 (24.4%) observation sets should have triggered but did not.
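The direction of error and the missed-trigger count can be derived from the same recorded and 'true' totals. The following sketch is illustrative: the function names are assumptions, every pair is assumed to have a recorded score, and the trigger value of 3 is the one stated in the methods.

```python
TRIGGER = 3  # referral threshold in use during the outbreak


def classify(recorded: int, true: int) -> str:
    """Label a recorded score relative to the recalculated 'true' score."""
    if recorded == true:
        return "correct"
    return "underscored" if recorded < true else "overscored"


def trigger_audit(pairs, trigger: int = TRIGGER) -> dict:
    """Audit the sets whose 'true' score is at or above the trigger.

    pairs: list of (recorded_total, true_total) tuples for scored sets.
    """
    should_trigger = [(r, t) for r, t in pairs if t >= trigger]
    labels = [classify(r, t) for r, t in should_trigger]
    return {
        "should_trigger": len(should_trigger),
        "incorrect": sum(label != "correct" for label in labels),
        "underscored": labels.count("underscored"),
        # Underscored far enough that the recorded score fell below the trigger.
        "missed_trigger": sum(1 for r, _ in should_trigger if r < trigger),
    }
```

Note that an underscored set still triggers if its recorded score remains at 3 or above, which is why the number of missed triggers (66) is smaller than the number of underscored sets (97); 66 of 270 is about 24.4%.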

Comparisons between groups

We found no significant differences between the groups in the number of observations recorded per patient per day, nor in the number of EWS calculated per patient per day, nor in the proportion of EWS not calculated when observations were present (see Table 4). The overall proportion of scores incorrectly calculated was, however, higher in the LP group (17% vs. 12%, median difference 5% [95% CI 0–10.7, p = 0.02]).

Table 4. Between-group differences (Mann–Whitney U-test; values are median (interquartile range, IQR)).
Median (IQR) 95% CI p
LP group (n = 89) LN group (n = 100)
Observation sets; patient−1.day−1 4.93 (3.93–5.84) 4.83 (3.36–5.66) −0.12 to 0.77 0.15
EWS; patient−1.day−1 3.39 (1.80–4.78) 3.64 (1.87–4.68) −0.57 to 0.59 0.94
Missing EWS; % 0.20 (0.09–0.55) 0.17 (0.07–0.48) −0.03 to 0.09 0.34
Incorrect EWS; % 0.17 (0.00–0.33) 0.12 (0.00–0.23) 0 to 0.11 0.02
  • EWS, early warning scoring; LP, Legionnaires-positive; LN, Legionnaires-negative.

When error rates were calculated on a day-by-day basis, daily error rates fell in both groups as the outbreak progressed (Fig. 3). The median error rate during the whole outbreak was 14%. (The difference between this and the pure arithmetical rate of 21.9% derived from raw total scores and quoted above reflects the positively skewed length-of-stay distribution in the outbreak population.) Further, scoring errors in the LP group were consistently less likely to lead to underscoring than in the LN group (Fig. 4).
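The gap between a pooled error rate and a median per-patient rate can be illustrated with a toy example. The data below are invented purely to show the mechanism and are not study data; the function names are assumptions.

```python
from statistics import median


def pooled_error_rate(patients: dict) -> float:
    """Errors divided by scores, pooling every score from every patient."""
    all_flags = [flag for flags in patients.values() for flag in flags]
    return sum(all_flags) / len(all_flags)


def median_patient_error_rate(patients: dict) -> float:
    """Median of each patient's own error rate (patients without scores skipped)."""
    rates = [sum(flags) / len(flags) for flags in patients.values() if flags]
    return median(rates)


# Invented example: True marks an incorrectly calculated score.
# Two short-stay patients with few scores, one long-stay patient with many.
example = {
    "short_stay_a": [False] * 5,
    "short_stay_b": [True] + [False] * 4,
    "long_stay_c": [True] * 16 + [False] * 24,
}

print(pooled_error_rate(example))          # dominated by the long-stay patient
print(median_patient_error_rate(example))  # each patient counts once
```

In this example the long-stay patient contributes 40 of the 50 scores, so the pooled rate is pulled towards that patient's error rate, whereas the median gives every patient equal weight. A skewed length-of-stay distribution produces the same kind of divergence.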

Figure 3. Proportions of miscalculated early warning scoring (EWS) over time during outbreak (both groups compared).

Figure 4. Comparison of rates of miscalculated scores between groups: percentages of observation sets underscored at each level of ‘true’ score.

Discussion


This study has found that the recording of patient observations is variable, and that the conversion of observations to early warning scores is often incomplete and sometimes wrong. We also found that patients with more abnormal observations are more likely to be misscored. Scoring errors tend to lead to underscoring and this tendency increases with the degree of physiological abnormality. Patients with confirmed Legionnaires' disease were more likely to be incorrectly scored. This is the first time to our knowledge that a ‘track and trigger’ system has been subject to such scrutiny in practice.

The use of anonymised, routinely collected data for purposes such as our study would not necessarily have required specific individual consent from the patients concerned, although approval from the Research Ethics Committee would of course be necessary. However, the Legionnaires' outbreak generated substantial local public concern and international media interest and also gave rise to a criminal investigation against South Lakeland District Council (responsible for maintaining the contaminated ventilation plant which acted as the source of the outbreak). The chairman of the Ethics Committee decided therefore that it would be in everyone's interests if we were to contact patients directly for consent. He did, however, allow us to include their data unless they wrote back to withhold permission.

Our database of observations and scores is substantial and rigorously quality-assured. The patients we included formed a sufficiently large proportion of the patients admitted during the outbreak for us to be confident that they faithfully reflect the characteristics of that larger population. Although we found it necessary to correct the cut-off points between categories on the scoring instruction used during the outbreak before we could recalculate scores for comparison, these changes were minor and did not significantly affect our results.

We had initially hoped to be able to relate patient scores to outcome, but this proved impractical for a number of reasons. Most importantly, as the mortality rate during this outbreak was unusually low for Legionnaires' disease [15, 16], death as an outcome was too rare to be used in a meaningful analysis. Second, experienced critical care staff spent a great deal of time on the general hospital wards during the outbreak and we know from our separate documentation of the management of the outbreak [13] that many decisions on critical care admission were made independently of the EWS. Third, and partly as a consequence of this, some patients were admitted to Intensive Care before any scores were documented.

The Legionnaires' outbreak was not typical of everyday hospital practice [14]. The errors we describe could have been due to the rapid introduction of a new system to staff working under pressure. We feel this is unlikely, as error rates fell during the course of the outbreak (suggesting the system can be learned easily and quickly) and an internal audit of routine practice later in 2002 found a similar rate. There is little published work on this. Chellel and colleagues noted that, in a routine sample of hospital inpatients, respiratory rate had not been recorded in the previous 8 h in 127 patients (55%) [17]. In our study, a median of 4.9 sets of observations per patient per day were performed, including respiratory rate in 73.5% of sets. Alcock and colleagues found that only 355 of 728 sick patients (49%) admitted to an Accident and Emergency department had a set of observations sufficient to calculate an EWS [18]. During the outbreak, scores were calculated for 69.7% of observation sets. These figures compare well with the two cited studies and suggest that the recording of observations, and conversion to scores, during the outbreak were at least as diligent as during routine care. Likewise, the between-group differences could have arisen simply from the fact that most LN patients were admitted later than the LP patients (Fig. 2). Had staff become accustomed to using the scoring system by the time they came to use it on the LN group? Again, we feel that this is unlikely. Figure 3 shows that error rates in both groups fluctuated over the course of the outbreak, with errors in LP patients being more common throughout.

Our finding that patients' observations in the LP group were more likely to be misscored raises the question of how clinical experience interacts with scoring systems such as this. Do experienced staff ‘manipulate’ (the use of the word is not meant to imply either that this is dishonest or even intentional) the scoring system to support their clinical impression of the patient? Further, did the knowledge that a patient had confirmed Legionnaires' disease influence scoring behaviour? (The results of the urinary antigen test used during the outbreak to confirm the disease were usually available within a few hours of admission.) This would be consistent with our finding that LP patients were consistently less likely to be underscored and would also support the finding of Cioffi and colleagues [19] that nurses seek objective data to validate their initial subjective impressions about a patient's condition before enlisting the help of doctors. It would also tie in with the work of Berg [20], who has demonstrated how practitioners become adept at making apparently unambiguous clinical protocols work more flexibly in the more fluid context of actual practice.

Although the context of our study was atypical and the patient group relatively homogeneous, there is much that is more widely applicable. Little is known about how observations are recorded in practice but it seems self-evident that nursing charts should be carefully designed to allow easy but accurate data entry. We believe that the relative absence of recording of respiratory rate – in a clinical context where it is highly relevant – may be due to the fact that it must be counted manually, unlike the heart rate and arterial pressure, for which automated readings can be obtained. Kenward and colleagues suggest, too, that pulse oximetry is now favoured as a monitor of ventilation [21], although this does not make physiological sense, especially in patients receiving supplementary oxygen. Further, scoring instructions should be clear and the cut-off points between scoring categories unambiguous. It is worth noting in this context that most scoring instructions in use in the UK are derived from Morgan's abstract [4], published in 1997. This has not, to our knowledge, been validated and, further, it introduced the minor ambiguities in EWS cut-off instructions which the Furness General Hospital chart reproduced. Future research could usefully address the selection of cut-off points between scoring categories for each observation and the choice of observations to include in the EWS. The clinical value of the different observations must be balanced against the ease and accuracy with which they can be measured and recorded [22]. Finally, frequent observation and scoring not only allows closer attention to the patient's condition but also tends to offset errors in individual scores.

In conclusion, we have documented that the use of a track and trigger system in practice is prone to incomplete recording and erroneous scoring which may reduce the effectiveness of the system and, by extension, that of outreach initiatives in critical care. We suggest that this can be improved by good design of charts and instructions and attention to the human factors affecting their use. No new health technology should be assumed to function perfectly straight away [23] and ‘real-world’ evaluations such as we have performed will help new systems find their true place in patient care.


This investigation was sponsored by a grant from the UK Department of Health. The sponsor approved the initial outline study design we submitted, but had no role in data collection, data analysis or interpretation, or the writing of the report.

The authors declare no conflict of interest relating to the publication of this paper.

The contributors to this paper were A.F. Smith, consultant anaesthetist (data interpretation and co-author of the paper); R.J. Oakey, research and development fellow (data collection, input, interpretation and analysis; co-author of the paper) and A.M. Harry, research and development co-ordinator (data collection and input).