Suppression of false arrhythmia alarms in the ICU: a machine learning approach

Sardar Ansari; Ashwin Belle; Hamid Ghanbari; Mark Salamango; Kayvan Najarian

doi:10.1088/0967-3334/37/8/1186

1. Introduction

Medical monitoring devices with built-in visual or auditory warning mechanisms play an integral part in patient care. The adoption of, and response rate to, any clinical warning system depends significantly on the credibility and accuracy of its alarms. A search of the FDA's Manufacturer and User Facility Device Experience (MAUDE) database revealed 566 deaths between 2005 and 2008 that were due to some form of failure in monitoring device alarms (Cvach 2012). Highly sensitive alarms that are triggered constantly but with low accuracy disrupt clinical environments and have given rise to a universally acknowledged and well documented phenomenon called 'alarm fatigue' (Keller et al 2011, Cvach 2012, Sendelbach and Funk 2013).

One aspect of alarm fatigue is that caregivers become desensitized to alarm sounds, which leads to delayed responses or no response at all (Funk et al 2014). Furthermore, multiple studies have shown that both the frequency of these alarms and the percentage of them that are false are unacceptably high (Siebig et al 2010, Keller et al 2011). A study by Schmid et al showed that 80% of several thousand alarms triggered in perioperative settings during a fixed observation period were clinically inconsequential. Another study by Sendelbach and Funk, which focused on physiological monitors, concluded that between 72% and 99% of all alarms from these devices were demonstrably false (Sendelbach and Funk 2013). A similar study on cardiopulmonary monitors by Talley et al revealed that 85% to 90% of the alarms were false, while only a few led to any serious clinical intervention (Talley et al 2011). The ill effects and prevalence of alarm fatigue have been recognized as a national problem (Joint Commission et al 2013).

Widely varying solutions to the problem of medical alarm failure are being actively studied and developed by groups in both academia and industry. Some of these strategies focus on reducing the desensitization of caregivers to device alarms (Phillips and Barnsteiner 2005, Edworthy and Hellier 2006, Graham and Cvach 2010), while others involve developing centralized, unified alarm notification systems that combine alarms from various devices to create 'smart' alarms (Bovill 1990, Mora et al 1993, Blount et al 2010, Moorman and Gee 2011, Busch-Vishniac 2015). A strategy that is particularly relevant to this manuscript is improving the underlying computational structure on which these alarms are based. Various researchers have used innovative signal processing (Takla et al 2006, Aboukhalil et al 2008, Couto et al 2015, Kalidas and Tamil 2015, Plesinger et al 2015), statistical (Mäkivirta et al 1991, Kennedy 1995, Imhoff and Kuhls 2006) and/or machine learning approaches (Shortliffe et al 1975, Imhoff and Kuhls 2006, Plesinger et al 2015) to develop solutions that have shown some degree of efficacy. However, developing effective solutions for reducing false alarms from physiological waveforms requires solving a comprehensive set of challenges in order to make the solution both effective and practically feasible. To encourage the development of such comprehensive approaches, Physionet hosted an open challenge in 2015 focusing on the reduction of arrhythmia-based false alarms in intensive care settings (Clifford et al 2015). This paper describes our improvements to the algorithm that we submitted as an entry to this challenge (Ansari et al 2015).

This paper is organized into five main parts: top-scoring beat detection algorithms, data, methods, results and conclusions. The first part gives an overview of the six highest-scoring submissions to the Physionet challenge and their beat detection algorithms, which are later compared with the beat detection algorithm proposed in this paper. In the data section, we describe the data and how it was annotated. Next, the methods section describes the details of our proposed algorithm. Finally, we present our results and conclusions.

2. Top-scoring beat detection algorithms

Automated beat detection is a crucial aspect of reducing arrhythmia-based false alarms, because each type of arrhythmia in the challenge data is fundamentally defined by the presence, frequency or absence of beats in the waveforms preceding the alarm. Since beat detection plays such a critical role, in this section we briefly discuss the beat detection methods used by the top six performers in the Physionet 2015 challenge. In this work, a beat refers to either a normal QRS complex or abnormal electrical activity such as a premature ventricular contraction (PVC) or the uncoordinated contractions during ventricular fibrillation.

Plesinger et al utilize ECG, PPG and ABP waveforms in their beat detection method (Plesinger et al 2015). Depending on the type of waveform, they utilize a different custom method to detect the beats of each waveform in a given window. These custom methods rely on manipulating either frequency components, window size or both. Next, they aggregate the identified beats from each waveform channel and cross-compare them to perform a regularity test. The aggregated beat information is then used for arrhythmia-based alarm detection.

Kalidas et al use only ECG and PPG data for beat detection (Kalidas and Tamil 2015). For the ECG waveform, they use the Matlab implementation of the well-known Pan–Tompkins algorithm (Pan and Tompkins 1985, Sedghamiz 2013), while PPG beats are detected by finding the maxima of the first-order derivative of the PPG window. The beat-to-beat intervals are then computed from the identified ECG and PPG beats, and the two interval series are correlated using a linear time shift to compensate for the latency between the two waveforms. Note, however, that although beat detection is performed for asystole, extreme bradycardia and extreme tachycardia, it is not performed for ventricular tachycardia and ventricular fibrillation. For the latter two, Kalidas et al instead extract features directly from the waveform and classify them with support vector machines (SVM).

Couto et al (2015) use all three waveform types in their beat detection, i.e. ECG, PPG and ABP. For ECG beat detection, they use the open source tools gqrs (Zong et al 2003b, Silva and Moody 2014) and epilimited (Hamilton 2002). For detecting beats in the ABP and PPG waveforms, they use the open source implementation wabp (Zong et al 2003a). Once beats have been detected in each available channel and the waveform quality metrics have been computed, they fuse the beats based on waveform quality to form a unified set of beats for each window. The authors implemented their method in Octave; we were unable to run their code on our machines and therefore could not compare our beat detection method against theirs.

Fallet et al also use the ECG, ABP and PPG waveforms for beat detection. They detect ECG beats using an adaptive mathematical morphology based method (Yazdani and Vesin 2014), and ABP and PPG beats using the method described in Arberet et al (2013). Finally, depending on a waveform quality index computed using both a custom method and the default method provided in the challenge, the heart rate corresponding to the detected beats, together with the quality metrics, is used for arrhythmia detection.

Antink et al estimate beat-to-beat intervals on the ECG, PPG and ABP waveforms (Antink and Leonhardt 2015). However, their interval estimation is based on a self-similarity assessment method (Antink et al 2015): although heart rate variability is computed and used as a feature, they do not employ any direct beat detection method and instead estimate the intervals using the short-time autocorrelation function. For feature extraction and estimation, they combine all available ECG channels into a fused ECG waveform and, similarly, fuse the ABP and PPG waveforms into a single pressure-based cardiac signal. Because of this non-standard beat detection mechanism, we were unable to compare our beat detection method directly against theirs.

Eerikäinen et al perform beat detection on the ECG, ABP and PPG waveforms (Eerikäinen et al 2015). For the ECG channels, they use a low-complexity R-peak detector similar to the method developed by Rooijakkers et al (2012). For the ABP and PPG waveforms, they use the open source pulse detection algorithm wabp (Zong et al 2003a). After detecting beats on all available channels, the authors compare the identified beats and select those that are consistently found across all channels. These selected beats are then used for further false alarm detection.

3. Data

The data provided by the Physionet 2015 challenge contain five types of alarms: extreme bradycardia (BC), defined as a heart rate (HR) lower than 40 bpm for five consecutive beats; extreme tachycardia (TC), defined as an HR higher than 140 bpm for 17 consecutive beats; ventricular tachycardia (VT), defined as five or more ventricular beats with an HR higher than 100 bpm; ventricular flutter/fibrillation (VF), defined as fibrillatory, flutter, or oscillatory waveforms lasting at least 4 s; and asystole (AS), defined as the absence of any beat for at least 4 s.

3.1. Beat annotations by human experts

All the beats in the last 16 s preceding each alarm were annotated by human experts. This was done by two independent advanced cardiovascular life support (ACLS) certified investigators, and disagreements between them were adjudicated by an experienced cardiac electrophysiologist. Two ECG leads along with the ABP and PPG waveforms were presented to the annotators simultaneously using an annotation tool developed by the authors. Each annotator marked the beats on the ECG waveform and specified whether each beat was Normal, Ventricular, Fibrillatory, Pacemaker or Other.

The timing of the reference beats was determined from the ECG leads. If both ECG waveforms were noisy and the beats were undetectable during a section of the recording, another waveform was used to identify the beats, and the detected beats were offset according to the delay between that waveform and the ECG waveforms. The size of the offset was estimated by comparing beat locations during periods in which both waveforms were clean. If both ECG waveforms were unavailable or noisy throughout the 16 s window, the beats were selected based on the ABP or PPG waveform, whichever was available.

Throughout this study, the reference beats were compared with the beats detected automatically by different algorithms. To do so, the reference and detected peaks were first converted to indicator signals that represent each beat as a 10 ms wide square pulse, as shown later in figure 5. This was done by creating a signal whose value was one at the peak locations and zero everywhere else, and convolving it with a 9 ms long signal of all ones. Finally, all samples with a value of zero remained zero and all samples with a positive value were set to one.
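The indicator-signal construction above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: beats are assumed to be given as sample indices, the pulse width is given in samples, and the name `indicator_signal` is hypothetical. The convolution is written out causally, so each pulse starts at its beat sample.

```python
def indicator_signal(beat_indices, length, width):
    """Return a 0/1 list with a `width`-sample pulse at each beat.

    Follows the recipe in the text: an impulse train is convolved with a
    box of `width` ones and every positive sample is then set to one.
    """
    impulses = [0] * length
    for b in beat_indices:
        if 0 <= b < length:
            impulses[b] = 1
    indicator = [0] * length
    for n in range(length):
        # Causal convolution: y[n] = sum of x[n - k] for k = 0 .. width - 1.
        total = sum(impulses[n - k] for k in range(width) if n - k >= 0)
        indicator[n] = 1 if total > 0 else 0  # binarize
    return indicator


print(indicator_signal([2, 5], length=10, width=3))
# [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
```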

Different beat detection algorithms can choose different fiducial points in the waveform as beats. As a result, a fair comparison between the output of a peak detection algorithm and the reference beats required the detected and reference beats to be aligned first. To do so, the indicator signals for both sets of beats were constructed, and the cross-correlation between them was computed for lags between zero and 2f_s samples, where f_s is the sampling rate. The offset between the two sets of beats was taken as the lag that maximized the cross-correlation. Because the lag was limited to 2f_s samples, the alignment offset was at most 2 s, which ensured that the reference and detected beats could be compared effectively.
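A minimal sketch of the lag search, assuming both indicator signals share one sampling rate and, as in the text, only non-negative lags are searched. The function `best_lag` is a hypothetical name; a detector that marks beats earlier than the reference can be handled by swapping the two arguments.

```python
def best_lag(reference, detected, max_lag):
    """Return the lag in 0..max_lag samples that maximizes the cross-correlation.

    A positive lag means the detected indicator trails the reference one.
    """
    n = min(len(reference), len(detected))
    best, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(reference[i] * detected[i + lag] for i in range(max(0, n - lag)))
        if score > best_score:  # keep the smallest lag on ties
            best, best_score = lag, score
    return best


fs = 125
ref = [0] * 200
det = [0] * 200
for beat in (40, 140):
    ref[beat] = 1
    det[beat + 3] = 1  # this detector marks a point 3 samples later
lag = best_lag(ref, det, max_lag=2 * fs)
aligned_beats = [k - lag for k, v in enumerate(det) if v]
print(lag, aligned_beats)  # 3 [40, 140]
```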

To compute the accuracy of the peak detection algorithms, a range around each reference beat was searched for detected beats. Each range was bounded by the midpoints between the reference beat and its preceding and following reference beats; if the margin on either side of the reference beat was larger than one second, it was reduced to one second. All detected beats within this range were identified. If no detected beat was found in the range, a false negative was declared. Otherwise, the detected beat closest to the reference beat was counted as a true positive and the remaining detected beats in the range, if any, were marked as false positives. A detected beat that did not fall within the range of any reference beat was also marked as a false positive.
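The matching rule can be sketched as below, assuming sorted beat times in seconds and a 1 s clipping margin. `match_beats` is a hypothetical helper that returns only the counts, so the choice of which detection in a range is the closest does not affect the totals. Half-open ranges keep a detection that lands exactly on a midpoint from being counted twice.

```python
def match_beats(reference, detected, max_margin=1.0):
    """Return (TP, FP, FN) for sorted reference and detected beat times (s).

    Each reference beat owns the span between the midpoints to its
    neighbours, clipped to `max_margin` on either side. One detection in
    a span is a true positive, extra detections in it are false positives,
    an empty span is a false negative, and detections outside every span
    are false positives.
    """
    tp = fp = fn = 0
    claimed = 0
    for i, r in enumerate(reference):
        left = (reference[i - 1] + r) / 2 if i > 0 else r - max_margin
        right = (r + reference[i + 1]) / 2 if i + 1 < len(reference) else r + max_margin
        left, right = max(left, r - max_margin), min(right, r + max_margin)
        inside = [d for d in detected if left <= d < right]  # half-open span
        claimed += len(inside)
        if inside:
            tp += 1
            fp += len(inside) - 1
        else:
            fn += 1
    fp += len(detected) - claimed  # detections outside every span
    return tp, fp, fn


print(match_beats([1.0, 2.0, 3.0], [1.02, 1.1, 2.95, 5.0]))  # (2, 2, 1)
```

From these counts, sensitivity is TP/(TP + FN) and positive predictive value is TP/(TP + FP).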

4. Methods

The steps of the alarm suppression algorithm used in this paper are shown in figure 1. They comprise four main parts: preprocessing, beat detection, beat type detection and alarm suppression. The beat detection algorithm searches all four waveforms simultaneously and creates a unified set of beats using the information from all of them. Then, the ventricular and fibrillatory beats are separated from other beats using classification models. Finally, alarm-specific criteria are applied to the detected beats to determine whether the alarm is true or false.

5. Preprocessing

The first step in preprocessing is to retrieve the input waveforms: ECG II, ECG V, ABP and PPG. We then select the last 16 s of each waveform before the alarm. If either of the two ECG leads is not available, another available ECG lead is used instead. If a waveform is not available, all of its values are set to zero. Finally, all the waveforms are resampled to 125 Hz.
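A sketch of the windowing and resampling step, assuming the alarm time is known in seconds and each channel is a plain list of samples. The text does not state the resampling method, so linear interpolation is used here as an assumption; `alarm_window` and `resample_linear` are hypothetical names, and the ECG lead substitution is left to the caller.

```python
TARGET_FS = 125  # Hz, from the text
WINDOW_S = 16    # seconds kept before the alarm, from the text


def resample_linear(samples, fs_in, fs_out=TARGET_FS):
    """Resample a list by linear interpolation (the method is an assumption)."""
    n_out = int(round(len(samples) * fs_out / fs_in))
    out = []
    for k in range(n_out):
        pos = k * fs_in / fs_out  # position in input-sample units
        i = int(pos)
        if i >= len(samples) - 1:
            out.append(float(samples[-1]))
        else:
            frac = pos - i
            out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
    return out


def alarm_window(samples, fs, alarm_time_s):
    """Last WINDOW_S seconds before the alarm at TARGET_FS; zeros for a missing channel."""
    if samples is None:
        return [0.0] * (WINDOW_S * TARGET_FS)
    start = max(0, int(round((alarm_time_s - WINDOW_S) * fs)))
    end = int(round(alarm_time_s * fs))
    return resample_linear(samples[start:end], fs)


# Hypothetical usage: a 250 Hz channel and an alarm 20 s into the record.
abp_window = alarm_window(list(range(250 * 20)), fs=250, alarm_time_s=20.0)
ppg_window = alarm_window(None, fs=250, alarm_time_s=20.0)  # channel not recorded
```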

5.1. Beat detection

The beat detection method used in this work is an ensemble algorithm that combines multiple beat detection algorithms, including existing algorithms as well as several designed by the authors. Different peak detection algorithms are used for different types of waveforms, and the detected beats are aligned to correct for the delay between the ECG, ABP and PPG waveforms. Finally, a classification model is used to identify the true beats based on the outputs of the different beat detection algorithms. The following sections describe the signal conditioning steps and the beat detection algorithms for each of the waveforms.

5.1.1. Signal conditioning.

The first step in beat detection is conditioning the waveforms. Figure 2 shows a portion of the ECG II waveform at different steps of the conditioning process. First, a first-order zero-phase Butterworth band-pass filter is used to remove noise from the waveform. The cut-off frequencies are 0.5 Hz and 40 Hz for the ECG waveforms and 0.5 Hz and 10 Hz for the ABP and PPG waveforms. Then, the waveform is centered by subtracting its median. Next, a 250th-order median filter (a 2 s wide span at 125 Hz) is used to estimate the trend of the waveform, which is then subtracted. After the trend is removed, pacemaker spikes are found and removed from the ECG waveforms.
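The centering and detrending steps can be sketched with the standard library's `statistics.median`. The band-pass step is omitted here; in Python it could be applied beforehand with `scipy.signal.butter` and `scipy.signal.filtfilt`, the latter giving a zero-phase response. The edge handling of the running median (shortened windows) is an assumption, and the function names are hypothetical.

```python
from statistics import median


def median_filter(x, order):
    """Running median over `order` samples; windows are shortened at the edges."""
    half = order // 2
    return [median(x[max(0, k - half):min(len(x), k - half + order)]) for k in range(len(x))]


def center_and_detrend(x, fs=125, trend_span_s=2.0):
    """Subtract the global median, then a running-median trend (250 samples at 125 Hz)."""
    m = median(x)
    centered = [v - m for v in x]
    trend = median_filter(centered, int(trend_span_s * fs))
    return [c - t for c, t in zip(centered, trend)]
```

Because the 2 s median window is far wider than a QRS complex, narrow deflections survive detrending while slow baseline drift is removed.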

The ECG waveforms are first normalized by dividing them by an estimate of the peak heights. The average peak height is estimated by finding an initial set of peaks whose absolute height is larger than the 80th percentile of the absolute waveform and that are at least 2 s apart; the median of the absolute heights of these peaks is used as the estimate. The results of filtering, detrending and normalization are shown in figure 2(b). Then, the waveform is filtered using a 3rd-order median filter (figure 2(c)). The 24 ms span of this filter (three samples at 125 Hz) heavily attenuates the pacemaker spikes, which are very sharp and narrow, but does not affect the R waves, since they are 60–100 ms wide. A peak is considered a pacemaker spike if its absolute amplitude is reduced by at least 50% after median filtering and the absolute residual is larger than 0.2. The pacemaker spikes are removed by applying a 5th-order median filter to a range extending 5 samples on either side of each spike. Figure 2(d) shows the ECG waveform after spike removal.
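A sketch of the spike test and removal, assuming the ECG has already been normalized as described and that a "peak" means a local maximum of the absolute waveform. The thresholds (50% reduction, residual above 0.2, 5 samples on either side, 3rd- and 5th-order median filters) come from the text; the function names and the edge handling are assumptions.

```python
from statistics import median


def median_filter(x, order):
    """Running median over `order` samples, shortened at the edges."""
    half = order // 2
    return [median(x[max(0, k - half):min(len(x), k - half + order)]) for k in range(len(x))]


def remove_pacemaker_spikes(ecg, reduction=0.5, min_residual=0.2, margin=5):
    """Detect and suppress pacemaker spikes in a normalized ECG.

    A local maximum of |ecg| is a spike if a 3-sample median filter shrinks
    its absolute amplitude by at least `reduction` and the absolute residual
    exceeds `min_residual`; the surrounding +/- `margin` samples are then
    replaced by a 5-sample running median.
    """
    smooth = median_filter(ecg, 3)
    spikes = []
    for k in range(1, len(ecg) - 1):
        a = abs(ecg[k])
        if not (a > abs(ecg[k - 1]) and a >= abs(ecg[k + 1])):
            continue  # not a peak of |ecg|
        if abs(smooth[k]) <= (1 - reduction) * a and abs(ecg[k] - smooth[k]) > min_residual:
            spikes.append(k)
    cleaned = list(ecg)
    for k in spikes:
        lo, hi = max(0, k - margin), min(len(ecg), k + margin + 1)
        cleaned[lo:hi] = median_filter(ecg[lo:hi], 5)
    return cleaned, spikes


ecg = [0.0] * 50
ecg[20] = 3.0                 # one-sample pacing spike
for k in range(30, 37):
    ecg[k] = 1.0              # 7-sample wide, R-wave-like bump
cleaned, spikes = remove_pacemaker_spikes(ecg)
```

The wide bump survives because the 3-sample median leaves its top unchanged, whereas the one-sample spike is flattened.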

Finally, the waveforms are normalized again after the pacemaker spikes are removed, by estimating the average height of the beats and dividing the waveform values by that estimate. The average beat height is estimated from an initial set of peaks whose heights are larger than the 99th percentile of the absolute value of the waveform and that are at least 2 s apart. The 2 s minimum distance yields a set of peaks drawn from different regions of the waveform and ensures that the estimate is not heavily affected by noisy regions of the waveform.
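The peak-height normalization, used here with the 99th percentile and in the pacemaker step with the 80th, can be sketched as below. The percentile interpolation and the tallest-first enforcement of the 2 s spacing (similar in spirit to a minimum-peak-distance option in common peak finders) are assumptions, as are the function names.

```python
from statistics import median


def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def spaced_peaks(x, min_height, min_distance):
    """Local maxima above min_height, kept tallest-first at least min_distance apart."""
    cands = [k for k in range(1, len(x) - 1)
             if x[k] > min_height and x[k] > x[k - 1] and x[k] >= x[k + 1]]
    chosen = []
    for k in sorted(cands, key=lambda k: x[k], reverse=True):
        if all(abs(k - c) >= min_distance for c in chosen):
            chosen.append(k)
    return sorted(chosen)


def normalize_by_beat_height(x, fs=125, q=99, min_sep_s=2.0):
    """Divide x by the median height of well-separated tall peaks of |x|."""
    ax = [abs(v) for v in x]
    peaks = spaced_peaks(ax, percentile(ax, q), int(min_sep_s * fs))
    if not peaks:
        return list(x), None
    estimate = median([ax[k] for k in peaks])
    return [v / estimate for v in x], estimate
```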

5.1.2. Beat detection: ECG signal.

Seven different beat detection algorithms are used for the ECG waveforms in this work. The first, ECG-PD1, is the sqrs routine in the Physionet toolbox, which implements the algorithm of Engelse and Zeelenberg (1979). The second, ECG-PD2, is the Pan–Tompkins peak detection code by Sedghamiz that is available on Matlab Central (Sedghamiz 2013). The remaining five algorithms were designed by the authors for the ECG waveform.

The first of these, ECG-PD3, uses the derivative of the ECG waveform to find the sudden changes in amplitude that correspond to a QRS complex. The steps of this algorithm are shown in figure 3. The algorithm starts by computing the absolute value of the derivative of the ECG waveform, illustrated in figure 3(b), and estimating the average height of the peaks in the derivative waveform that correspond to ECG beats. This is done by finding an initial set of peaks whose height is larger than the 95th percentile of the absolute derivative waveform and that are at least 2 s apart. The algorithm also finds a set of initial troughs by locating the lowest point between each consecutive pair of peaks. Then, it computes the prominence of each peak, defined as the average vertical distance from the peak to its neighboring troughs. A threshold is set equal to the median prominence of the peaks divided by four. Finally, the peaks of the absolute derivative waveform are found such that their prominence is larger than the threshold and they are at least 100 ms apart. The threshold and the detected peaks are shown in figure 3(c). This algorithm finds the narrow peaks that are typical of a normal QRS complex. The next algorithm, ECG-PD4, is similar to ECG-PD3 but operates on the absolute value of the ECG waveform itself, after band-pass filtering between 0.1 Hz and 10 Hz, rather than on its derivative. This algorithm detects wider peaks, similar to those observed in PVC beats.
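A compact sketch of ECG-PD3 follows. It assumes a conditioned ECG sampled at 125 Hz and returns indices into the derivative, which the beat adjustment step would later snap to the ECG peak. The text does not define prominence for the final selection, so the standard topographic prominence (height above the higher of the two surrounding bases, the quantity computed by `scipy.signal.peak_prominences`) is assumed there. Setting `use_abs=False` gives the signed variant suggested for the ABP and PPG waveforms; ECG-PD4 would skip the differencing and run on the 0.1–10 Hz filtered waveform instead. All helper names are hypothetical.

```python
from statistics import median


def _percentile(values, q):
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def _local_maxima(x):
    return [k for k in range(1, len(x) - 1) if x[k] > x[k - 1] and x[k] >= x[k + 1]]


def _keep_spaced(x, candidates, min_distance):
    """Greedy tallest-first selection with a minimum spacing."""
    chosen = []
    for k in sorted(candidates, key=lambda k: x[k], reverse=True):
        if all(abs(k - c) >= min_distance for c in chosen):
            chosen.append(k)
    return sorted(chosen)


def _prominence(x, k):
    """Topographic prominence: height above the higher of the two bases."""
    h, left, right = x[k], x[k], x[k]
    i = k
    while i > 0 and x[i - 1] <= h:
        i -= 1
        left = min(left, x[i])
    i = k
    while i < len(x) - 1 and x[i + 1] <= h:
        i += 1
        right = min(right, x[i])
    return h - max(left, right)


def ecg_pd3(ecg, fs=125, use_abs=True):
    """Sketch of ECG-PD3: prominent peaks of |d ecg| at least 100 ms apart."""
    d = [b - a for a, b in zip(ecg, ecg[1:])]
    if use_abs:
        d = [abs(v) for v in d]
    # Initial tall peaks (> 95th percentile, >= 2 s apart) and the troughs between them.
    level = _percentile(d, 95)
    init = _keep_spaced(d, [k for k in _local_maxima(d) if d[k] > level], int(2 * fs))
    if not init:
        return []
    troughs = [min(d[a:b + 1]) for a, b in zip(init, init[1:])]
    proms = []
    for i, k in enumerate(init):
        sides = [d[k] - troughs[j] for j in (i - 1, i) if 0 <= j < len(troughs)]
        proms.append(sum(sides) / len(sides) if sides else d[k])
    threshold = median(proms) / 4
    cands = [k for k in _local_maxima(d) if _prominence(d, k) > threshold]
    return _keep_spaced(d, cands, int(0.1 * fs))
```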

**Figure 3.** Peak detection steps for the *ECG-PD3* algorithm. (a) shows the original ECG waveform; (b) shows the absolute values for the derivative of the ECG waveform along with the initial peaks and troughs that are used to normalize the derivatives; (c) shows the prominence threshold that is used for peak detection along with the detected beats; and (d) shows the original ECG waveform along with the detected beats before and after adjustment.

Three other algorithms (ECG-PD5–7) operate on the Stockwell transform of the ECG waveforms. The steps of these algorithms are depicted in figure 4. The Stockwell transform creates a two-dimensional time-frequency representation of the waveform, shown in figure 4(b). Each of the three algorithms considers a band of frequencies and creates a representative waveform, shown in figures 4(c)–(e), by averaging the frequency components along the frequency axis within that band. The frequency bands are 1–8 Hz, 8–25 Hz and 25–40 Hz; the horizontal lines in figure 4(b) mark the 8 Hz and 25 Hz boundaries between them. Finally, a beat detection procedure similar to ECG-PD4 is applied to each averaged waveform to find the beats.
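The band-averaging step can be sketched with the frequency-domain form of the discrete Stockwell transform, S[n, t] = (1/N) Σ_m X[m + n] exp(-2π²m²/n²) exp(2πimt/N), where X is the DFT of the window and voice n corresponds to n·f_s/N Hz. This pure-Python version is O(N²) per voice and is meant only to illustrate the idea on short windows; averaging the magnitude |S| rather than the complex values, the half-open band edges and the function name are assumptions.

```python
import cmath
import math


def stockwell_band_average(x, fs, f_lo, f_hi):
    """Average |S(t, f)| over Stockwell voices with f_lo <= f < f_hi Hz."""
    N = len(x)
    w = [cmath.exp(2j * math.pi * k / N) for k in range(N)]  # twiddle table
    X = [sum(x[t] * w[(-k * t) % N] for t in range(N)) for k in range(N)]  # DFT
    voices = [n for n in range(1, N // 2 + 1) if f_lo <= n * fs / N < f_hi]
    band = [0.0] * N
    for n in voices:
        spec = []
        for m in range(N):
            ms = m if m <= N // 2 else m - N  # signed frequency offset
            spec.append(X[(m + n) % N] * math.exp(-2 * math.pi ** 2 * ms ** 2 / n ** 2))
        for t in range(N):
            s = sum(spec[m] * w[(m * t) % N] for m in range(N)) / N  # inverse DFT
            band[t] += abs(s) / len(voices)
    return band


fs = 125
tone = [math.sin(2 * math.pi * 5 * t / fs) for t in range(fs)]  # 1 s of a 5 Hz tone
low = stockwell_band_average(tone, fs, 1, 8)     # ECG-PD5 band
high = stockwell_band_average(tone, fs, 25, 40)  # ECG-PD7 band
```

A 5 Hz tone concentrates its energy in the 1–8 Hz voices, so `low` is large while `high` is close to zero.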

**Figure 4.** Peak detection steps for *ECG-PD5–7* algorithms. (a) shows the original ECG waveform; (b) shows the time-frequency representation of the ECG waveform that is generated by the Stockwell transform along with the horizontal dashed lines corresponding to 8 Hz and 25 Hz; (c) shows the average of frequency components from 1 Hz to 8 Hz and the detected beats (*ECG-PD5*); (d) shows the average of frequency components from 8 Hz to 25 Hz and the detected beats (*ECG-PD6*); and (e) shows the average of frequency components from 25 Hz to 40 Hz and the detected beats (*ECG-PD7*).

Each beat detection algorithm is followed by a beat adjustment step. A range within 50 ms of the detected peak is searched in the original ECG waveform, and the local peak with the largest absolute amplitude is selected as the beat. If no local peak exists in that range, the sample with the highest absolute value is chosen as the beat.
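A sketch of the ECG beat adjustment, assuming sample indices at 125 Hz and taking a "peak" to be a local maximum of the absolute waveform; `adjust_ecg_beat` is a hypothetical name.

```python
def adjust_ecg_beat(ecg, beat, fs=125, half_window_s=0.05):
    """Snap a detected beat to the tallest |ECG| peak within +/- 50 ms.

    Falls back to the sample with the largest |ECG| when the window holds
    no local peak (for example, on a monotonic slope).
    """
    h = int(round(half_window_s * fs))
    lo, hi = max(0, beat - h), min(len(ecg), beat + h + 1)
    peaks = [k for k in range(max(lo, 1), min(hi, len(ecg) - 1))
             if abs(ecg[k]) > abs(ecg[k - 1]) and abs(ecg[k]) >= abs(ecg[k + 1])]
    pool = peaks if peaks else range(lo, hi)
    return max(pool, key=lambda k: abs(ecg[k]))
```

Using the absolute value lets inverted QRS complexes be snapped to their negative extremum.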

5.1.3. Beat detection: ABP and PPG signals.

Three beat detection algorithms are used to find the beats for the ABP and PPG waveforms. The first one uses the wabp routine in the Physionet toolbox (ABP-PD1 and PPG-PD1). For the ABP waveform, the original waveform is used without normalization and scaling, i.e. the waveform before the conditioning step is used. This is because the wabp routine takes advantage of the features of the ABP waveform such as systolic and diastolic blood pressures. However, the PPG waveform values do not correspond to pressure values. Hence, we scale the PPG waveform between 80 and 120 to resemble an ABP waveform before using the wabp routine to find the beats.
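The text does not say how the PPG values are mapped to the 80–120 range; the linear min-max scaling below is one plausible reading and is an assumption, as is the name `scale_like_abp`. The wabp routine itself is not reproduced.

```python
def scale_like_abp(ppg, low=80.0, high=120.0):
    """Min-max scale a PPG segment into [low, high] so it resembles an ABP trace."""
    p_min, p_max = min(ppg), max(ppg)
    if p_max == p_min:
        return [(low + high) / 2] * len(ppg)  # flat or zero-filled channel
    return [low + (v - p_min) * (high - low) / (p_max - p_min) for v in ppg]
```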

The other two beat detection algorithms (ABP-PD2–3 and PPG-PD2–3) were developed by the authors and are similar to ECG-PD3 and ECG-PD4, except that the waveform values are used directly, without taking the absolute value of the waveform or its derivative. This is because the beats of the ABP waveform are always positive, whereas ECG beats can be inverted.

Finally, the detected beats are adjusted to reduce the variability among the beats detected by different algorithms. This is similar to the ECG beat adjustment introduced at the end of the previous section. However, the ABP and PPG waveforms rise more slowly during systole than the ECG waveform changes during the QRS complex. Moreover, most ABP and PPG beat detection algorithms, including the ones proposed in this paper, choose a point during systole that precedes the systolic peak. As a result, the search range for the peak with the highest amplitude starts 50 ms before the beat and ends at the location of the following beat, or 1 s after the current beat if the following beat is more than 1 s away.
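A sketch of the ABP/PPG adjustment window, assuming beat sample indices at 125 Hz and taking the highest local maximum of the (unsigned) waveform in the window, with a fallback to the highest sample. The function name is hypothetical.

```python
def adjust_pressure_beats(signal, beats, fs=125):
    """Snap ABP/PPG beats to the systolic peak that follows them.

    The search runs from 50 ms before each beat up to the next beat, or up
    to 1 s after the beat when the next beat is farther away (or absent).
    """
    pre = int(round(0.05 * fs))
    max_span = int(round(1.0 * fs))
    adjusted = []
    for i, b in enumerate(beats):
        start = max(0, b - pre)
        end = b + max_span
        if i + 1 < len(beats):
            end = min(end, beats[i + 1])
        end = min(end, len(signal))  # exclusive bound
        if start >= end:
            adjusted.append(b)  # nothing to search; keep the original beat
            continue
        peaks = [k for k in range(max(start, 1), min(end, len(signal) - 1))
                 if signal[k] > signal[k - 1] and signal[k] >= signal[k + 1]]
        pool = peaks if peaks else range(start, end)
        adjusted.append(max(pool, key=lambda k: signal[k]))
    return adjusted
```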

5.1.4. Beat fusion.

Several different beat detection algorithms were introduced in the previous sections for the ECG, ABP and PPG waveforms. This results in a total of 20 sets of beats: seven sets for each of the two ECG leads and three sets for each of the ABP and PPG waveforms. The four input waveforms and their indicator signals are shown in figure 5. For the ECG waveforms, seven indicator signals are shown in the right column (figures 5(a2) and (b2)), corresponding to ECG-PD1–ECG-PD7. Figures 5(c2) and (d2) depict the indicator signals for the ABP and PPG waveforms, corresponding to ABP-PD1–ABP-PD3 and PPG-PD1–PPG-PD3, respectively.

**Figure 5.** ECG, ABP and PPG waveforms and the indicator signals for the detected peaks. The left column, (a1)–(d1), shows the original waveforms while the right column, (a2)–(d2), shows the indicator signals for each of the beat detection algorithms.

Next, the outputs from different beat detection algorithms have to be combined to create a final set of beats. To do so, the ABP and PPG waveforms need to be first aligned with the ECG waveforms to account for the delay that exists between the waveforms. This is done by adding together all the indicator signals that belong to the same waveform to create one indicator signal per waveform. Then, the two indicator signals for the two leads of ECG are added to create a single indicator signal for ECG. The ABP and PPG waveforms are then aligned with the ECG waveform using cross-correlation between their indicator signals. The original indicator signals are also shifted according to the lags between the waveforms.

After the waveforms and indicator signals are aligned, a set of candidate beats is determined by adding the indicator signals together to create a single ensemble indicator signal. All 20 indicator signals and the resulting ensemble indicator signal are shown in figures 6(a) and (b), respectively. Then, the peaks of the ensemble indicator signal are identified such that they are at least 100 ms apart and their height is at least 1.5 for AS alarms or 2.5 for all other alarms; peaks that do not meet these criteria are discarded. The candidate beats are shown in figures 6(b) and (c).
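The alignment and candidate selection can be sketched as follows, assuming each indicator signal has already been assigned a lag relative to the ECG (zero for the ECG leads, and the cross-correlation lag of its waveform for ABP and PPG, positive when that waveform trails the ECG). Where the summed signal has a flat top, the centre of the plateau is taken as the peak, which is an assumption; the helper names are hypothetical.

```python
def shift_earlier(signal, lag):
    """Advance a list by `lag` samples (delay it if lag < 0), zero-padding the gap."""
    n = len(signal)
    if lag >= 0:
        return signal[lag:] + [0] * min(lag, n)
    return [0] * min(-lag, n) + signal[:max(0, n + lag)]


def plateau_peaks(x):
    """Centre index of every flat-topped local maximum with a positive value."""
    peaks, k, n = [], 0, len(x)
    while k < n:
        j = k
        while j + 1 < n and x[j + 1] == x[k]:
            j += 1
        higher_left = k == 0 or x[k - 1] < x[k]
        higher_right = j == n - 1 or x[j + 1] < x[k]
        if x[k] > 0 and higher_left and higher_right:
            peaks.append((k + j) // 2)
        k = j + 1
    return peaks


def candidate_beats(indicators, lags, fs=125, alarm_type="VT"):
    """Sum lag-corrected indicator signals and pick candidate beats."""
    aligned = [shift_earlier(ind, lag) for ind, lag in zip(indicators, lags)]
    ensemble = [sum(col) for col in zip(*aligned)]
    min_height = 1.5 if alarm_type == "AS" else 2.5
    cands = [k for k in plateau_peaks(ensemble) if ensemble[k] >= min_height]
    chosen = []
    for k in sorted(cands, key=lambda k: ensemble[k], reverse=True):
        if all(abs(k - c) >= int(0.1 * fs) for c in chosen):  # >= 100 ms apart
            chosen.append(k)
    return sorted(chosen)
```

The lower threshold for AS alarms keeps beats that only a few detectors agree on, which matters when the question is whether any beat exists at all.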

**Figure 6.** The process of finding the beat candidates and the final beats. (a) shows all the 20 indicator signals that represent the output of all the beat detection algorithms stacked over each other; (b) shows the sum of all the indicator signals and the detected beat candidates; (c) shows the original ECG waveform along with the beat candidates; and (d) shows the final beats after the false positives that are detected by the decision tree model are removed.

The set of candidate beats includes almost all of the true beats along with several false positives. To determine the likely true beats, a decision tree model is used to classify the true and false beats based on the decisions made by the peak detection algorithms. For each beat, the values from all of the original indicator signals, shown in figure 6(a), are combined to create the input variables for the classification model. Hence, the input consists of 20 binary variables: seven values for each ECG waveform indicating whether a given peak was detected by each of the ECG beat detection algorithms (ECG-PD1–7), as well as three values for each of the ABP and PPG waveforms indicating the decisions made by their peak detection algorithms (ABP-PD1–3 and PPG-PD1–3). The output of the decision tree is a binary variable indicating whether the beat is true or false.
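Building the 20 binary inputs for one candidate beat amounts to reading every aligned indicator signal at that beat. The column names below are hypothetical and only document the expected ordering (seven detectors for each ECG lead, then three each for ABP and PPG).

```python
ALGORITHMS = (
    [f"ECG1-PD{i}" for i in range(1, 8)] + [f"ECG2-PD{i}" for i in range(1, 8)]
    + [f"ABP-PD{i}" for i in range(1, 4)] + [f"PPG-PD{i}" for i in range(1, 4)]
)  # hypothetical column names: 7 + 7 + 3 + 3 = 20


def beat_inputs(aligned_indicators, beat):
    """One binary input per detector: did it mark this candidate beat?"""
    if len(aligned_indicators) != len(ALGORITHMS):
        raise ValueError("expected one aligned indicator signal per detector")
    return [1 if ind[beat] else 0 for ind in aligned_indicators]
```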

A separate decision tree model is trained to classify the true and false beats for each alarm type. Training uses five-fold nested cross-validation: the records are divided into five folds, and in each iteration one fold is held out as testing data. The remaining data is further divided into five parts, one of which is used for validation while the rest are used for training. The model parameters are optimized by training the model on the training set with different parameter values and evaluating its performance on the validation set. Here, the maximum number of decision splits is varied from 100 to 1500 in increments of 50. The best-performing model is then chosen and tested on the testing set. The cross-validation procedure iterates over both the testing and validation folds, ensuring that the classification model is always tested on data it did not see during training.
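The nested cross-validation loop can be sketched independently of the classifier. `fit` and `score` are caller-supplied callables (assumptions, not part of the original method); for instance, scikit-learn's `DecisionTreeClassifier(max_leaf_nodes=s + 1)` limits a binary tree to at most `s` splits, which roughly corresponds to MATLAB's `MaxNumSplits` option of `fitctree`. Refitting the chosen setting on all non-test folds is also an assumption.

```python
def k_folds(items, k=5):
    """Interleaved split into k folds (shuffle records beforehand in practice)."""
    return [items[i::k] for i in range(k)]


def nested_cv(records, fit, score, params, k=5):
    """Return (best_param, test_score) for each outer fold.

    fit(train_records, param) -> model and score(model, records) -> float.
    The inner loop picks the param with the best mean validation score; it
    is then refit on all non-test folds and scored once on the test fold.
    """
    outer = k_folds(records, k)
    results = []
    for i, test in enumerate(outer):
        rest = [r for j, fold in enumerate(outer) if j != i for r in fold]
        inner = k_folds(rest, k)

        def mean_validation(param):
            total = 0.0
            for v, val in enumerate(inner):
                train = [r for j, fold in enumerate(inner) if j != v for r in fold]
                total += score(fit(train, param), val)
            return total / k

        best = max(params, key=mean_validation)
        results.append((best, score(fit(rest, best), test)))
    return results


MAX_SPLITS = range(100, 1501, 50)  # parameter grid from the text
```

Splitting by record rather than by beat keeps beats from the same recording out of both the training and testing folds.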

Next, the beat detection algorithm uses the classification model to veto the candidate beats and keep only the true ones. This approach is more effective than conventional voting (polling) schemes, since it allows different beat detection algorithms to be weighted differently according to their accuracy, sensitivity and specificity. It also allows groups of beat detection algorithms to act together to detect a particular type of beat. During training, the data is balanced by replicating instances from the smaller class. In addition, the observations are weighted to increase the cost of misclassifying the abnormal beats that define an alarm, i.e. ventricular and fibrillatory beats are weighted five times higher for the VT and VF alarms, respectively.
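The balancing and weighting steps might look like this in Python. The beat-type labels `"ventricular"` and `"fibrillatory"`, the row layout and the fixed random seed are assumptions for illustration.

```python
import random

# Hypothetical beat-type labels for the abnormal class of each alarm.
ABNORMAL_TYPE = {"VT": "ventricular", "VF": "fibrillatory"}


def prepare_training_set(rows, alarm_type, abnormal_weight=5.0, seed=0):
    """Balance true/false candidate beats by replication and weight abnormal beats.

    rows: list of (features, is_true_beat, beat_type) tuples.
    Returns the balanced rows and one misclassification weight per row.
    """
    rng = random.Random(seed)
    true_rows = [r for r in rows if r[1]]
    false_rows = [r for r in rows if not r[1]]
    if len(true_rows) <= len(false_rows):
        small, large = true_rows, false_rows
    else:
        small, large = false_rows, true_rows
    if small:  # replicate randomly chosen members of the smaller class
        small = small + [rng.choice(small) for _ in range(len(large) - len(small))]
    balanced = small + large
    target = ABNORMAL_TYPE.get(alarm_type)
    weights = [abnormal_weight if r[2] == target else 1.0 for r in balanced]
    return balanced, weights
```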

5.2. Detecting the type of the beat

After beat detection, the type of each beat needs to be determined for the VF and VT alarms. To do so, the waveforms are first conditioned as described in section 5.1.1, except that the pacemaker spikes are not removed. This is because spike removal can alter the pattern of normal beats initiated by the pacemaker and produce a waveform that resembles VT beats. A set of features is extracted from a range around each beat. First, the raw samples of each waveform within 100 ms on either side of the beat are used as features. For the two ECG leads, if the amplitude of the beat is negative, the features from that waveform are inverted. Moreover, each waveform is transformed into the time-frequency domain using the Stockwell transform. For each beat, the frequency components are averaged along the time axis within a window extending from 100 ms before to 100 ms after the peak, creating a frequency representation of the waveform around that beat. These frequency representations are also used directly as features. The extracted features for the VF and VT alarms are shown in figures 7 and 8, respectively.
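The time-domain part of the feature vector can be sketched as below; the Stockwell-based frequency features are omitted. The waveform names and the zero-padding at record edges are assumptions, and `beat_features` is a hypothetical name.

```python
def beat_features(waveforms, beat, fs=125, ecg_names=("ECG II", "ECG V"), half_window_s=0.1):
    """Raw samples within +/- 100 ms of a beat from every waveform, concatenated.

    waveforms: dict name -> list of conditioned samples (same rate and length).
    ECG segments are sign-flipped when the beat sample is negative; samples
    outside the record are zero-padded.
    """
    h = int(round(half_window_s * fs))
    features = []
    for name, x in waveforms.items():
        seg = [x[k] if 0 <= k < len(x) else 0.0 for k in range(beat - h, beat + h + 1)]
        if name in ecg_names and 0 <= beat < len(x) and x[beat] < 0:
            seg = [-v for v in seg]
        features.extend(seg)
    return features
```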

**Figure 7.** The features used to classify the fibrillatory beats of the VF alarm. The waveforms in blue and red show the average patterns for the normal and VF beats, respectively. The time features in the top row show the average patterns in a 100 ms range around the beats while the frequency features in the bottom row show the average frequency components in a 100 ms range around the beats.

**Figure 8.** The features used to classify the ventricular beats of the VT alarm. The waveforms in blue and red show the average patterns for the normal and ventricular beats, respectively. The time features in the top row show the average patterns in a 100 ms range around the beats while the frequency features in the bottom row show the average frequency components in a 100 ms range around the beats.

Two separate classification models are used to distinguish between normal and abnormal beats in the VT and VF records. First, the data is balanced to have an equal number of normal and abnormal beats using replication, i.e. a randomly selected subset of the abnormal beats is replicated to create a balanced dataset. Then, a decision tree is trained using five-fold nested cross-validation varying the maximum number of decision splits between 100 and 1500 with increments of 50. For the VT model, the response variable indicates whether the beat is ventricular or not, while the response variable for the VF model indicates whether the beat is fibrillatory or not.

5.3. False alarm detection

The final step in false alarm suppression is to apply a set of criteria to the beats detected in the previous steps to determine whether the alarm is false. These criteria are often less strict than the definition of the alarm in order to prevent excessive false negatives and to account for possibly missed beats. Two of the alarm types, VT and VF, require both the timing of the beats and their types, determined using the method described in the previous section, to differentiate between true and false alarms. The criteria used to determine the status of each alarm type are explained below.

An AS alarm is determined to be true if there is at least one pair of consecutive beats that are at least 2.5 s apart. Additionally, the beat detection algorithms can fail during asystole by detecting elements of the baseline noise or artifacts as peaks. When the asystole waveform contains only high-frequency components associated with baseline noise, the noise elements that are incorrectly detected as peaks can lead to unusually high HR values. On the other hand, when the asystole waveform is contaminated by artifacts, such as motion artifacts, an artifact can be incorrectly detected as a QRS complex, leading to extremely low HR values. Hence, an AS alarm is reported to be false only if the detected HR is within the range of normal HR values, i.e. the alarm is declared false when the HR is between 20 and 140 bpm; otherwise, it is declared true. A BC alarm is true if there are at least 4 consecutive beats with an HR lower than 45 bpm. A TC alarm is declared true if the HR is higher than 135 bpm for at least 14 beats. A VF alarm is marked true if there exists a 3 s window that contains at least 4 fibrillatory beats. Lastly, a VT alarm is declared true if there exists a 5 s window that contains at least 2 ventricular beats.
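The alarm-specific criteria above can be written as small predicates on beat times in seconds. This is a sketch under assumptions that the text does not pin down: the HR of a run of k consecutive beats is taken as the average over the run, 60(k − 1)/(t_last − t_first); the AS rule is read as "true if a pause of at least 2.5 s exists or if the mean HR falls outside 20 to 140 bpm"; and beat times are assumed distinct. All function names are hypothetical.

```python
import numpy as np


def _window_hr(times, k):
    """Average HR (bpm) over every run of k consecutive beats (assumed definition)."""
    t = np.sort(np.asarray(times, dtype=float))
    if t.size < k:
        return np.array([])
    spans = t[k - 1:] - t[:t.size - k + 1]
    return 60.0 * (k - 1) / spans


def _count_within(times, count, width_s):
    """True if some window of width_s seconds contains at least `count` beats."""
    t = np.sort(np.asarray(times, dtype=float))
    if t.size < count:
        return False
    return bool(np.any(t[count - 1:] - t[:t.size - count + 1] <= width_s))


def asystole_true(beats):
    t = np.sort(np.asarray(beats, dtype=float))
    if t.size < 2:
        return True  # no measurable rhythm at all
    if np.any(np.diff(t) >= 2.5):
        return True  # a pause of at least 2.5 s between consecutive beats
    hr = 60.0 / np.mean(np.diff(t))
    # An implausible HR suggests noise or artifacts were detected as beats.
    return not (20.0 <= hr <= 140.0)


def bradycardia_true(beats):
    return bool(np.any(_window_hr(beats, 4) < 45.0))


def tachycardia_true(beats):
    return bool(np.any(_window_hr(beats, 14) > 135.0))


def vf_true(fibrillatory_beats):
    return _count_within(fibrillatory_beats, 4, 3.0)


def vt_true(ventricular_beats):
    return _count_within(ventricular_beats, 2, 5.0)
```

For example, a regular rhythm at 60 bpm with no pause yields `asystole_true(...) == False`, whereas beats detected every 0.2 s (about 300 bpm) are treated as noise and the AS alarm is kept as true.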

6. Results

This section presents the results of the machine learning steps described in the previous sections as well as the final false alarm detection results on the training and testing sets. Moreover, the beat detection algorithms used in the top-scoring submissions are compared with the proposed beat detection algorithm.

The beat detection algorithm uses multiple existing beat detection algorithms, as well as several other methods proposed by the authors, to find beats in the ECG, PPG and ABP waveforms. The beats are then aligned, and the outputs of the beat detection algorithms are fed into a classification algorithm to distinguish between true and false beats. A separate model has been trained for each alarm type using five-fold nested cross-validation, and the results are shown in table 1. As mentioned earlier, the observations that correspond to abnormal beats were weighted five times higher than those associated with normal beats in the VF and VT models. This can explain the high sensitivity for these two alarm types in table 1. Since abnormal beats are rare, their detection is critical for correctly classifying true and false alarms, and the resulting high sensitivity improves false alarm suppression.
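The alignment and weighting steps can be illustrated with a simplified sketch. The assumptions here are ours: detections from different detectors are grouped greedily when they fall within a 150 ms tolerance, each candidate beat is described only by which detectors fired (the actual classifier inputs may be richer), and the optional per-detector delay `offsets_s` (for instance, the lag of PPG and ABP pulses behind the ECG) is a hypothetical parameter. The five-fold weighting of abnormal beats follows the text.

```python
import numpy as np


def align_detections(detections, tol_s=0.15, offsets_s=None):
    """Group beat times (s) from several detectors into candidate beats.

    Returns (candidate_times, indicator) where indicator[i, d] == 1 if
    detector d reported a beat within candidate i.
    """
    n_det = len(detections)
    offsets = np.zeros(n_det) if offsets_s is None else np.asarray(offsets_s, float)
    # Pool every detection as (delay-corrected time, detector index), sorted by time.
    events = sorted(
        (float(t) - offsets[d], d) for d, times in enumerate(detections) for t in times
    )
    clusters = []
    for t, d in events:
        # Open a new candidate when this event is too far from the current one's start.
        if not clusters or t - clusters[-1][0][0] > tol_s:
            clusters.append(([t], {d}))
        else:
            clusters[-1][0].append(t)
            clusters[-1][1].add(d)
    candidate_times = np.array([np.median(ts) for ts, _ in clusters])
    indicator = np.zeros((len(clusters), n_det), dtype=int)
    for i, (_, dets) in enumerate(clusters):
        indicator[i, list(dets)] = 1
    return candidate_times, indicator


def beat_sample_weights(is_abnormal, abnormal_weight=5.0):
    """Weight abnormal-beat observations five times higher than normal ones."""
    return np.where(np.asarray(is_abnormal, dtype=bool), abnormal_weight, 1.0)
```

The weight vector can be passed to any classifier that accepts per-sample weights; scikit-learn's `DecisionTreeClassifier.fit`, for instance, takes a `sample_weight` argument. Up-weighting the rare abnormal class trades some specificity for sensitivity, which matches the pattern in table 1.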

Table 1. Performance of the decision tree model for classifying the true and false beats using 5-fold nested cross-validation.

Alarm	Accuracy (%)	Sensitivity (%)	Specificity (%)
AS	87.10	84.97	89.21
BC	90.29	90.49	90.05
TC	97.23	96.55	97.80
VF	78.35	92.40	61.64
VT	95.99	96.63	95.47

The detected beats were then used to classify the ventricular beats in the VT records and the fibrillatory beats in the VF records. Two classification models were trained using five-fold nested cross-validation. For the VF records, the accuracy, sensitivity and specificity were 98.32%, 99.59% and 97.06%, respectively; for the VT records, they were 97.77%, 99.89% and 95.55%, respectively. These high sensitivity levels prevent the algorithm from missing abnormal beats that are crucial for detecting true alarms.

The results of false alarm suppression on the training and hidden test data are shown in table 2. Overall scores of 89.1 and 76.2 were obtained for the training and test datasets, respectively. Moreover, a score of 73.4 was obtained for the retrospective event, which uses 30 s of the waveforms after the alarm in addition to the 16 s before the alarm. The lower real-time score during the testing phase is mainly driven by the low score for the VT alarm, which constitutes a large portion of the dataset. The large difference between the training and testing scores for the VT alarm could result from over-fitting to the training data or from a substantial discrepancy between the distributions of the training and test datasets. Moreover, the low TPR for the VF alarm during the testing phase can lead to a large number of true VF alarms being suppressed. This result can be explained by the scarcity of VF records in the dataset. The scores for the other three alarm types range from 79 to 94, which shows the effectiveness of the proposed algorithm in suppressing AS, BC and TC false alarms.
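The per-alarm scores can be related to TPR and TNR through the PhysioNet/CinC Challenge 2015 metric, which, to our understanding, is the score reported here: every suppressed true alarm (false negative) is penalized five times as heavily as any other error, Score = (TP + TN)/(TP + TN + FP + 5 FN). The sketch below computes it; the counts in the usage note are hypothetical and not taken from the dataset.

```python
def challenge_score(tp, fp, tn, fn):
    """Challenge-style score: false negatives (suppressed true alarms) count five times."""
    return (tp + tn) / (tp + tn + fp + 5 * fn)


def rates(tp, fp, tn, fn):
    """True positive rate and true negative rate of the alarm classification."""
    return tp / (tp + fn), tn / (tn + fp)
```

With hypothetical counts tp = 2, fn = 4, tn = 50, fp = 2, the TNR is above 96% yet the score is only 52/74 ≈ 0.70, because the four missed true alarms add 20 to the denominator. This is why a low TPR on a rare alarm type such as VF depresses the score so sharply.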

Table 2. The performance of the false alarm suppression using the training and hidden test data for individual alarm types.

Alarm	Training TPR (%)	Training TNR (%)	Training score	Testing TPR (%)	Testing TNR (%)	Testing score
AS	95	86	84.9	100	85	86.4
BC	98	88	89.2	95	79	79.0
TC	98	67	91.2	99	60	93.9
VF	50	100	78.6	33	96	61.0
VT	97	94	91.8	74	85	67.6

Real-time	97	92	89.1	89	85	76.2
Retrospective	—	—	—	88	84	73.4

6.1. Accuracy of beat detection methods

The beat detection algorithms from the top-scoring submissions were studied to compare their performance on the training dataset with that of the algorithm presented in this paper. We were not able to obtain the beats from two of the submissions. Couto et al implemented their method in Octave, and several attempts to run the algorithm failed. Also, Antink et al found the intervals between the beats instead of the beats themselves; hence, the beat locations are not available. The code for the other four submissions was executed, and the detected beats were saved and compared to the human expert annotations. All the submissions analyzed at least the last 10 s prior to the alarm; therefore, this period was used to compare the detected peaks. The results are shown in table 3.
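The comparison against expert annotations can be sketched as a one-to-one matching of detected beats to reference beats within the last 10 s before the alarm, followed by sensitivity, TP/(TP + FN), and positive predictability, TP/(TP + FP). The 150 ms matching tolerance, the greedy nearest-neighbour matching and the function names are assumptions for illustration; the text does not state how detections were paired with annotations.

```python
import numpy as np


def match_beats(reference, detected, tol_s=0.15):
    """Greedy one-to-one matching of detected beats to reference annotations (times in s)."""
    ref = np.sort(np.asarray(reference, dtype=float))
    det = np.sort(np.asarray(detected, dtype=float))
    used = np.zeros(det.size, dtype=bool)
    tp = 0
    for r in ref:
        # Unmatched detections within the tolerance of this reference beat.
        candidates = np.flatnonzero((np.abs(det - r) <= tol_s) & ~used)
        if candidates.size:
            j = candidates[np.argmin(np.abs(det[candidates] - r))]
            used[j] = True
            tp += 1
    return tp, det.size - tp, ref.size - tp  # TP, FP, FN


def beat_accuracy(reference, detected, alarm_time_s, window_s=10.0, tol_s=0.15):
    """Sensitivity and positive predictability over the window_s seconds before the alarm."""
    lo = alarm_time_s - window_s
    ref = [t for t in reference if lo <= t <= alarm_time_s]
    det = [t for t in detected if lo <= t <= alarm_time_s]
    tp, fp, fn = match_beats(ref, det, tol_s)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, ppv
```

Matching each reference beat at most once keeps a duplicated detection (two peaks on one QRS complex) from inflating sensitivity; the extra peak instead counts as a false positive and lowers positive predictability.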

Table 3. The accuracy of beat detection algorithms used in the top scoring submissions and the proposed beat detection algorithm on the training dataset.

Performance measure	Beat type	Plesinger	Kalidas^a	Fallet^b (ECG)	Fallet^b (ABP)	Fallet^b (PPG)	Erikainen	Ansari^c
Sensitivity	All	86.98	89.99	53.03	37.15	71.47	84.01	94.52
Positive predictability	All	96.17	92.77	93.65	90.80	84.74	95.88	90.97
Sensitivity	Normal	89.66	89.80	53.03	35.90	74.47	85.68	94.92
Positive predictability	Normal	95.77	92.58	93.22	89.32	83.54	95.41	89.87
Sensitivity	Abnormal	68.02	97.19	53.06	46.06	50.14	72.13	91.70
Positive predictability	Abnormal	70.87	25.93	50.00	60.12	32.45	71.21	54.77

^aKalidas et al do not perform peak detection on VF and VT records; hence, the reported accuracies are averages over the AS, BC and TC alarms only. ^bFallet et al perform peak detection on each waveform separately; the accuracy of peak detection from each waveform is reported. ^cThe beat detection algorithm proposed in this paper uses the reference human-annotated beats for training the machine learning models. Hence, the reported accuracies should be interpreted, and compared to those of the other algorithms, with caution.

No single method outperformed all the others. Instead, the methods demonstrate different trade-offs between sensitivity and positive predictability. Nonetheless, the proposed algorithm is among the top-scoring methods, especially in terms of sensitivity. However, these results should be interpreted with caution: the proposed algorithm benefited from the human expert annotations used to train the classification model, and this information was not available to the other algorithms. Moreover, the proposed algorithm has higher sensitivity in almost all categories. This is due to the higher weights given to the positive instances during training of the classification algorithm, which resulted in a more sensitive model that avoids false negatives caused by missed beats.

7. Conclusions and future works

The presented model for suppression of false alarms is composed of three main components: detecting the beats, determining the type of beats (for VF and VT alarms) and applying alarm-specific criteria to determine whether the alarm is false. The results of the beat detection algorithm show high sensitivity for most alarm types, ensuring that the beats essential for detecting true alarms are not missed. However, the specificity of beat detection for the VF and VT alarms could be improved. Moreover, the results show high performance for the AS, BC and TC alarms but lower accuracy for the VF and VT alarms; the alarm suppression algorithm for these two alarm types should therefore be further improved. Finally, a noise detection algorithm could be used to detect corrupted portions of the waveforms, and these portions should be excluded from the analysis to improve accuracy.
