Classification of Electrocardiogram Patterns
Cardiovascular disease is one of the leading causes of mortality in both developed and developing countries and is responsible for nearly 801,000 deaths in the US alone (Benjamin et al., 2017). The World Health Organization recommends that people with cardiovascular disease receive early detection and a well-crafted health management plan from cardiologists (WHO, 2016). Among diseases of cardiovascular origin, atrial fibrillation is the most common arrhythmia (Medi et al., 2011). Patients with atrial fibrillation are reported to manifest a wide array of complications, such as haemodynamic instability, cardiomyopathy, cardiac failure, and embolic events (Chamberlain, 1980). A common method for the early diagnosis of cardiovascular disease and atrial fibrillation is the electrocardiogram (ECG), which has several advantages, such as low cost and non-invasiveness.
An electrocardiogram (ECG) is a recording of the electrical signal originating from myocardial cells within heart tissue, traditionally traced onto a specific type of paper called “ECG paper”. When displayed in written form, the signal has a wave-like morphology characteristic of a given cardiac physiological state, so cardiologists with trained eyes can interpret it and diagnose a disorder if one exists. That said, readers from varied clinical backgrounds, such as medical students, differ in their competency at interpreting ECGs (Little B, 2004). Automated reading and interpretation of ECGs by specialized software could help alleviate this issue.
The problem of ECG pattern recognition and disorder classification has been studied by various researchers globally, using neural networks (Andrew et al., 2016), support vector machines (Rabee and Barhumi, 2012) and genetic algorithms (Li et al., 2017). Currently, these approaches suffer from moderate to high error rates compared to human interpretation: 50% of non-sinus rhythms are mislabeled by computers (Shah & Rubin, 2007). An algorithm that is effective at predicting ECG patterns should be able to recognize distinct wave types despite morphological variability between patients and the presence of waves. We chose a neural network based on our present understanding and experience and on literature evidence of high efficiency (up to 95%) (Jambukia et al., 2015). The trained neural network automatically classifies a given ECG signal and labels it with the corresponding class of abnormality.
We develop a Convolutional Neural Network (CNN) model to detect normal and abnormal electrocardiogram patterns in variable-length sequential ECG data. In addition, the network is capable of learning and classifying 4 classes of cardiac beats present in the data. The model is trained on single-lead ECG signals sampled at 300 Hz, with the associated annotation for each ECG segment as supervision.
In contrast to traditional machine learning methods such as the multiclass Support Vector Machine (SVM) or the Multilayer Perceptron (MLP), which depend on feature extraction prior to training, a Convolutional Neural Network (CNN) uses the dataset to extract features and train on them simultaneously. This difference is crucial to our project, as feature engineering is laborious by nature and requires a great deal of expertise in electrocardiogram analysis and interpretation.
Materials and Methods
1D Convolutional Neural Network
A convolutional neural network is a hierarchical neural network whose convolutional layers are interleaved with pooling layers, closely resembling how humans perceive images (Hubel and Wiesel, 1959). Convolutional layers extract features from the raw input by using filters to compute the dot product of each filter with a filter-length window of pixels of the input image, while later layers train further on the newly extracted features (LeCun, 1998). Convolutional neural networks are now commonly used for “deep learning” tasks such as object recognition in large image archives, where they achieve state-of-the-art performance compared to traditional neural networks, which have been found to be ineffective at this task (Menendez, 2000). A traditional artificial neural network does not take the topology of an image into account and treats it as raw input. This leads to a higher cost of computation and lower prediction accuracy (Jun et al, 2018). A convolutional neural network can take advantage of the correlation between spatially adjacent pixels by extracting it with a nonlinear filter, and by applying multiple filters it can extract various local features of the image. We applied a 1D CNN, converting the ECG signal into time-series form, because 1D convolutional and pooling layers are better suited to filtering the spatial locality of an ECG time series. As a result, higher accuracy of ECG arrhythmia classification can be obtained. In addition, a physician judges the arrhythmia in a patient’s ECG signal by visual inspection. We therefore concluded that applying the 1D CNN model to the ECG image is most similar to the physician’s arrhythmia diagnosis process.
While convolutional neural networks are mainly applied to 2-dimensional image classification, 1-dimensional convolutional neural networks have shown enormous potential for making predictions on specific types of data, such as time series and sequential data (Kiranyaz et al, 2016). A 1D convolutional neural network extracts local 1D patches (subsequences) from a sequence and can identify local patterns within the convolution window. Like a conventional convolutional neural network, a 1D convolutional neural network uses kernels to extract features automatically, but the kernel shape differs: the kernels of a 1D network are 1-dimensional. Subsampling operations must be applied more carefully than in a conventional network, as broader pooling could result in the loss of crucial sparse connectivity between data points. Unlike in multilayer perceptrons (MLP), in convolutional neural networks in general, and the 1-dimensional variant in particular, weight matrices are shared between neurons, which substantially reduces training time.
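The sliding-window behaviour of a 1D kernel and the effect of pooling can be illustrated with a minimal sketch in plain Python. The toy signal, the 3-tap kernel, and the function names are hypothetical and chosen only for illustration; they are not the filters or layers of the model described in this paper. As in most deep learning frameworks, the "convolution" here is computed without flipping the kernel.

```python
def conv1d_valid(signal, kernel):
    """Slide `kernel` over `signal` and return the dot product at each position."""
    width = len(kernel)
    return [
        sum(signal[start + j] * kernel[j] for j in range(width))
        for start in range(len(signal) - width + 1)
    ]


def max_pool1d(values, size):
    """Keep the maximum of each non-overlapping window of `size` values."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Toy input: a flat baseline with one sharp peak (hypothetical values).
toy_signal = [0, 0, 1, 5, 1, 0, 0, 0]
# Hypothetical 3-tap kernel that responds strongly to a local peak.
peak_kernel = [-1, 2, -1]

feature_map = conv1d_valid(toy_signal, peak_kernel)  # one value per window position
pooled = max_pool1d(feature_map, 2)                  # halves the temporal resolution
```

The same three kernel weights are reused at every window position, which is the weight sharing referred to above. The pooled output is shorter than the feature map, which is why a wide pooling window can discard fine temporal detail.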
We use ECG recordings from the PhysioNet/Computing in Cardiology 2017 Challenge for both training and testing. The dataset originally aimed to distinguish atrial fibrillation from normal cardiac rhythm but has been expanded to include several other abnormalities. The recordings were collected with AliveCor’s single-channel (lead I) ECG device, which digitized the data in real time at 44.1 kHz. The digitized data were then stored at a sampling rate of 300 Hz with 16-bit resolution and a bandwidth of 0.5-40 Hz. The training set consists of 4 categories of rhythm to classify, with a class distribution of 5,050 normal, 738 atrial fibrillation, 2,456 other, and 284 noisy recordings. The hidden test set, which contains 3,658 recordings, is used to evaluate the classification model. Recording length varies from a minimum of 9 seconds to a maximum of 64 seconds. To fix the input dimension as required by the 1D convolutional neural network, we repeat the signal of each recording until it reaches the maximum recording length in the dataset. For example, if a time series consists of 10 segments of rhythm, we repeat it 6 times. Since the data are 1-dimensional, we avoided zero-padding to minimize the loss of dimension during the initial stages of convolution.
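The repetition scheme can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, and cutting the final repetition at the target length is an assumption, since the text does not say how a partial final repetition is handled.

```python
def repeat_to_length(samples, target_length):
    """Repeat `samples` end to end, then cut the result at `target_length`."""
    if not samples:
        raise ValueError("cannot repeat an empty recording")
    repeats = -(-target_length // len(samples))  # ceiling division
    return (list(samples) * repeats)[:target_length]


# Hypothetical 3-sample stand-in for a short recording.
short_recording = [0.1, 0.4, -0.2]
fixed = repeat_to_length(short_recording, 8)
```

Unlike zero-padding, every value in the extended sequence is a real sample of the recording, so the early convolutional layers never see long runs of artificial zeros.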
We processed the raw ECG waveforms with the Biosignal Processing Library (Carreiras et al, 2015) to obtain time-series data. First, we extracted the ECG signal values from the MATLAB V4 WFDB-compliant format files. There are 8,528 files, each associated with one recording. A small Python script performed the extraction, producing a file of 8,528 rows of 9,000 time-series values each. Figure 2 displays a few of them in column format. We then applied robust scaling normalization, a function provided by the Scikit-Learn library, to standardize the dataset and lessen the burden of computationally intensive number crunching.
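Robust scaling centres the data on the median and divides by the interquartile range, so a few extreme samples have little influence on the scale. The sketch below reproduces that arithmetic with the Python standard library rather than calling Scikit-Learn. It scales a single series, which is an assumption: the text does not state along which axis the scaling was applied. The function name and the sample values are hypothetical.

```python
import statistics


def robust_scale(values):
    """Subtract the median and divide by the interquartile range (IQR)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        iqr = 1.0  # avoid division by zero for a constant series
    return [(v - median) / iqr for v in values]


# Hypothetical series with one large outlier.
scaled = robust_scale([1, 2, 3, 4, 100])
```

The outlier stays large after scaling, but it does not compress the other values toward zero the way min-max scaling would.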
Model Architecture and Training
We use a 1D convolutional neural network to classify electrocardiogram sequences. The network takes a time series of preprocessed ECG signal as input and outputs a sequence of labels. We arrived at an architecture of 10 convolutional layers followed by a fully connected layer with a softmax at the end.
We initialized the values of the kernel matrices from a normal distribution. The convolutional layers all have a filter length of 8 and 16k filters, where k is initialized at 1 and increases with every convolutional layer. The optimal batch size was chosen from 1, 5, 10, 20, 32, 64, and 128. A wide range of epoch counts, from 10 to 1000, was evaluated using small-batch training to find the best-performing one. As noted in previous experiments (Rajpurkar et al, 2017), mislabeling of rhythms with similar waveform topologies, such as Atrial Fibrillation (AFIB) and Atrial Flutter (AFL), is understandable given that both are atrial arrhythmias, and it would increase the loss. We decided to train the network using a 70% training / 30% validation split (Kohavi, 1995) and 5-fold cross-validation. We separate the time series into 10 segments, 9 of which are used to train the model. We used the Adam optimizer (Kingma & Ba, 2014) with the default parameters, which reduced the training time dramatically (up to 10x) when the validation loss stopped improving.
We fed the model a file containing 9,000 sequences of normalized time-series data. The input is convolved at the first layer and then undergoes batch normalization, which provides a slight regularization effect and a modest improvement in activation efficiency. The output of this operation is transformed by a Rectified Linear Unit (ReLU), which introduces non-linearity so that the network can model the real-life relationships among the factors that shape electrocardiogram rhythm patterns. We also added a Dropout layer to improve generalization and reduce the risk of overfitting. Results arriving at the Fully Connected (FC) layer are processed to yield classification scores for all rhythm classes. The Softmax layer receives the rhythm class scores and returns a probability distribution over the classes, where the class with the highest probability is the model’s prediction. The prediction is then evaluated against the ground-truth label to determine the cross-entropy error. The error is then propagated back through the network to update the weights and biases of the kernels in the convolutional layers and of the fully connected layer. This process is repeated until the desired number of epochs is reached.
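The last two steps of the forward pass, softmax over the class scores and the cross-entropy error against the ground-truth label, can be written out in a few lines. This is an illustrative sketch only: the scores are hypothetical values standing in for the output of the fully connected layer, and the function names are not taken from the framework used in this work.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, true_class):
    """Negative log-probability assigned to the ground-truth class."""
    return -math.log(probabilities[true_class])


CLASSES = ["normal", "atrial fibrillation", "other", "noisy"]
scores = [2.0, 0.5, 1.0, -1.0]  # hypothetical output of the fully connected layer

probs = softmax(scores)
predicted = CLASSES[probs.index(max(probs))]  # class with the highest probability
loss = cross_entropy(probs, true_class=0)     # ground truth assumed to be "normal"
```

The loss approaches zero as the probability of the correct class approaches 1 and grows without bound as that probability approaches 0, which is what drives the weight updates during backpropagation.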
We built the convolutional neural network on Python’s NumPy (Numpy Community) and Scikit-Learn (Pedregosa et al., 2011) libraries and the APIs of the CAFFE deep learning framework (Jia et al, 2014) for the signal prediction task. A cluster of 3 commodity desktop computers was set up and used to train the network over 4 months. The computers, denoted (1), (2) and (3), were equipped with Intel Pentium Core 4 3.0GHz and 2GB DRAM2. All frameworks and computations ran in Ubuntu Linux environments and were tied together with Python.
We used the following two metrics to evaluate the prediction accuracy of the model, using the rhythm annotations provided with the dataset as the ground truth.
Sequence Level Accuracy (F1): We measured the average overlap between the predicted and the ground-truth sequence labels. For every sequence, the model was required to make a prediction approximately once per second (every 300 samples at the 300 Hz sampling rate). Each prediction was compared against the ground-truth annotation and assigned one of two values, “Matched” or “Mismatched”.
Set Level Accuracy (F1): Instead of treating the labels for a record as a sequence, we consider the set of unique arrhythmias present in each 60 second record as the ground truth annotation. Set Level Accuracy, unlike Sequence Level Accuracy, does not penalize for time-misalignment within a record. We report the F1 score between the unique class labels from the ground truth and those from the model prediction.
In both the Sequence and the Set case, we computed the F1 score for each class separately. We then report the class-frequency-weighted mean of the per-class F1 scores.
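The class-frequency-weighted mean of per-class F1 scores can be computed as in the following sketch. The label abbreviations and the toy label lists are hypothetical, and the function names are introduced here for illustration only.

```python
from collections import Counter


def f1_per_class(truth, predicted):
    """Return the F1 score of every class that occurs in `truth`."""
    scores = {}
    for label in set(truth):
        pairs = list(zip(truth, predicted))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denominator = 2 * tp + fp + fn
        scores[label] = 2 * tp / denominator if denominator else 0.0
    return scores


def weighted_f1(truth, predicted):
    """Average the per-class F1 scores, weighted by class frequency in `truth`."""
    support = Counter(truth)
    per_class = f1_per_class(truth, predicted)
    return sum(per_class[c] * support[c] for c in support) / len(truth)


# Hypothetical labels: N = normal, A = atrial fibrillation, O = other.
truth = ["N", "N", "N", "A", "O", "O"]
predicted = ["N", "N", "O", "A", "O", "N"]
score = weighted_f1(truth, predicted)
```

Weighting by class frequency means that the large "normal" class contributes more to the reported score than the small "noisy" class, which is worth keeping in mind when the class distribution is as imbalanced as in this dataset.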
Accuracy in the mid-80s (percent) was generally achieved within 45 epochs, and training generally terminated after 80 epochs because we had implemented an early-stopping criterion of 40 epochs without a change in validation accuracy. After 40 epochs the training and test losses began to diverge, with the test loss plateauing, which indicates slight overfitting in the classification. We also implemented a penalty function under which an incorrect prediction in classes 2 and 3 results in a greater loss.
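An early-stopping rule of this kind can be sketched as below. The sketch assumes that "without a change" means "without an improvement over the best validation accuracy so far", which is one reading of the criterion. The function name and the accuracy history are hypothetical; the history is synthetic and is not the training log of this work.

```python
def early_stopping_epoch(validation_accuracy, patience):
    """Return the 1-based epoch at which training would stop."""
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, accuracy in enumerate(validation_accuracy, start=1):
        if accuracy > best:
            best = accuracy
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(validation_accuracy)  # patience never exhausted


# Synthetic history (percent): improves for 40 epochs, then stays flat.
synthetic_history = list(range(46, 86)) + [85] * 60
stop_epoch = early_stopping_epoch(synthetic_history, patience=40)
```

With a patience of 40, a run whose validation accuracy stops improving at epoch 40 terminates at epoch 80, since the counter only resets when a new best value is observed.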
Several rhythms labeled as “other” in the dataset were predicted by the model to be Atrial Fibrillation. We suspect that this could be attributed to the dataset provider having mislabeled these rhythms as “other” when they were in fact Atrial Fibrillation.
The results demonstrate the considerable complexity behind selecting appropriate hyperparameters to achieve above-average accuracy in electrocardiogram abnormality classification. Compared to alternative methods, whose overall performance depends on the efficiency of a separate feature extraction stage, the convolutional neural network uses its convolutional layers for feature mapping, thus alleviating both the cost of computation and the training time in exchange for the need to choose proper hyperparameters. This requires a good understanding of convolutional neural networks in general and of 1D convolutional neural networks in particular.
Throughout the learning process, we identified some key practices that allowed us to optimize the model while minimizing complexity. We found that during preliminary training on certain datasets, adding layers also increased the validation loss. In other cases, adding layers improved performance, but only up to a certain threshold, after which performance decreased, as observed in the raw data analysis. This decrease can be attributed to the degradation effect, in which accuracy falls with increasing network depth after reaching a maximum (He et al, 2016). Traditionally, filter allocation begins with a small number of filters in the lower layers for basic feature extraction, and the number is then incremented in each additional layer or batch of layers to match the increasingly complex and class-specific features. We found that with raw ECG data, this approach provided the best balance between training time and model performance. Allocating a small filter count at the low levels and then steadily increasing it in the higher levels improved training time while also improving performance.
There are several limitations to both our approach and its execution. First, the dataset is small (on the order of 8,000 recordings) compared to the datasets used in production by corporations such as Google and Microsoft. A larger amount of data would give the model a more realistic distribution of all classified instances and support higher accuracy when it is applied to unknown inputs. It should be noted that the accuracy of a network peaks after sufficient training iterations and degrades afterward even if more data are fed into it. Secondly, 1-dimensional Thirdly, we trained our network on CPUs, which made the training time longer. In future research, we would like to harness the computing power of GPUs for training.
Menendez, Anne. Image recognition using neural networks. Society of Manufacturing Engineers, 2000.
Kiranyaz, Serkan, Turker Ince, and Moncef Gabbouj. “Real-time patient-specific ECG classification by 1-D convolutional neural networks.” IEEE Transactions on Biomedical Engineering 63.3 (2016): 664-675.
Hubel, D. H., and T. N. Wiesel. “Receptive fields of single neurones in the cat’s striate cortex.” Journal of Physiology 148 (1959): 574–591.
LeCun, Y., L. Bottou, Y. Bengio, and P. Haffner. “Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE 86, no. 11 (November 1998): 2278–2324. https://doi.org/10.1109/5.726791.
Kohavi, Ron. “A study of cross-validation and bootstrap for accuracy estimation and model selection.” IJCAI. Vol. 14. No. 2. 1995.
Ioffe, Sergey, and Christian Szegedy. “Batch normalization: Accelerating deep network training by reducing internal covariate shift.” arXiv preprint arXiv:1502.03167 (2015).
Jia, Yangqing, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. “Caffe: Convolutional Architecture for Fast Feature Embedding.” ArXiv:1408.5093 Cs, June 20, 2014. http://arxiv.org/abs/1408.5093.
Pedregosa, Fabian, et al. “Scikit-learn: Machine learning in Python.” Journal of machine learning research 12.Oct (2011): 2825-2830.
NumPy Developers. “NumPy.” SciPy Developers (2013).
Medi, C., et al. “Pulmonary vein antral isolation for paroxysmal atrial fibrillation: results from long-term follow-up.” Journal of cardiovascular electrophysiology 22.2 (2011): 137-141.
Chamberlain, Ernest Noble, and Colin Ogilvie. Chamberlain’s symptoms and signs in clinical medicine: an introduction to medical diagnosis. Wright, 1980.
He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.