Convolutional Neural Network for Cervical Cancer Prediction

A 22.2 million parameter neural network to detect cervical cancer with 95.6% accuracy and 88.2% recall using cervical cell images from Pap smear screening

View the Code

Get Training Data

Get External Data

Download the Model (85MB)

Mwakigonja, A.R., Torres, L.M.M., Mwakyoma, H.A. et al. Cervical cytological changes in HIV-infected patients attending care and treatment clinic at Muhimbili National Hospital, Dar es Salaam, Tanzania. Infect Agents Cancer 7, 3 (2012). https://doi.org/10.1186/1750-9378-7-3

Introduction

Cervical cancer is one of the most common malignant tumours in the world, and it is the fourth leading cause of cancer in women. (Siegel RL, 2019) The Pap smear is the most economical, non-invasive, and easy-to-perform screening test, and has therefore been adopted globally as the leading cervical cancer screening method. (Sachan PL, 2018) Despite its widespread use, however, cervical cancer screening exhibits low diagnostic sensitivity and specificity. (Schiffman M, 2018)

Therefore, the aim of this project was to adopt the growing body of convolutional neural network (CNN) methods already applied to other cancer screening tasks and develop a novel, robust CNN that predicts cervical cancer from cellular smear images. The 22.2 million parameter model was developed using TensorFlow and Keras and applies modern deep learning methods to cellular image classification.

Model Architecture and Design

The network comprises three convolutional layers, applying 32, 64, and 32 filters respectively for automated feature extraction. Each layer uses a 3x3 kernel, a stride of 1, and ReLU activation to introduce non-linearity, and is followed by max-pooling to down-sample the feature maps. The flattened feature vector is then fed into a 256-neuron fully connected dense layer with ReLU activation, and a final sigmoid output unit predicts cervical intraepithelial neoplasia/malignancy versus normalcy.
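As a rough illustration, a minimal Keras sketch matching this description is shown below. Padding, pooling size, optimizer, and loss are assumptions not stated above, so the parameter count of this sketch may not reproduce the reported 22.2 million exactly.

```python
# Minimal sketch of the described architecture using TensorFlow/Keras.
# Padding, pooling size, optimizer, and loss are assumptions; the exact
# configuration (and therefore the parameter count) may differ from the
# original 22.2M-parameter model.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_shape=(256, 256, 3)):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), strides=1, activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), strides=1, activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), strides=1, activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        # Single sigmoid output: 1 = normal, 0 = dysplasia/carcinoma/malignancy
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy",
                           tf.keras.metrics.Precision(),
                           tf.keras.metrics.Recall()])
    return model
```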

Training, Validation and Internal Testing

To train the 22.2 million parameter, three-convolutional-layer CNN, the 2005 version of the DTU/Herlev Pap Smear Database was used, sourced from Dr Jan Jantzen and Dr Beth Bjerregaard of Herlev University Hospital (Denmark). The database provides images of single squamous cervical epithelial cells organised by morphological appearance: carcinoma in situ, light/moderate/severe dysplasia, or normal. Carcinoma in situ and severely dysplastic cells were collated as positive controls for model training, resulting in 347 case images. Normal superficial, intermediate, and columnar cells were collated as negative controls, resulting in 242 control images. Images were resized to 256x256 pixels, and RGB values were normalised. The dataset was randomly split into 70% for training, 20% for validation, and the remaining 10% reserved as an internal test set. The model was trained for 50 epochs, during which precision, recall, and accuracy metrics were recorded; training took approximately 24 minutes. Internal testing on the 10% of unseen data showed a final precision of 1.0, recall of 0.882, and accuracy of 0.956. Loss steadily decreased with each epoch but plateaued at approximately 0.26.
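A minimal sketch of the preprocessing and 70/20/10 split described above follows. The folder names ("cases/", "controls/") and the random seed are hypothetical, and build_model() is the architecture sketch from the previous section.

```python
# Sketch of the preprocessing pipeline and 70/20/10 split.
# Folder names and random seed are assumptions, not from the original project.
import glob
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

def load_images(paths, label):
    """Resize each image to 256x256 and normalise RGB values to [0, 1]."""
    imgs = [tf.keras.utils.img_to_array(
                tf.keras.utils.load_img(p, target_size=(256, 256))) / 255.0
            for p in paths]
    return np.array(imgs), np.full(len(imgs), label, dtype=np.float32)

X_pos, y_pos = load_images(glob.glob("cases/*.bmp"), 0.0)     # 0 = dysplasia/carcinoma
X_neg, y_neg = load_images(glob.glob("controls/*.bmp"), 1.0)  # 1 = normal
X = np.concatenate([X_pos, X_neg])
y = np.concatenate([y_pos, y_neg])

# 70% training, 20% validation, 10% internal test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=1/3, random_state=42)

model = build_model()  # from the architecture sketch above
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50)
print(model.evaluate(X_test, y_test, return_dict=True))
```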

Model accuracy and loss (y-axis) can be seen increasing and decreasing respectively as training epoch (x-axis) increases.

External Testing and Performance

To assess the model's generalisability, it was externally validated on the SipakMed Database, comprising 1,638 case images and 1,618 control images. All images were run through the model at thresholds from 0 to 1 in increments of 0.1, recording the false positive and true positive rates at each threshold. The resulting Receiver Operating Characteristic (ROC) curve yielded an Area Under the Curve (AUC) of 0.799, indicating reasonable discrimination between cases and controls, although not as performant as some published models reporting 0.99 AUC. (L. Zhang, 2017. doi: 10.1109/JBHI.2017.2705583.)
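Below is a sketch of the threshold sweep and ROC-AUC computation described above. The folder names are hypothetical, load_images() and model come from the earlier sketches, and scikit-learn is assumed for the continuous ROC curve.

```python
# Sketch of external evaluation: threshold sweep plus ROC AUC.
# Folder names are hypothetical; labels follow the document's convention
# (model output near 1 = normal, near 0 = risk).
import glob
import numpy as np
from sklearn.metrics import roc_curve, auc

X_ext_pos, y_ext_pos = load_images(glob.glob("sipakmed_cases/*.bmp"), 0.0)
X_ext_neg, y_ext_neg = load_images(glob.glob("sipakmed_controls/*.bmp"), 1.0)
X_ext = np.concatenate([X_ext_pos, X_ext_neg])
y_ext = np.concatenate([y_ext_pos, y_ext_neg])

preds = model.predict(X_ext).ravel()   # 1 ≈ normal, 0 ≈ risk
risk_scores = 1.0 - preds              # higher score = higher predicted risk
y_case = 1.0 - y_ext                   # relabel so 1 = case, 0 = control

# Threshold sweep from 0 to 1 in steps of 0.1, as described in the text
for t in np.arange(0.0, 1.01, 0.1):
    flagged = risk_scores >= t
    tpr = flagged[y_case == 1].mean()  # true positive rate
    fpr = flagged[y_case == 0].mean()  # false positive rate
    print(f"threshold={t:.1f}  TPR={tpr:.3f}  FPR={fpr:.3f}")

# Continuous ROC curve and AUC
fpr, tpr, _ = roc_curve(y_case, risk_scores)
print("ROC AUC:", auc(fpr, tpr))
```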

The area under the receiver operating characteristic (ROC) curve reflects model sensitivity and specificity across thresholds. A commonly cited benchmark for clinical utility is an ROC AUC of at least 0.80, which this model's AUC of 0.799 approximately matches.

Conclusion and Future Work

The developed CNN model demonstrates the potential of deep learning in medical diagnostics and healthcare. Although the model compares favourably with traditional Pap smear review, the relatively high loss and imperfect ROC AUC suggest room for improvement. Future work will focus on further enhancing the model's performance through advanced data augmentation techniques, deeper architectures, and the integration of additional clinical data to improve diagnostic accuracy and reliability.

Sample Images and Classifications

The following section contains sample images from the internal and external datasets, along with the model's classifications. Images are classified on a continuous scale from 0 (dysplasia, carcinoma, or malignancy) to 1 (normal cervical cell), with a threshold of 0.5 used to distinguish the two diagnoses.
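For reference, a short sketch of how a single sample image is scored and thresholded at 0.5 is shown below; the image path is hypothetical and model comes from the earlier sketches.

```python
# Sketch of single-image classification with the 0.5 threshold described above.
# "sample_cell.bmp" is a hypothetical path.
import tensorflow as tf

img = tf.keras.utils.load_img("sample_cell.bmp", target_size=(256, 256))
x = tf.keras.utils.img_to_array(img)[None] / 255.0   # shape (1, 256, 256, 3)
score = float(model.predict(x)[0, 0])                # 0 ≈ risk, 1 ≈ normal

label = "healthy" if score >= 0.5 else "risk"
print(f"Classified as {label}: {score:.3f}")
```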

Positive control sample 1, from training dataset (internal).

Successfully classified as risk: 0.149

Positive control sample 2, from training dataset (internal).

Successfully classified as risk: 0.165

Positive control sample 3, from validation dataset (external).

Successfully classified as risk: 0.061

Positive control sample 4, from validation dataset (external).

Unsuccessfully classified as healthy: 0.994

Negative control sample 1, from training dataset (internal).

Successfully classified as healthy: 1.000

Negative control sample 2, from training dataset (internal).

Successfully classified as healthy: 0.997

Negative control sample 3, from validation dataset (external).

Successfully classified as healthy: 0.990

Negative control sample 4, from validation dataset (external).

Unsuccessfully classified as risk: 0.024