The X-ray dataset is taken from https://github.com/ieee8023/covid-chestxray-dataset.

I have been running this code in the Google Colab environment with a GPU hardware accelerator. Since the notebook uses ResNet50 for transfer learning, you will need a machine with an NVIDIA GPU; otherwise training may take a very long time.
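
A quick way to confirm the GPU runtime is actually active (a minimal check using PyTorch, which fastai and Colab already ship with):

import torch
# Should print True when a CUDA-capable GPU is visible to PyTorch
print(torch.cuda.is_available())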

The folder structure I am using to store the images is:
Covid/
----| covid_positive
----| covid_negative
----| metadata.csv (This can be downloaded from the open source database)

fastai makes this extremely simple: create a folder for each of your classes and put all the respective images in that folder.

Important: This notebook is by no means an accurate model for predicting COVID-19, nor should it be used to make any kind of diagnosis whatsoever. It is an ongoing experiment using open-source data, done purely as a proof of concept. The model is not suitable for deployment.

from fastai.vision import *
from fastai.widgets import *
import pandas as pd

Mount Google Drive and set the path to the folder containing the images. You need to change this path to point to the folder that contains the X-ray images.
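
A minimal sketch of the standard Colab mount call; the mount point /content/drive is Colab's default, and the drive_path below assumes the notebook's working directory is /content:

from google.colab import drive
# Makes your Google Drive available under /content/drive
drive.mount('/content/drive')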

drive_path = 'drive/My Drive/FastAI/Covid/'
df = pd.read_csv(drive_path+'metadata.csv')
covid_positive = df['finding'] == 'COVID-19'
xrays = df['modality'] == 'X-ray'
CT = df['modality'] == 'CT'
PA = df['view'] == 'PA'
AP = df['view'] == 'AP'

I have cloned the image repository to my local machine. Since I am running this notebook on Colab, I run a script on my local machine to upload the covid_positive and covid_negative images to my Google Drive.

PA_covid and PA_non_covid give me the lists of file names, which are then uploaded to Google Drive (a sketch of the local copy step follows the code below).

PA_covid = df[covid_positive & PA ]
PA_non_covid = df[PA & ~covid_positive]
#List files belonging to both the classes
covid_images = [files for files in PA_covid['filename']]
non_covid_images = [files for files in PA_non_covid['filename']]
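
A minimal sketch of the local copy step mentioned above; the clone location and staging folder are assumptions and will differ on your machine:

import shutil
from pathlib import Path

local_repo = Path('covid-chestxray-dataset/images')  # hypothetical path to the cloned image folder
local_dest = Path('Covid')                           # local staging folder mirroring the Drive layout
(local_dest/'covid_positive').mkdir(parents=True, exist_ok=True)
(local_dest/'covid_negative').mkdir(parents=True, exist_ok=True)
# Sort the files into one folder per class before uploading to Drive
for name in covid_images:
    shutil.copy(local_repo/name, local_dest/'covid_positive'/name)
for name in non_covid_images:
    shutil.copy(local_repo/name, local_dest/'covid_negative'/name)
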
path = Path(drive_path)
classes = ['covid_positive', 'covid_negative']
#sanity check images
for c in classes:
  verify_images(path/c)
#Split data into train and validation sets
np.random.seed(42)
data = ImageDataBunch.from_folder(path, train=".", test='test', valid_pct=0.20,
        ds_tfms=get_transforms(), bs=8, size=512, num_workers=4).normalize(imagenet_stats)
#View data
data.show_batch(rows=3, figsize=(7,8))

Training

We are using a ResNet50 for transfer learning: run fit_one_cycle for a few epochs, then use fastai's learning rate finder to find an optimal range for our learning rate.

We use precision and recall to measure the incidence of false positives and false negatives: precision = TP / (TP + FP) penalizes false positives, while recall = TP / (TP + FN) penalizes false negatives.

precision=Precision()
recall=Recall()
AUC=AUROC()
learn = cnn_learner(data, models.resnet50, metrics=(accuracy, precision, recall, AUC))
learn.fit_one_cycle(1)
epoch train_loss valid_loss accuracy precision recall auroc time
0 0.872759 0.330885 0.897959 0.900000 0.857143 0.952381 00:15
learn.save('stage-1')

Use lr_find() to choose a learning rate

learn.lr_find()
epoch train_loss valid_loss accuracy precision recall auroc time
0 0.675613 #na# 00:12
1 0.555154 #na# 00:12
2 0.499213 #na# 00:11
3 0.750383 #na# 00:12

LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
learn.recorder.plot()
learn.fit_one_cycle(5, max_lr=slice(3e-6,3e-4))
learn.recorder.plot_losses()
epoch train_loss valid_loss accuracy precision recall auroc time
0 0.635228 0.381105 0.816327 0.875000 0.666667 0.945578 00:15
1 0.536443 0.349897 0.877551 0.894737 0.809524 0.945578 00:15
2 0.512136 0.344853 0.857143 0.850000 0.809524 0.942177 00:15
3 0.416277 0.350728 0.857143 0.850000 0.809524 0.945578 00:14
4 0.382743 0.340198 0.857143 0.850000 0.809524 0.942177 00:15
learn.save('stage-2')
learn.lr_find()
epoch train_loss valid_loss accuracy precision recall auroc time
0 0.295858 #na# 00:11
1 0.376120 #na# 00:11
2 0.397033 #na# 00:11
3 0.516553 #na# 00:11

LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
learn.recorder.plot()
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
interp.plot_top_losses(4)
preds, y, losses = learn.get_preds(ds_type=DatasetType.Test, with_loss=True)
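
To turn those test-set probabilities into class labels, a minimal sketch (pred_labels is a name introduced here just for illustration):

# Index of the highest-probability class for each test image
pred_idx = preds.argmax(dim=1)
# Map the indices back to the folder-derived class names
pred_labels = [data.classes[int(i)] for i in pred_idx]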