Generative adversarial networks as a medium for sharing patient data

20.05.2022

This thesis by Dorin Doncenco was part of the PRIVASA project, a Business Finland co-innovation consortium. The goal of the project is to develop privacy-preserving artificial intelligence tools for health data, so that medical images can be shared safely between healthcare institutions and third-party users who develop commercial products or carry out innovation activities based on health data. In this thesis, a deep neural network is developed to create fully synthetic brain magnetic resonance images (MRI) from the data of over 300 cancer patients.

Neural networks have reached the level where they can detect diseases from medical images. However, training these networks requires large datasets labelled by clinical experts. Acquiring large quantities of labelled data is difficult, and the privacy of patient data, which is regulated by law, must also be ensured. As such, various modelling methods are being investigated to allow healthcare institutions to cooperate and share data safely. An alternative approach is to use generative adversarial networks (GANs).

A GAN consists of two agents competing against each other: a generator and a discriminator. The discriminator tries to distinguish between real and synthetic images, while the generator tries to create data that the discriminator accepts as real. As the two agents learn from each other, they gradually improve, and eventually the generator can produce data of a quality that the discriminator can no longer tell apart from the real data.
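
The competition described above is usually expressed as two opposing loss functions. The following is a minimal sketch using the standard binary cross-entropy formulation with the commonly used non-saturating generator loss; it is an illustration only and not necessarily the loss used in the thesis. The function names and the example probabilities are assumptions made for this sketch.

```python
import math


def bce(prediction, target):
    """Binary cross-entropy for a single probability and a 0/1 target."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """The discriminator wants D(real) close to 1 and D(fake) close to 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake):
    """The generator wants the discriminator to output 1 for its samples."""
    return bce(d_fake, 1.0)


# Hypothetical discriminator outputs (probability that an image is real).
# A confident, correct discriminator has a low loss ...
confident = discriminator_loss(d_real=0.9, d_fake=0.1)
# ... while one that cannot tell the images apart has a higher loss.
undecided = discriminator_loss(d_real=0.5, d_fake=0.5)
```

In training, the two losses are minimized in alternation: lowering one tends to raise the other, which is what drives both networks to improve.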

The GAN implemented in this thesis creates synthetic patients whose data does not belong to any real patient in the dataset. This is done by merging the brain MRI information of one patient, in our case cancerous tumors, with the brain shape of another patient, creating a subject that does not exist. The network was trained on the Brain Tumor Segmentation (BraTS) Challenge 2020 training dataset.

Thanks to this approach, the GAN creates subjects whose personal data are not related to any existing patient, thus preserving the privacy of the real patients. This would allow hospitals to artificially increase the size of their labelled datasets to improve their algorithms, and to share their synthetic data with other hospitals for research purposes.

The generated images have visible tumors that can be distinguished from the healthy tissue. The finer texture of the images is lacking, however. Some real and some generated samples are shown below.

Real samples
Synthetic samples

The 3D GAN model, which benefits from 3D convolutional layers, has learnt to generate MRI images whose structures match those of no other subject in the dataset. However, although the synthetic images look realistic on visual inspection, complementary brain lesion segmentation experiments on the synthetic MRI data obtained in this thesis did not give very promising results. One possible reason is that the 3D MRI data contain much more complex information deeper in the frequency spectrum, which the 3D GAN has not been able to retain. Another possible reason is that the lesion segmentation model, the DeepMedic algorithm, has not been fully optimized to deal with synthetic data. Both possibilities require more rigorous investigation. The PRIVASA team continues to develop more objective quality assessment metrics for the synthetic data, including peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), mean squared error (MSE) and several others.
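
To make the metrics named above concrete, here is a minimal sketch of MSE, PSNR and a simplified SSIM computed on flattened intensity lists. It is not the PRIVASA team's implementation: the function names are assumptions, intensities are assumed to be scaled to the range 0 to 1, and the SSIM shown is the single-window (global) form, whereas SSIM is normally averaged over local windows. The constants 0.01 and 0.03 are the usual defaults from the SSIM definition.

```python
import math


def mse(a, b):
    """Mean squared error between two equally long intensity lists."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, data_range=1.0):
    """Peak signal-to-noise ratio in decibels; infinite for identical inputs."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / err)


def global_ssim(a, b, data_range=1.0):
    """Simplified SSIM computed over the whole image as one window."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )


# Made-up toy intensities, for illustration only.
real = [0.0, 0.5, 1.0, 0.5]
synthetic = [0.1, 0.5, 0.9, 0.5]
```

A lower MSE, a higher PSNR and an SSIM closer to 1 all indicate that the two images are more alike. These metrics compare images pixel by pixel, so they need a real reference image for each synthetic one.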

GANs have the potential to be used in the healthcare industry for data synthesis; however, more research is needed before drawing conclusions about their usefulness to the industry. This leaves the door open for improvements to the project, and there are several interesting future directions. One continuation would be to improve the pipeline implemented in the thesis. For example, a Fréchet distance computed with a third-party model could be implemented to measure the quality of the images, and a local discriminator focusing on a smaller patch of the images could be added to improve the quality of the finer features of the brain tissues and tumors. More research can also be done into deep learning network designs, which might reveal a different architecture that improves the GANs. Additionally, developing a privacy metric to objectively verify whether the privacy of the patients is preserved would help confirm whether the approach above is compliant with privacy laws.
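
As an illustration of the Fréchet distance mentioned above, the sketch below compares two sets of feature vectors by fitting a Gaussian to each set. It makes a simplifying assumption that is stated here and in the code: covariances are treated as diagonal, in which case the distance reduces to a sum over feature dimensions of squared differences in mean and standard deviation. The full metric uses full covariance matrices and features extracted by a pretrained network. The function name and the toy feature values are hypothetical.

```python
import math
from statistics import fmean, pvariance


def frechet_distance_diagonal(features_real, features_fake):
    """Squared Fréchet distance between two fitted Gaussians.

    ASSUMPTION: diagonal covariances, so each feature dimension
    contributes (mean difference)^2 + (std difference)^2.
    Each argument is a list of equally long feature vectors.
    """
    dims = len(features_real[0])
    total = 0.0
    for i in range(dims):
        col_real = [f[i] for f in features_real]
        col_fake = [f[i] for f in features_fake]
        mu_r, mu_f = fmean(col_real), fmean(col_fake)
        sd_r = math.sqrt(pvariance(col_real, mu_r))
        sd_f = math.sqrt(pvariance(col_fake, mu_f))
        total += (mu_r - mu_f) ** 2 + (sd_r - sd_f) ** 2
    return total


# Made-up two-dimensional features, for illustration only.
real_features = [[0.0, 1.0], [2.0, 3.0]]
fake_features = [[1.0, 1.0], [3.0, 3.0]]  # first dimension shifted by 1
distance = frechet_distance_diagonal(real_features, fake_features)
```

Unlike pixel-wise metrics, this compares the distribution of real images with the distribution of synthetic ones, so no one-to-one pairing between real and generated images is required.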

Doncenco, Dorin (2022): Exploring Medical Image Data Augmentation and Synthesis using conditional Generative Adversarial Networks. Bachelor’s thesis, Information and Communications Technology, Turku University of Applied Sciences.