Assessing GAN and VAE Augmentation Methods in Malignant Pleural Mesothelioma Prediction

Fathima Azka, M.A.

dc.contributor.author	Fathima Azka, M.A.
dc.date.accessioned	2026-03-07T09:02:32Z
dc.date.available	2026-03-07T09:02:32Z
dc.date.issued	2025
dc.identifier.uri	http://drr.vau.ac.lk/handle/123456789/1970
dc.description.abstract	Malignant Pleural Mesothelioma (MPM) is a rare and aggressive cancer that is strongly associated with asbestos exposure. Its severity has led to growing research interest in finding effective solutions. In recent years, computational methods and machine learning approaches have been increasingly applied in oncology to classify tumor and normal samples using transcriptomic data. However, such models typically require large and balanced datasets to achieve robust performance, and these are not available for rare cancers like MPM because of the very limited number of patients and the under-representation of normal samples. This data scarcity poses a significant challenge in building predictive models that are reliable and generalizable. To address this limitation, we employ computational analysis with data augmentation as a strategy to increase the effective sample size. Specifically, we evaluate two deep generative models, Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), to generate synthetic tumor and normal samples. Importantly, synthetic samples were used strictly in the training process, while test sets contained only real data, ensuring no data leakage during evaluation. To validate the augmentation strategy, a comparative evaluation framework was introduced using both the naturally imbalanced MPM dataset and an originally balanced breast cancer dataset, which is further manipulated to simulate imbalance, resulting in four experimental conditions: original balanced data, artificially imbalanced data, GAN-augmented data, and VAE-augmented data. Classification is performed using Support Vector Machines (SVM) and Random Forests (RF), and model performance is assessed through accuracy, F1 score, precision, recall, and ROC AUC. In addition, Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) are applied to visually examine the quality and separability of synthetic data.
The results show that GAN-based augmentation consistently improves classification performance more than VAE-based augmentation, particularly under imbalanced conditions. For instance, in the imbalanced breast cancer setting, GAN improved SVM accuracy by 5.6% and recall by 7.1% compared to the baseline without augmentation. In MPM, performance gains were smaller due to high baseline separability, indicating a ceiling effect. Overall, GAN achieved a mean performance score of 0.9247, compared to 0.9081 for VAE. This study presents a reproducible computational pipeline for benchmarking generative models in transcriptomics and demonstrates that augmentation can effectively mitigate class imbalance in cancer prediction, while highlighting the importance of dataset-specific characteristics. The findings also motivate further research into hybrid generative architectures and biologically grounded validation strategies in precision oncology.	en_US
dc.language.iso	en	en_US
dc.publisher	Faculty of Applied Science University of Vavuniya Sri Lanka	en_US
dc.subject	Breast Cancer	en_US
dc.subject	Generative adversarial networks	en_US
dc.subject	Malignant Pleural Mesothelioma	en_US
dc.subject	Random forests	en_US
dc.subject	Support vector machines	en_US
dc.subject	Variational autoencoders	en_US
dc.title	Assessing GAN and VAE Augmentation Methods in Malignant Pleural Mesothelioma Prediction	en_US
dc.type	Conference abstract	en_US
dc.identifier.proceedings	1st International Conference on Applied Sciences- 2025	en_US
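
The evaluation protocol described in the abstract (synthetic samples added to the training split only, with a real-only test set) might be sketched as follows. This is a minimal illustration under stated assumptions, not the author's pipeline: the toy data is random, `gaussian_generator` is a hypothetical per-feature Gaussian sampler standing in for a trained GAN or VAE, and a nearest-centroid rule stands in for the SVM and Random Forest classifiers. All function names are invented for this sketch.

```python
import random
from statistics import mean, stdev

def stratified_split(samples, labels, test_fraction, rng):
    """Hold out real samples per class BEFORE any augmentation happens."""
    train, test = [], []
    for cls in sorted(set(labels)):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        rng.shuffle(rows)
        n_test = max(1, int(len(rows) * test_fraction))
        test += [(r, cls) for r in rows[:n_test]]
        train += [(r, cls) for r in rows[n_test:]]
    return train, test

def gaussian_generator(rows, n_new, rng):
    """Hypothetical stand-in for a GAN/VAE: independent Gaussian per feature,
    fitted on training rows only (needs at least two rows)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[rng.gauss(m, s) for m, s in zip(mus, sds)] for _ in range(n_new)]

def balance_training_set(train, rng):
    """Add synthetic rows to every minority class until classes are equal in size."""
    counts = {}
    for _, y in train:
        counts[y] = counts.get(y, 0) + 1
    target = max(counts.values())
    augmented = list(train)
    for cls, n in counts.items():
        if n < target:
            rows = [r for r, y in train if y == cls]
            augmented += [(r, cls) for r in gaussian_generator(rows, target - n, rng)]
    return augmented

def nearest_centroid_fit(train):
    """Stand-in classifier (not SVM/RF): one mean vector per class."""
    by_class = {}
    for row, y in train:
        by_class.setdefault(y, []).append(row)
    return {y: [mean(c) for c in zip(*rows)] for y, rows in by_class.items()}

def nearest_centroid_predict(centroids, row):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda y: dist2(centroids[y]))

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

# Toy imbalanced data (assumed for illustration): 40 "tumor" rows, 8 "normal" rows.
rng = random.Random(0)
tumor = [[rng.gauss(2.0, 1.0) for _ in range(5)] for _ in range(40)]
normal = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(8)]
samples = tumor + normal
labels = [1] * len(tumor) + [0] * len(normal)

train, test = stratified_split(samples, labels, 0.25, rng)
augmented = balance_training_set(train, rng)       # synthetic rows enter training only
centroids = nearest_centroid_fit(augmented)
y_true = [y for _, y in test]                      # test set holds real rows only
y_pred = [nearest_centroid_predict(centroids, r) for r, _ in test]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Splitting before augmenting is what keeps the test set free of synthetic data: the generator is fitted on training rows alone, so no information from held-out samples reaches the classifier.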