Comparative Study of Machine Learning Models for Non-Communicable Disease Prediction

Mahendran, L.; Kuhaneshwaran, B.; Rasathurai, V.; Karunaharan, L.; Abishethvarman, V.; Prasanth, S.; Kumara, B.T.G.S.

dc.contributor.author	Mahendran, L.
dc.contributor.author	Kuhaneshwaran, B.
dc.contributor.author	Rasathurai, V.
dc.contributor.author	Karunaharan, L.
dc.contributor.author	Abishethvarman, V.
dc.contributor.author	Prasanth, S.
dc.contributor.author	Kumara, B.T.G.S.
dc.date.accessioned	2026-03-07T07:57:39Z
dc.date.available	2026-03-07T07:57:39Z
dc.date.issued	2025
dc.identifier.uri	http://drr.vau.ac.lk/handle/123456789/1952
dc.description.abstract	Cardiovascular and other non-communicable diseases (NCDs), especially diabetes and cancers, are critical global health problems that cause high morbidity and mortality. Their prevalence has been rising steadily, placing an increasing burden on healthcare systems and highlighting the need for early detection and effective preventive strategies. Early identification of individuals at risk of NCDs is critical for timely interventions that can slow disease progression and improve population health outcomes. In this context, machine learning (ML) techniques offer a promising approach for modeling disease risk by leveraging large-scale health and anthropometric data to uncover patterns not easily detected through traditional statistical methods. This paper proposes a novel ML scheme to model NCD risk from anthropometric data on 300 participants from the Jaffna Teaching Hospital and Sabaragamuwa University of Sri Lanka. Key features such as age, gender, height, weight, body mass index (BMI), and visceral fat area were extracted to identify risk factors for disease. Data preprocessing comprises noise removal, min-max normalisation, and outlier correction using Python libraries such as pandas. Classification uses supervised ML algorithms, namely Random Forest, Extreme Gradient Boosting, Artificial Neural Network, Decision Tree, AdaBoost, Logistic Regression, CatBoost, and Support Vector Machine. The data are divided into an 80 percent training set and a 20 percent testing set, and model hyperparameters are tuned by grid search with cross-validation.
Overall, this study demonstrates the effectiveness of ML techniques for the early identification of individuals at risk of NCDs, giving healthcare providers a flexible, data-driven tool for delivering timely, high-quality interventions. By facilitating preventive care and optimising resource allocation, the proposed framework has the potential to improve population health outcomes, particularly in resource-limited contexts.	en_US
dc.language.iso	en	en_US
dc.publisher	Faculty of Applied Science, University of Vavuniya, Sri Lanka	en_US
dc.subject	Anthropometric data	en_US
dc.subject	Machine learning	en_US
dc.subject	Non-communicable diseases	en_US
dc.subject	Predictive modeling	en_US
dc.subject	Risk prediction	en_US
dc.title	Comparative Study of Machine Learning Models for Non-Communicable Disease Prediction	en_US
dc.type	Conference abstract	en_US
dc.identifier.proceedings	1st International Conference on Applied Sciences- 2025	en_US
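
The workflow named in the abstract (min-max scaling, an 80/20 train/test split, and grid search with cross-validation) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' code: the data are synthetic, the label rule is arbitrary, the helper names are hypothetical, and a one-feature k-nearest-neighbour classifier stands in for the eight models listed in the abstract. Only the Python standard library is used, whereas the study mentions pandas.

```python
import random


def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def split_80_20(rows, seed=0):
    """Shuffle a copy of rows and return (train, test) in an 80/20 ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 8 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def knn_predict(fit_rows, x, k):
    """Majority vote among the k rows whose feature value is closest to x."""
    nearest = sorted(fit_rows, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def cv_accuracy(rows, k, folds=5):
    """Mean validation accuracy of k-NN across `folds` cross-validation folds."""
    scores = []
    for i in range(folds):
        validate = rows[i::folds]
        fit_rows = [r for j, r in enumerate(rows) if j % folds != i]
        hits = sum(knn_predict(fit_rows, x, k) == y for x, y in validate)
        scores.append(hits / len(validate))
    return sum(scores) / len(scores)


# Synthetic stand-in for anthropometric records (NOT the study's data).
rng = random.Random(42)
heights_m = [rng.uniform(1.45, 1.90) for _ in range(300)]
weights_kg = [rng.uniform(40.0, 110.0) for _ in range(300)]
bmi = [w / h ** 2 for w, h in zip(weights_kg, heights_m)]  # BMI = kg / m^2
labels = [1 if b >= 25.0 else 0 for b in bmi]  # arbitrary illustrative rule

scaled = min_max_scale(bmi)
rows = list(zip(scaled, labels))
train_rows, test_rows = split_80_20(rows, seed=42)

# Grid search: keep the k with the best cross-validated accuracy on the
# training rows, then score once on the held-out test rows.
grid = [1, 3, 5, 7]
best_k = max(grid, key=lambda k: cv_accuracy(train_rows, k))
test_accuracy = sum(
    knn_predict(train_rows, x, best_k) == y for x, y in test_rows
) / len(test_rows)
print(f"best k = {best_k}, held-out accuracy = {test_accuracy:.2f}")
```

The sketch scales the whole column before splitting to stay short; fitting the scaling range on the training rows only and reusing it for the test rows would avoid leaking test-set information into preprocessing.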