Comparative Study of Machine Learning Models for Non-Communicable Disease Prediction

dc.contributor.author Mahendran, L.
dc.contributor.author Kuhaneshwaran, B.
dc.contributor.author Rasathurai, V.
dc.contributor.author Karunaharan, L.
dc.contributor.author Abishethvarman, V.
dc.contributor.author Prasanth, S.
dc.contributor.author Kumara, B.T.G.S.
dc.date.accessioned 2026-03-07T07:57:39Z
dc.date.available 2026-03-07T07:57:39Z
dc.date.issued 2025
dc.identifier.uri http://drr.vau.ac.lk/handle/123456789/1952
dc.description.abstract Cardiovascular and other non-communicable diseases (NCDs), especially diabetes and cancers, are critical global health problems that cause high rates of morbidity and mortality. Their prevalence has been rising steadily, placing an increasing burden on healthcare systems and highlighting the need for early detection and effective preventive strategies. Early identification of individuals at risk of NCDs is critical for implementing timely interventions that can slow disease progression and improve population health outcomes. In this context, machine learning (ML) techniques offer a promising approach for modeling disease risk by leveraging large-scale health and anthropometric data to uncover patterns not easily detected through traditional statistical methods. This paper proposes a novel ML scheme to model NCD risk based on anthropometric data from 300 participants at the Jaffna Teaching Hospital and Sabaragamuwa University of Sri Lanka. Characteristics such as age, gender, height, weight, body mass index (BMI), and visceral fat area were extracted to determine risk factors for disease. The methodology relies on thorough data preprocessing: noise removal, outlier correction, and normalisation with min-max scaling, carried out using Python libraries such as pandas. Classification uses supervised ML algorithms, namely Random Forest, Extreme Gradient Boosting, Artificial Neural Network, Decision Tree, AdaBoost, Logistic Regression, CatBoost, and Support Vector Machine. The data is divided into an 80 percent training set and a 20 percent testing set, and model hyperparameters are tuned by grid search with cross-validation.
The approach supports earlier identification of individuals at risk of NCDs and gives healthcare providers a flexible, data-driven tool for delivering timely, high-quality interventions, particularly in resource-limited contexts. By facilitating preventive care and optimising resource allocation, the proposed framework has the potential to improve population health outcomes. en_US
dc.language.iso en en_US
dc.publisher Faculty of Applied Science, University of Vavuniya, Sri Lanka en_US
dc.subject Anthropometric data en_US
dc.subject Machine learning en_US
dc.subject Non-communicable diseases en_US
dc.subject Predictive modeling en_US
dc.subject Risk prediction en_US
dc.title Comparative Study of Machine Learning Models for Non-Communicable Disease Prediction en_US
dc.type Conference abstract en_US
dc.identifier.proceedings 1st International Conference on Applied Sciences - 2025 en_US
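
The workflow outlined in the abstract (min-max scaling, an 80/20 train/test split, and grid search with cross-validation) could look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the data is synthetic, scikit-learn is assumed as the modelling library (the abstract names only pandas), Random Forest stands in for the eight classifiers compared, and the hyperparameter grid, the 5-fold setting, and the stratified split are hypothetical choices.

```python
# Illustrative sketch only: synthetic stand-in data, not the study's dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# 300 rows and 6 features mirror the participant count and the six listed
# characteristics; the values themselves are randomly generated.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0, random_state=0
)

# 80/20 split as described in the abstract; stratification is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Keeping the scaler inside the pipeline means each cross-validation fold
# fits min-max scaling on its own training portion only.
pipeline = Pipeline([
    ("scale", MinMaxScaler()),
    ("model", RandomForestClassifier(random_state=0)),
])

# Hypothetical grid; the abstract does not state which values were searched.
param_grid = {
    "model__n_estimators": [50, 100],
    "model__max_depth": [3, None],
}

# 5-fold cross-validation is an assumed setting.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Score the refitted best pipeline on the held-out 20 percent.
test_accuracy = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print("held-out accuracy:", round(test_accuracy, 3))
```

To compare several classifiers in the same way, the "model" step and its grid can be swapped per algorithm while the split and scaling step stay fixed, so that every model is evaluated on the same held-out 20 percent.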


This item appears in the following Collection(s)

  • ICAS - 2025 [59]
    International Conference on Applied Sciences - 2025
