Abstract:
Non-communicable diseases (NCDs), including cardiovascular disease, diabetes, and cancers,
are critical global health problems that cause high rates of morbidity and mortality.
The prevalence of these conditions has been rising steadily, placing an increasing burden on healthcare
systems and highlighting the need for early detection and effective preventive strategies. Early identification
of individuals at risk of NCDs is critical for implementing timely interventions that can reduce disease
progression and improve population health outcomes. In this context, machine learning (ML) techniques
offer a promising approach for modeling disease risk by leveraging large-scale health and anthropometric
data to uncover patterns not easily detected through traditional statistical methods. This paper proposes a
novel ML scheme to model NCD risk based on anthropometric data from 300 participants
at the Jaffna Teaching Hospital and Sabaragamuwa University of Sri Lanka. Key features such as
age, gender, height, weight, body mass index (BMI), and visceral fat area were extracted in order to determine
risk factors for the disease. The data analysis methodology relies on robust data preprocessing: removing
noise, normalising with min-max scaling, and correcting outliers using Python libraries such as
pandas. Classification uses supervised ML algorithms, namely Random Forest, Extreme Gradient
Boosting, Artificial Neural Network, Decision Tree, AdaBoost, Logistic Regression, CatBoost, and Support
Vector Machine. The data is divided into an 80 percent training set and a 20 percent testing set, and the models are
tuned by grid search cross-validation to obtain robust hyperparameters. The strategy is an effective
way to improve the early identification of NCDs, giving providers a flexible, data-driven
resource for delivering high-quality and timely interventions and improving preventive
care and population health in resource-limited contexts. Overall, this study demonstrates the effectiveness
of ML techniques for early identification of individuals at risk of NCDs. By facilitating preventive care
and optimising resource allocation, the proposed framework has the potential to improve population health
outcomes through ML-based NCD risk prediction.
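
The workflow summarised above (outlier correction, min-max scaling, an 80/20 split, and grid search cross-validation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the data are synthetic, the column names and the `at_risk` label are hypothetical, outliers are clipped to 1.5 x IQR fences (the abstract does not specify the correction rule), and only Random Forest is tuned, over an arbitrary small grid.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the 300-participant dataset (NOT the study data).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "gender": rng.integers(0, 2, n),
    "height_cm": rng.normal(165, 9, n),
    "weight_kg": rng.normal(65, 12, n),
    "visceral_fat_area": rng.normal(90, 30, n),
})
# BMI = weight (kg) / height (m)^2
df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2
# Hypothetical binary label, used only so the sketch has something to predict.
df["at_risk"] = ((df["bmi"] > 25) | (df["visceral_fat_area"] > 100)).astype(int)

features = ["age", "gender", "height_cm", "weight_kg", "bmi", "visceral_fat_area"]
X = df[features].astype(float)
y = df["at_risk"]

# 80 percent training set, 20 percent testing set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

def iqr_fences(frame):
    """Return per-column lower and upper 1.5 x IQR fences."""
    q1, q3 = frame.quantile(0.25), frame.quantile(0.75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Assumed outlier rule: clip to fences estimated on the training set only.
lower, upper = iqr_fences(X_train)
X_train = X_train.clip(lower, upper, axis=1)
X_test = X_test.clip(lower, upper, axis=1)

# Min-max scaling sits inside the pipeline so each CV fold is scaled separately.
pipeline = Pipeline([
    ("scale", MinMaxScaler()),
    ("clf", RandomForestClassifier(random_state=0)),
])
param_grid = {"clf__n_estimators": [50, 100], "clf__max_depth": [3, None]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Keeping the scaler inside the pipeline and estimating the outlier fences on the training split only avoids leaking test-set statistics into model selection. The other classifiers named in the abstract could be substituted for the `clf` step with their own parameter grids.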