Abstract:
Cardiovascular disease (CVD) remains a leading cause of morbidity and mortality worldwide. According to the latest WHO figures, coronary heart disease accounted for 26,304 deaths in Sri Lanka in 2020, representing 22.66% of all deaths in the country. Early detection is crucial for preventing disease progression and improving patient outcomes, and the need is particularly acute in developing countries such as Sri Lanka, where the high cost of diagnosis and treatment is a significant barrier to effective care. To address this problem, we developed a machine learning-based system that predicts the likelihood of heart disease at an early stage using newly collected Sri Lankan data. The dataset contains medical records of heart disease patients and normal subjects, each described by 13 features. Six classification methods were applied to this dataset: decision tree, random forest, support vector machine, K-nearest neighbor, logistic regression, and Gaussian naive Bayes. In addition, four feature selection techniques were employed: forward feature selection, backward feature elimination, exhaustive feature selection, and recursive feature elimination. Each combination of feature selection technique and classifier was assessed using accuracy, precision, recall, and F1-score. The logistic regression classifier, trained on the feature subset selected by recursive feature elimination, achieved the highest classification accuracy of 96%. The system aims to facilitate the early identification of heart disease, thereby improving coronary health and quality of life by enabling timely intervention and management of risk factors.
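As an illustration of the best-performing combination described above, the following minimal sketch pairs recursive feature elimination with logistic regression and reports the four evaluation measures. It is not the authors' implementation: it assumes scikit-learn, uses a synthetic 13-feature dataset in place of the Sri Lankan data, and the subset size `N_SELECTED`, the 75/25 split, and the standardization step are hypothetical choices that the abstract does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with 13 features (assumption: NOT the Sri Lankan dataset).
X, y = make_classification(
    n_samples=400, n_features=13, n_informative=6, n_redundant=2, random_state=0
)

# Hold out a test set; the split ratio is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

N_SELECTED = 8  # hypothetical subset size; the abstract does not state one

# Scale, select features with RFE (ranked by logistic regression coefficients),
# then fit the final logistic regression on the retained features only.
# Keeping all steps in one pipeline means selection sees training data only.
model = make_pipeline(
    StandardScaler(),
    RFE(LogisticRegression(max_iter=1000), n_features_to_select=N_SELECTED),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Column indices of the features that RFE retained.
selected = [i for i, keep in enumerate(model.named_steps["rfe"].support_) if keep]

# The four evaluation measures named in the abstract.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}

print("Selected feature indices:", selected)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The scores printed here describe the synthetic data only and say nothing about the 96% accuracy reported above. Swapping the final estimator for another classifier, or replacing `RFE` with a different selector, would give the other combinations in the same way.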