Generalised LARS Framework for Variable Selection in High Dimensional Binary Classification


dc.contributor.author Kayathiri, T.
dc.contributor.author Kayanan, M.
dc.contributor.author Wijekoon, P.
dc.date.accessioned 2025-12-22T03:59:34Z
dc.date.available 2025-12-22T03:59:34Z
dc.date.issued 2025
dc.identifier.uri http://drr.vau.ac.lk/handle/123456789/1619
dc.description.abstract High-dimensional logistic regression refers to binary classification settings in which the number of predictor variables exceeds the number of observations. This technique is particularly valuable in domains such as genomics, biomedical imaging, social sciences, ecology and finance. Despite its advantages, high-dimensional logistic regression presents several challenges, including the risk of overfitting, instability in parameter estimation, increased computational demands, multicollinearity among predictors, and difficulty in selecting the most relevant variables. To address these issues, various penalised methods have been developed, including the Least Absolute Shrinkage and Selection Operator (LASSO) and Elastic Net (ENet). The ENet method combines the strengths of both LASSO and the Logistic Ridge Estimator (LRE), providing a more flexible regularisation approach. In this study, a generalised version of the Least Angle Regression algorithm (GLARS) is proposed for variable selection, with the aim of mitigating multicollinearity among predictor variables in high-dimensional logistic regression. This method combines the LARS algorithm and LASSO with existing estimators: Maximum Likelihood Estimator (MLE), Logistic Ridge Estimator (LRE), Logistic Liu Estimator (LLE), Modified Almost Unbiased Ridge Logistic Estimator (MAURLE), Modified Almost Unbiased Logistic Liu Estimator (MAULLE), Principal Component Logistic Estimator (PCLE), r-k class, and r-d class estimators. GLARS updates coefficients iteratively using least-angle directions derived from these biased estimators. Furthermore, the performance of each biased estimator integrated within the LARS algorithm is evaluated using log-loss on both empirical and real datasets, including applications to colon tumour and diffuse large B-cell lymphoma (DLBCL) data.
Findings indicate that LARS-PCLE performs best on the given empirical dataset, LARS-r-d on the colon data, and LARS-LLE on the DLBCL dataset, with log-loss values of 0.2188, 0.8425, and 0.2949, respectively. These results highlight that the effectiveness of biased estimators within the LARS framework varies with dataset characteristics. Future work will focus on developing an R package to assist in selecting the appropriate estimator and computing coefficients for various data types. en_US
dc.language.iso en en_US
dc.publisher PGIS, University of Peradeniya en_US
dc.subject High dimension en_US
dc.subject Least angle regression en_US
dc.subject Logistic regression en_US
dc.subject Log-loss evaluation en_US
dc.subject Penalised estimators en_US
dc.title Generalised LARS Framework for Variable Selection in High Dimensional Binary Classification en_US
dc.type Conference abstract en_US
dc.identifier.proceedings Postgraduate Institute of Science Research Congress, Sri Lanka (ResCon 2025) en_US

