Abstract:
High-dimensional linear regression (HDLR) is a critical tool in fields such as finance, statistics, and machine learning, particularly for feature selection and predictive modeling when there are many predictor variables but few data points. However, the “curse of dimensionality” poses significant challenges in fitting HDLR models: as the number of predictors grows, computational costs rise sharply and the risk of overfitting increases, with the model learning noise instead of the underlying data structure. To address these issues, we propose an alternative method that combines LASSO with gradient descent. This approach shrinks the coefficients of less important variables while retaining the most essential ones, thereby improving model generalizability and mitigating overfitting; it also enhances computational efficiency and reduces prediction error. We evaluate the algorithm’s performance on real datasets, including the prostate dataset (9 variables) and the UScrime dataset (16 variables), chosen for their relevance to high-dimensional settings and their ability to illustrate both the challenges involved and the effectiveness of our approach. Using a 70%/30% train-test split, the proposed algorithm is compared against traditional methods, including ridge regression, LASSO regression, and gradient descent, as well as other combination algorithms. Based on evaluation metrics such as Mean Squared Error (MSE) and Root Mean Squared Error (RMSE), our results show that the proposed algorithm outperforms these baselines, demonstrating its effectiveness in high-dimensional settings where both computational feasibility and accuracy are crucial.
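For a concrete picture of the kind of method the abstract describes, the following is a minimal sketch in Python, assuming the combination of LASSO and gradient descent takes the form of proximal (soft-thresholded) gradient descent, often called ISTA, on the LASSO objective. The function names, step-size rule, and penalty value below are illustrative assumptions, not the paper’s exact implementation.

    import numpy as np

    def soft_threshold(z, t):
        # Proximal operator of the L1 penalty: shrinks each coordinate toward zero.
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def lasso_gradient_descent(X, y, lam=0.1, n_iters=1000, tol=1e-6):
        # Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by proximal gradient descent.
        n, p = X.shape
        beta = np.zeros(p)
        # Fixed step size 1/L, where L is the Lipschitz constant of the smooth term.
        L = np.linalg.norm(X, 2) ** 2 / n
        step = 1.0 / L
        for _ in range(n_iters):
            grad = X.T @ (X @ beta - y) / n                 # gradient of least-squares term
            beta_new = soft_threshold(beta - step * grad,    # gradient step, then
                                      step * lam)            # L1 shrinkage step
            if np.linalg.norm(beta_new - beta) < tol:        # stop when updates stall
                return beta_new
            beta = beta_new
        return beta

    # Illustrative usage on synthetic data with a sparse true coefficient vector.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 16))        # few observations, many predictors
    beta_true = np.zeros(16)
    beta_true[:3] = [2.0, -1.5, 1.0]
    y = X @ beta_true + 0.1 * rng.standard_normal(50)
    print(lasso_gradient_descent(X, y, lam=0.05))

The soft-thresholding step is what drives the coefficients of unimportant predictors exactly to zero, which is the shrinkage-plus-selection behavior the abstract attributes to the proposed method.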