Efficient and Interpretable Machine Learning for Encrypted Malicious Traffic Classification

Chamuditha, P.G.L.; Madhuwantha, P.L.G.H.K.; Senarathna, P.K.C.; Mayuran, P.; Senthooran, V.

URI: http://drr.vau.ac.lk/handle/123456789/2035

Date: 2026

Abstract:

Encrypted communication channels such as HTTPS, TLS, and VPNs are widely used to protect privacy, yet they can also hide malicious payloads, which makes intrusion detection more challenging. This study proposes a machine learning framework that classifies malicious encrypted traffic using flow-based and temporal characteristics. Public datasets containing network traffic captures were used to test and validate the framework. Benign and malicious flows were converted to flow-based features using Scapy and CICFlowMeter, and feature importance was used to select the most informative features. Three machine learning models were trained and tested on the datasets: Random Forest, XGBoost, and a linear Support Vector Machine (SVM). The models were evaluated using a stratified train/test split, cross-validation, and family-disjoint evaluation. The Random Forest model achieved nearly perfect accuracy (approximately 100%) on both the training and testing sets, and approximately 92% accuracy under cross-validation. Overfitting was minimal for the Random Forest model, whereas XGBoost showed overfitting issues and the SVM reached only moderate accuracy (approximately 72%). These results suggest that the proposed framework can reliably detect malicious encrypted communications, including those that were not seen during training. SHAP was used to analyze the explainability of the framework and to identify the flow characteristics that contributed most to its decisions. The proposed framework is computationally efficient and was tested on real-world datasets, making it suitable for practical applications in network security.
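The abstract does not spell out how the family-disjoint evaluation was set up. As a rough illustration only, the sketch below shows one common reading of the idea: whole malware families are held out, so that every family in the test set is unseen during training. The function name `family_disjoint_split`, the use of `None` to mark benign flows, the random split of benign flows, and the family names in the example are all assumptions made for this sketch and are not taken from the study.

```python
import random
from collections import defaultdict


def family_disjoint_split(families, test_fraction=0.3, seed=0):
    """Split flow indices so that no malware family appears in both sets.

    families: one entry per flow; a family name for a malicious flow,
              None for a benign flow (an assumed encoding, not from the paper).
    Returns (train_indices, test_indices), each sorted.
    """
    rng = random.Random(seed)

    # Group malicious flow indices by family; collect benign indices separately.
    by_family = defaultdict(list)
    benign = []
    for idx, fam in enumerate(families):
        if fam is None:
            benign.append(idx)
        else:
            by_family[fam].append(idx)

    if len(by_family) < 2:
        raise ValueError("need at least two families for a disjoint split")

    # Hold out whole families, keeping at least one family on each side.
    names = sorted(by_family)
    rng.shuffle(names)
    n_test = max(1, round(len(names) * test_fraction))
    n_test = min(n_test, len(names) - 1)
    test_families = set(names[:n_test])

    train, test = [], []
    for fam, idxs in by_family.items():
        (test if fam in test_families else train).extend(idxs)

    # Benign flows carry no family label, so they are split at random.
    rng.shuffle(benign)
    cut = round(len(benign) * test_fraction)
    test.extend(benign[:cut])
    train.extend(benign[cut:])
    return sorted(train), sorted(test)


# Hypothetical labels: three made-up families plus four benign flows.
example_families = ["fam_a", "fam_a", "fam_b", "fam_c", None, None, None, None]
train_idx, test_idx = family_disjoint_split(example_families, seed=1)
```

The resulting index lists could then be used to slice a feature matrix before fitting any of the three models. Accuracy measured this way reflects generalization to unseen families rather than to unseen flows from families already in the training data.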
