Efficient and Interpretable Machine Learning for Encrypted Malicious Traffic Classification

Chamuditha, P.G.L.; Madhuwantha, P.L.G.H.K; Senarathna, P.K.C.; Mayuran, P.; Senthooran, V.

dc.contributor.author	Chamuditha, P.G.L.
dc.contributor.author	Madhuwantha, P.L.G.H.K
dc.contributor.author	Senarathna, P.K.C.
dc.contributor.author	Mayuran, P.
dc.contributor.author	Senthooran, V.
dc.date.accessioned	2026-03-26T03:51:42Z
dc.date.available	2026-03-26T03:51:42Z
dc.date.issued	2026
dc.identifier.uri	http://drr.vau.ac.lk/handle/123456789/2035
dc.description.abstract	Encrypted communications such as HTTPS, TLS, and VPN traffic have become standard tools for protecting privacy, yet they can also be used to hide malicious payloads, which makes intrusion detection more challenging. This study proposes a machine learning framework for classifying malicious encrypted traffic using flow-based and temporal features. Public datasets of network traffic captures were used to test and validate the framework. Benign and malicious flows were converted into flow-based features using the Scapy and CICFlowMeter tools, and feature importance was used to select the most informative features. Three machine learning models were trained and tested on the datasets: Random Forest, XGBoost, and a linear Support Vector Machine (SVM). The models were evaluated using a stratified train/test split, cross-validation, and family-disjoint evaluation. The Random Forest model achieved near-perfect accuracy (approximately 100%) on both the training and test sets and approximately 92% accuracy under cross-validation. Random Forest showed minimal overfitting, whereas XGBoost showed signs of overfitting and the SVM reached only moderate accuracy (approximately 72%). These results suggest that the proposed framework can reliably detect malicious encrypted traffic, including traffic that was not seen during training. SHAP was used to explain the framework's predictions and to identify the flow characteristics that contributed most to its decisions. The proposed framework is computationally efficient and was evaluated on real-world datasets, making it suitable for practical network-security applications.	en_US
dc.language.iso	en	en_US
dc.publisher	Korea Database Strategy Society (KDSS)	en_US
dc.subject	Encrypted network traffic	en_US
dc.subject	Malware classification	en_US
dc.subject	Feature engineering	en_US
dc.subject	Machine learning	en_US
dc.subject	Network security	en_US
dc.title	Efficient and Interpretable Machine Learning for Encrypted Malicious Traffic Classification	en_US
dc.type	Conference abstract	en_US
dc.identifier.proceedings	32nd International Conference on IT Applications and Management	en_US
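The abstract lists family-disjoint evaluation alongside the stratified split and cross-validation, but does not describe how the split was built. The sketch below illustrates the general idea under stated assumptions; it is not the authors' code. It assumes each flow carries a malware-family label, with benign flows marked by a hypothetical `"benign"` label. The function name `family_disjoint_split` and the family names `famA` to `famD` are illustrative only. Benign flows are split at random, while each malware family goes entirely to either the training side or the test side. As a result, the test set measures detection of families the model has never seen.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

BENIGN = "benign"  # hypothetical label assumed for benign flows


def family_disjoint_split(
    families: Sequence[str],
    test_fraction: float = 0.25,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Split flow indices so that no malware family occurs in both train and test.

    Benign flows are split at random according to test_fraction; each malware
    family is assigned as a whole to either the training or the test side.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")

    rng = random.Random(seed)
    benign: List[int] = []
    by_family: Dict[str, List[int]] = defaultdict(list)
    for idx, fam in enumerate(families):
        if fam == BENIGN:
            benign.append(idx)
        else:
            by_family[fam].append(idx)

    names = sorted(by_family)  # sort first so the shuffle is reproducible per seed
    if len(names) < 2:
        raise ValueError("need at least two malware families to hold one out")
    rng.shuffle(names)
    # Hold out at least one family, but always keep at least one for training.
    n_test = min(len(names) - 1, max(1, round(len(names) * test_fraction)))
    test_families = set(names[:n_test])

    # Benign flows have no family, so they are split at random.
    rng.shuffle(benign)
    n_benign_test = round(len(benign) * test_fraction)
    train = benign[n_benign_test:]
    test = benign[:n_benign_test]

    # Each malware family goes entirely to one side of the split.
    for fam, idxs in by_family.items():
        (test if fam in test_families else train).extend(idxs)
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # Hypothetical per-flow family labels, for illustration only.
    labels = ["benign"] * 8 + ["famA"] * 3 + ["famB"] * 2 + ["famC"] * 4 + ["famD"]
    train_idx, test_idx = family_disjoint_split(labels, test_fraction=0.25, seed=42)
    train_fams = {labels[i] for i in train_idx} - {BENIGN}
    test_fams = {labels[i] for i in test_idx} - {BENIGN}
    print("train families:", sorted(train_fams))
    print("test families:", sorted(test_fams))
    assert train_fams.isdisjoint(test_fams)
```

When scikit-learn is available, `GroupShuffleSplit` and `GroupKFold` in `sklearn.model_selection` provide group-aware splitting with similar intent when family labels are passed as `groups`. The hand-written version above simply makes the benign/malicious asymmetry explicit.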