SPAM EMAIL CLASSIFICATION AND IDENTIFICATION

Keerththana, K.; Charles, E.Y.A.

dc.contributor.author	Keerththana, K.
dc.contributor.author	Charles, E.Y.A.
dc.date.accessioned	2024-11-21T07:54:58Z
dc.date.available	2024-11-21T07:54:58Z
dc.date.issued	2023-10-25
dc.identifier.uri	http://drr.vau.ac.lk/handle/123456789/1038
dc.description.abstract	Email has become one of the most widespread means of communication in today’s society. Email spam, commonly known as junk email, spam mail, or simply spam, refers to unsolicited messages sent in large quantities through email. Although some spam emails contain valuable information, most are unwanted and can lead to online fraud. Hence it is necessary to filter spam emails from regular emails. An improved spam classification approach will keep users’ inboxes free from spam while not missing any potentially useful emails. In this research work we analyzed the classification of emails into spam and legitimate emails using the contents of the email. This work further explored the classification of spam emails into categories such as promotion, marketing, news, security and others, and analyzed the applicability of the word embedding approach for spam classification. Two Kaggle datasets (sms-spam-collection-dataset and spam filter) were used. A word embedding approach was used for text representation, together with multiple classifiers (LSTM, SVM). Since there are no publicly available multiclass spam classification datasets, an incremental approach is proposed to build the classifier. Both datasets were manually categorized and used to build the multiclass classifier. The Word2Vec model combined with an SVM classifier obtained the highest accuracy, 0.86 and 0.87 on the two datasets. As future work, this initial classifier will be used to classify the Enron spam email dataset. The results will be verified through manual analysis and used to fine-tune the classifier over multiple epochs.	en_US
dc.language.iso	en	en_US
dc.publisher	Faculty of Applied Science, University of Vavuniya	en_US
dc.subject	Machine learning	en_US
dc.subject	Multiclass classification	en_US
dc.subject	Spam classification	en_US
dc.subject	Spam filter	en_US
dc.subject	Spam identification	en_US
dc.subject	Word embedding	en_US
dc.title	SPAM EMAIL CLASSIFICATION AND IDENTIFICATION	en_US
dc.type	Conference abstract	en_US
dc.identifier.proceedings	The 4th Faculty Annual Research Session - "Exploring Scientific Innovations for Global Well-being"	en_US