SPAM EMAIL CLASSIFICATION AND IDENTIFICATION


dc.contributor.author Keerththana, K.
dc.contributor.author Charles, E.Y.A.
dc.date.accessioned 2024-11-21T07:54:58Z
dc.date.available 2024-11-21T07:54:58Z
dc.date.issued 2023-10-25
dc.identifier.uri http://drr.vau.ac.lk/handle/123456789/1038
dc.description.abstract Email has become one of the most widespread means of communication in today's society. Email spam, commonly known as junk email, spam mail, or simply spam, refers to unsolicited messages sent in large quantities through email. Although some spam emails contain useful information, spam is usually unwanted and often leads to online fraud. Hence, spam emails need to be filtered out from regular emails. An improved spam classification approach should keep users' inboxes free of spam without discarding legitimate emails. In this research work, we analyzed the classification of emails as spam or legitimate based on their content. This work further explored classifying spam emails into categories such as promotion, marketing, news, security, and others. This work also analyzed the applicability of word embedding approaches to spam classification. Two Kaggle datasets (sms-spam-collection-dataset and spam filter) were used. A word embedding approach was used for text representation, and two classifiers (LSTM and SVM) were evaluated. Since no publicly available multiclass spam classification datasets exist, an incremental approach is proposed to build the classifier: both datasets were manually categorized and used to build the multiclass classifier. The Word2Vec model combined with the SVM classifier achieved the highest accuracy, 0.86 and 0.87 on the two datasets. As future work, this initial classifier will be used to classify the Enron spam email dataset. The results will be verified through manual analysis and then used to fine-tune the classifier over multiple epochs. en_US
dc.language.iso en en_US
dc.publisher Faculty of Applied Science, University of Vavuniya en_US
dc.subject Machine learning en_US
dc.subject Multiclass classification en_US
dc.subject Spam classification en_US
dc.subject Spam filter en_US
dc.subject Spam identification en_US
dc.subject Word embedding en_US
dc.title SPAM EMAIL CLASSIFICATION AND IDENTIFICATION en_US
dc.type Conference paper en_US
dc.identifier.proceedings The 4th Faculty Annual Research Session - "Exploring Scientific Innovations for Global Well-being" en_US
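The abstract reports that Word2Vec text representation combined with an SVM classifier worked best. The sketch below shows the general shape of such a pipeline: each email is represented as the average of its word vectors, and a linear SVM is trained on those averaged vectors. It is not the authors' implementation, and several pieces are assumptions made to keep it small and dependency-free:

- A hand-made two-dimensional embedding table (`EMBEDDINGS`) stands in for a trained Word2Vec model.
- A Pegasos-style sub-gradient linear SVM is used in place of whatever SVM implementation and kernel the authors used.
- The function names `embed`, `train_linear_svm`, and `predict` and the toy emails are all hypothetical.

```python
import random

# HYPOTHETICAL toy embeddings standing in for a trained Word2Vec model.
# Dimension 0 loosely encodes "spammy" words, dimension 1 "work" words.
EMBEDDINGS = {
    "free": [0.9, 0.1], "win": [0.8, 0.2], "prize": [0.85, 0.05],
    "offer": [0.7, 0.3], "meeting": [0.1, 0.9], "report": [0.2, 0.8],
    "lunch": [0.15, 0.85], "project": [0.1, 0.95],
}
DIM = 2


def embed(text):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    Labels must be +1 / -1. A constant 1.0 feature is appended so the last
    weight acts as a (regularised) bias term.
    """
    rng = random.Random(seed)
    data = [(x + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * (DIM + 1)
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Regulariser step: shrink the weights towards zero.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss step: only when the example violates the margin.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (spam) or -1 (legitimate) from the sign of the linear score."""
    score = sum(wi * xi for wi, xi in zip(w, x + [1.0]))
    return 1 if score >= 0 else -1


emails = ["win a free prize", "free offer win", "claim your prize offer",
          "project meeting today", "lunch after the meeting", "send the project report"]
labels = [1, 1, 1, -1, -1, -1]  # +1 = spam, -1 = legitimate
w = train_linear_svm([embed(e) for e in emails], labels)
print(predict(w, embed("free prize offer")))
```

For the multiclass step (promotion, marketing, news, security, others), one common option is to train one such binary classifier per category (one-vs-rest) and pick the category with the highest score. The abstract does not state which multiclass strategy the authors used.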

