Abstract:
Short Message Service (SMS) on cell phones is one of the simplest and most convenient communication modalities available to users. However, SMS has become the biggest target for spammers sending spam text messages. Efficient and accurate recognition of each SMS message in a set as either spam or ham is one of the foremost challenges for cell phone users. SMS spam is hard to filter and can spread easily across the network because cell phone network operators cannot control what customers have agreed to receive and accept. The aim of the current study was to extract and classify the messages using three different classifiers, each applied individually: Naïve Bayes, Artificial Neural Network and K-Nearest Neighbors.
A total of 5,574 messages were obtained from the SMS dataset for use in this study. Naïve Bayes, Artificial Neural Network (Multi-Layer Perceptron) and K-Nearest Neighbors were used to classify the messages as spam or ham. Two feature selection methods, Information Gain and Information Gain Ratio, were applied, and the following parameters were calculated: Accuracy, Recall, F-measure, Precision and Time. The results were compared with those recorded using the standard method of classification (without feature selection).
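As an illustration of the two feature selection criteria named above, the following is a minimal Python sketch of Information Gain and Information Gain Ratio for a single discrete feature. It is not the study's implementation: the function names, the binary "term present" features and the four-message toy data are all assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in Counter(values).values())


def information_gain(feature, labels):
    """Reduction in label entropy obtained by splitting on the feature."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [lab for f, lab in zip(feature, labels) if f == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalised by the entropy of the feature itself."""
    split_info = entropy(feature)
    if split_info == 0:  # feature takes a single value: no usable split
        return 0.0
    return information_gain(feature, labels) / split_info


# Hypothetical toy data: 1 means the term occurs in the message.
labels = ["spam", "spam", "ham", "ham"]
has_free = [1, 1, 0, 0]   # separates the two classes perfectly
has_hello = [1, 0, 1, 0]  # carries no information about the class

print(information_gain(has_free, labels), gain_ratio(has_free, labels))
print(information_gain(has_hello, labels), gain_ratio(has_hello, labels))
```

In a filter-style setup, features would typically be ranked by one of these scores and only the top-ranked ones passed to the classifier; the abstract does not state the cut-off used.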
The results of the study showed that higher Accuracy (98.86 and 98.93) and Time (8.98 and 8.05 seconds) values were recorded by the Multi-Layer Perceptron compared to the other two classifiers when using the Information Gain and Information Gain Ratio methods. Naïve Bayes was faster than the other two classifiers, completing the filtering process in 0.02 seconds.
It can be concluded that the best Accuracy and Time values were recorded by the Multi-Layer Perceptron with both feature selection methods. The best Time value was recorded with Naïve Bayes, with limited improvement in Accuracy. K-Nearest Neighbors showed no beneficial difference in Accuracy with and without the feature selection methods, and only a very small improvement in Time.
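For reference, the evaluation measures reported in this abstract (Accuracy, Precision, Recall and F-measure) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's evaluation code: it treats spam as the positive class, uses the balanced F-measure (F1), and the function name and sample predictions are hypothetical.

```python
def classification_scores(y_true, y_pred, positive="spam"):
    """Accuracy, precision, recall and F1, with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f1}


# Hypothetical true labels and classifier outputs for five messages.
y_true = ["spam", "spam", "ham", "ham", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam"]
print(classification_scores(y_true, y_pred))
```

Time, the fifth parameter, would be measured separately around the training or filtering step, for example with `time.perf_counter()`.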