SMS Sentiment Classification based on Lexical Features, Emoticons and Informal Abbreviations
DOI:
https://doi.org/10.55630/sjc.2019.13.81-96
Keywords:
computer application in arts and humanities, web-based services, document analysis
Abstract
In this paper we investigate the influence of emoticons, informal speech, and lexical and other linguistic features on the sentiment contained in SMS messages. Using a dataset of ~6,000 samples, we trained a linear SVM classifier that distinguishes positive, negative and neutral sentiment. The dataset mostly contains messages in Serbian, but also some in English and German. The classifier achieved an average accuracy of 92.3% in a 5-fold cross-validation setting, and F1-scores of 92.1%, 74.0% and 93.3% for the positive, negative and neutral classes, respectively.
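
To make the kind of features named in the abstract concrete, the following is a minimal sketch of how emoticon, abbreviation and simple lexical counts might be extracted from one message before being passed to a classifier. The lexicons, the feature names and the `extract_features` helper are all hypothetical and chosen only for illustration; the abstract does not specify the actual feature set or word lists used in the paper.

```python
import re

# Hypothetical lexicons for illustration only; the paper's own lists
# are not given in the abstract.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", ";)", "<3"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'(", ":/"}
INFORMAL_ABBREVIATIONS = {"lol", "omg", "thx", "pls", "btw"}

WORD_RE = re.compile(r"\w+")              # Unicode-aware for str patterns
ELONGATION_RE = re.compile(r"(\w)\1{2,}")  # same character 3+ times in a row


def extract_features(message: str) -> dict:
    """Map one SMS message to a small dictionary of numeric features."""
    tokens = message.split()
    emoticons = POSITIVE_EMOTICONS | NEGATIVE_EMOTICONS
    # Drop emoticon tokens first so that ":D" does not count as the word "d".
    text_only = " ".join(t for t in tokens if t not in emoticons)
    words = WORD_RE.findall(text_only.lower())
    return {
        "n_words": len(words),
        "n_pos_emoticons": sum(t in POSITIVE_EMOTICONS for t in tokens),
        "n_neg_emoticons": sum(t in NEGATIVE_EMOTICONS for t in tokens),
        "n_abbreviations": sum(w in INFORMAL_ABBREVIATIONS for w in words),
        "n_exclamations": message.count("!"),
        "has_elongation": int(bool(ELONGATION_RE.search(text_only))),
    }


if __name__ == "__main__":
    print(extract_features("thx, vidimo se sutra :) !!!"))
    print(extract_features("Nooo :( omg"))
```

Feature dictionaries of this shape could then be vectorized and combined with bag-of-words features as input to a linear SVM; that step is omitted here because the abstract gives no details of the actual pipeline.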