THAM HUI MING
My name is Tham Hui Ming, and I am currently pursuing a Bachelor's degree in Computer Science at USM, majoring in Software Engineering and minoring in Management. I have learned object-oriented programming and several programming languages, including Java, Python, C++, and web programming languages. I have also attended the Scaled Agile Framework (SAFe) training and passed the certification exam. My final year project (FYP) is Sentiment Classification of English-Malay Code-Switching Opinionated Text on Social Media. I am interested in a quality analyst or business analyst role.
Matrix No:
137172
Student Email:
Supervisor:
Dr. Gan Keng Hoon
Supervisor Email:
Sentiment Classification of English-Malay Code-Switching Opinionated Text on Social Media
RP011
Malaysian Twitter users often rely on tweets when making purchasing decisions. As Malaysia is a multicultural society, Malaysians often code-switch between English and Malay when expressing their thoughts on social media. Manually reading and understanding large volumes of code-switching tweets is burdensome for both sellers and customers. Automated sentiment classification therefore helps customers decide quickly and helps local Malaysian brands expand their businesses. The challenge of this research is to improve the accuracy of sentiment classification of English-Malay code-switching tweets by proposing informal Malay word normalization approaches that address non-standardized Malay abbreviations and slang words, which lead to misclassification of the sentiment of whole tweets. Two techniques are used for informal Malay word normalization: (i) normalizing the informal Malay words using the existing Malaya NLTK library, and (ii) normalizing them using both the Malaya NLTK library and constructed lexicons. One lexicon is built from our training dataset, and another two lexicons are constructed from informal collections. Different combinations of lexicons with the Malaya NLTK library are compared to find the best informal Malay word normalization method. In addition, because of the current limitations of unsupervised lexicon-based and rule-based systems in classifying English-Malay code-switching tweets, a machine learning approach is used to learn and effectively classify the sentiment of tweets. The supervised machine learning classifiers used and compared in this study are (a) Naïve Bayes, (b) Support Vector Machine, (c) Random Forest, (d) Logistic Regression, and (e) ensembles of every combination of the above individual classifiers using a voting algorithm.
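As a rough illustration of the two ideas above, the sketch below normalizes informal tokens with a lexicon before falling back to a general normalizer, and combines classifier outputs by hard (majority) voting. It is a minimal sketch under stated assumptions, not the project's implementation: the lexicon entries, the `fallback_normalize` stand-in for a library normalizer, and the hard-coded votes are all hypothetical.

```python
from collections import Counter

# Hypothetical lexicon mapping informal Malay tokens to standard forms.
# The entries are illustrative only, not taken from the project's lexicons.
INFORMAL_LEXICON = {
    "x": "tidak",
    "sgt": "sangat",
    "brg": "barang",
    "tp": "tetapi",
}


def fallback_normalize(token):
    """Stand-in for a library normalizer; here it returns the token unchanged."""
    return token


def normalize_tweet(text, lexicon=INFORMAL_LEXICON, fallback=fallback_normalize):
    """Look each token up in the lexicon first, then use the fallback normalizer."""
    tokens = text.lower().split()
    return " ".join(lexicon[t] if t in lexicon else fallback(t) for t in tokens)


def majority_vote(predictions):
    """Hard voting: return the most common label among the classifiers' outputs."""
    return Counter(predictions).most_common(1)[0][0]


# Normalize a code-switched tweet, then combine three hypothetical classifier votes.
print(normalize_tweet("Brg ni sgt best tp delivery x laju"))
print(majority_vote(["positive", "negative", "positive"]))
```

Checking the lexicon before the fallback lets domain-specific abbreviations take priority over a general-purpose normalizer, which is one plausible way to combine the two sources.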
In addition, different feature extraction techniques, namely (A) TF-IDF, (B) Count Vector, (C) Boolean Vector, and (D) Chi-squared, are applied together with hyperparameter tuning to enhance the performance of the learning models.
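To make the difference between the first three feature representations concrete, the following minimal sketch computes a Count Vector, a Boolean Vector, and a TF-IDF vector in plain Python. The function names and the two toy documents are assumptions made for illustration. The TF-IDF formula here (raw count times ln(N / df)) is one common textbook variant, and library implementations typically add smoothing and normalization. Chi-squared scoring and hyperparameter tuning are not shown.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Sorted list of every distinct token in the tokenized documents."""
    return sorted({tok for doc in docs for tok in doc})


def count_vector(doc, vocab):
    """Count Vector: how many times each vocabulary term occurs in the document."""
    counts = Counter(doc)
    return [counts[term] for term in vocab]


def boolean_vector(doc, vocab):
    """Boolean Vector: 1 if the term occurs in the document, else 0."""
    present = set(doc)
    return [1 if term in present else 0 for term in vocab]


def tfidf_vector(doc, vocab, docs):
    """TF-IDF with raw term counts and idf = ln(N / df)."""
    n_docs = len(docs)
    counts = Counter(doc)
    vector = []
    for term in vocab:
        df = sum(1 for d in docs if term in d)  # documents containing the term
        idf = math.log(n_docs / df) if df else 0.0
        vector.append(counts[term] * idf)
    return vector


# Two toy tokenized tweets (hypothetical, already normalized).
docs = [
    "barang sangat best".split(),
    "barang tidak best tidak puas".split(),
]
vocab = build_vocabulary(docs)
print(vocab)
print(count_vector(docs[1], vocab))
print(boolean_vector(docs[1], vocab))
print(tfidf_vector(docs[1], vocab, docs))
```

Terms that appear in every document, such as "barang" here, receive a TF-IDF weight of zero under this variant, while the Count and Boolean vectors still register them.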