TY - GEN
T1 - Cross-lingual short-text document classification for Facebook comments
AU - Faqeeh, Mosab
AU - Abdulla, Nawaf
AU - Al-Ayyoub, Mahmoud
AU - Jararweh, Yaser
AU - Quwaider, Muhannad
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/12/12
Y1 - 2014/12/12
N2 - Document Classification (DC) is one of the fundamental problems in text mining. Many works on DC report interesting approaches and excellent results; however, most of them focus on long-text documents written in a single language, with English being the most studied. This work is concerned with the natural next step beyond such works: cross-lingual DC for short-text documents. Specifically, we consider two languages, Arabic and English, and compare the performance of some of the most popular document classifiers on two datasets of short Facebook comments. Apart from a limited number of attempts, this problem has not been studied well enough. The results are encouraging, and new insights are obtained.
AB - Document Classification (DC) is one of the fundamental problems in text mining. Many works on DC report interesting approaches and excellent results; however, most of them focus on long-text documents written in a single language, with English being the most studied. This work is concerned with the natural next step beyond such works: cross-lingual DC for short-text documents. Specifically, we consider two languages, Arabic and English, and compare the performance of some of the most popular document classifiers on two datasets of short Facebook comments. Apart from a limited number of attempts, this problem has not been studied well enough. The results are encouraging, and new insights are obtained.
KW - cross-lingual text analysis
KW - decision tree
KW - document classification
KW - k-nearest neighbor
KW - naive Bayes
KW - social network comments
KW - support vector machine
UR - https://www.scopus.com/pages/publications/84922510891
U2 - 10.1109/FiCloud.2014.99
DO - 10.1109/FiCloud.2014.99
M3 - Conference contribution
AN - SCOPUS:84922510891
T3 - Proceedings - 2014 International Conference on Future Internet of Things and Cloud, FiCloud 2014
SP - 573
EP - 578
BT - Proceedings - 2014 International Conference on Future Internet of Things and Cloud, FiCloud 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2nd International Conference on Future Internet of Things and Cloud, FiCloud 2014
Y2 - 27 August 2014 through 29 August 2014
ER -