
Cross-lingual short-text document classification for facebook comments

  • Jordan University of Science and Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

46 Scopus citations

Abstract

Document Classification (DC) is one of the fundamental problems in text mining. Many works address DC with interesting approaches and excellent results; however, most of them focus on long-text documents written in a single language, with English being the most studied. This work takes the natural next step beyond such works: cross-lingual DC for short-text documents. Specifically, we consider two languages, Arabic and English, and compare the performance of some of the most popular document classifiers on two datasets of short Facebook comments. Apart from a few limited attempts, this problem has not been studied in depth. The results are encouraging and provide new insights.
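To make the kind of classifier being compared more concrete, here is a minimal sketch of one of them, multinomial naive Bayes with add-one (Laplace) smoothing. It is trained separately on a tiny set of English and Arabic comments. Everything in the example is a hypothetical assumption for illustration and does not describe the paper's data, features, or preprocessing. That includes the whitespace tokenizer, the `sports`/`politics` labels, the toy comments, and the names `tokenize` and `NaiveBayesClassifier`.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical, deliberately simple tokenizer: lowercase and split on
    # whitespace. Arabic script has no case, so lower() leaves it unchanged.
    return text.lower().split()


class NaiveBayesClassifier:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)        # label -> number of documents
        self.word_counts = defaultdict(Counter)    # label -> token frequencies
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        # Total token count per label, used in the smoothed denominator.
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Score = log P(label) + sum over tokens of log P(token | label).
            score = math.log(count / n_docs)
            for tok in tokenize(doc):
                if tok not in self.vocab:
                    continue  # skip tokens never seen during training
                num = self.word_counts[label][tok] + 1
                score += math.log(num / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy comments and labels (illustrative only, not the paper's data).
en_docs = ["great goal in the match", "the team won the game",
           "new tax law passed", "the minister announced a new law"]
ar_docs = ["هدف رائع في المباراة", "الفريق فاز بالمباراة",
           "قانون ضريبة جديد", "الوزير أعلن قانون جديد"]
labels = ["sports", "sports", "politics", "politics"]

# One model per language, trained independently.
en_model = NaiveBayesClassifier().fit(en_docs, labels)
ar_model = NaiveBayesClassifier().fit(ar_docs, labels)

print(en_model.predict("the team scored a goal"))  # sports
print(ar_model.predict("الفريق سجل هدف"))           # sports
```

The sketch ignores out-of-vocabulary tokens. With very short comments, that can leave only a handful of informative words per document. Arabic clitics such as the attached prefix in "بالمباراة" also fragment the vocabulary under naive whitespace splitting. Both issues are general challenges for short, multilingual text and are not claims about the paper's findings.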

Original language: English
Title of host publication: Proceedings - 2014 International Conference on Future Internet of Things and Cloud, FiCloud 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 573-578
Number of pages: 6
ISBN (Electronic): 9781479943586
DOIs
State: Published - 12 Dec 2014
Externally published: Yes
Event: 2nd International Conference on Future Internet of Things and Cloud, FiCloud 2014 - Barcelona, Spain
Duration: 27 Aug 2014 to 29 Aug 2014

Publication series

Name: Proceedings - 2014 International Conference on Future Internet of Things and Cloud, FiCloud 2014

Conference

Conference: 2nd International Conference on Future Internet of Things and Cloud, FiCloud 2014
Country/Territory: Spain
City: Barcelona
Period: 27/08/14 to 29/08/14

Keywords

  • cross-lingual text analysis
  • decision tree
  • document classification
  • k-nearest neighbor
  • naive Bayes
  • social network comments
  • support vector machine
