
An extensive study of the Bag-of-Words approach for gender identification of Arabic articles

  • Jordan University of Science and Technology
  • Amman Arab University

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

37 Scopus citations

Abstract

The prevalent use of Online Social Networks (OSN), together with the anonymity and lack of accountability they inherit from being online, gives rise to many problems related to finding the connection between the massive amount of text data on OSN and the people who actually wrote it. Analyzing text data for such purposes is called authorship analysis. This work focuses on one specific type of authorship analysis: identifying the author's gender. Gender identification has various applications, from marketing to security. The focus of this work is on Arabic articles. The problem is essentially a classification problem, and current approaches differ in the way they compute the features of each document. However, all of them follow some form of 'stylometric features' approach. Unlike these works, ours treats the problem as a variation of the Text Classification (TC) problem and follows the Bag-Of-Words (BOW) approach for feature selection. We perform an extensive set of experiments on the feature selection and classification phases, and the results show that such an approach yields surprisingly high results.
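
The idea of treating gender identification as ordinary text classification over bag-of-words features can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the whitespace tokenizer, the choice of a multinomial Naive Bayes classifier with add-one smoothing, the class name `NaiveBayesBOW`, the labels `F` and `M`, and the placeholder tokens `w1` to `w9`, which stand in for real Arabic words. The abstract does not state which classifiers or feature selection methods were used.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Map a document to token counts (whitespace tokenization is a simplification)."""
    return Counter(text.split())


class NaiveBayesBOW:
    """Hypothetical multinomial Naive Bayes over bag-of-words counts."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.token_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(docs, labels):
            bow = bag_of_words(text)
            self.token_counts[label].update(bow)
            self.vocab.update(bow)
        return self

    def predict(self, text):
        bow = bag_of_words(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_tokens = sum(self.token_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for token, count in bow.items():
                if token not in self.vocab:
                    continue  # ignore tokens never seen in training
                # Add-one (Laplace) smoothing over the training vocabulary
                p = (self.token_counts[label][token] + 1) / (total_tokens + len(self.vocab))
                score += count * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy placeholder data (made up, NOT from the paper's dataset)
train_docs = ["w1 w2 w3", "w1 w2 w4", "w5 w6 w7", "w5 w6 w8"]
train_labels = ["F", "F", "M", "M"]

model = NaiveBayesBOW().fit(train_docs, train_labels)
prediction = model.predict("w1 w2 w9")
```

In this sketch no stylometric features (such as sentence length or punctuation habits) are computed; the classifier sees only which tokens occur and how often, which is the distinction the abstract draws between the BOW approach and earlier work.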

Original language: English
Title of host publication: 2014 IEEE/ACS 11th International Conference on Computer Systems and Applications, AICCSA 2014
Publisher: IEEE Computer Society
Pages: 601-608
Number of pages: 8
ISBN (Electronic): 9781479971008
DOIs
State: Published - 2014
Externally published: Yes
Event: 2014 11th IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2014 - Doha, Qatar
Duration: 10 Nov 2014 to 13 Nov 2014

Publication series

Name: Proceedings of IEEE/ACS International Conference on Computer Systems and Applications, AICCSA
Volume: 2014
ISSN (Print): 2161-5322
ISSN (Electronic): 2161-5330

Conference

Conference: 2014 11th IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2014
Country/Territory: Qatar
City: Doha
Period: 10/11/14 to 13/11/14
