
Using a hierarchical softmax based on the Huffman coding tree for authenticating Arabic tweets

  • Jordan University of Science and Technology
  • Duquesne University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citations

Abstract

Attributing a piece of text to its true author is called Authorship Authentication (AA). This work addresses the AA problem for Arabic tweets. The Arabic language is both challenging and understudied. Existing approaches to authenticating Arabic tweets use bag-of-words (BoW) features or stylometric features coupled with classifiers such as SVM. However, the reported accuracy of these approaches is rather low, not even reaching 69%. In this work, we address the problem using two approaches: (a) a baseline approach that uses SVM with BoW features, and (b) a character-level linear classifier (char-LC) with a rank constraint and a fast loss approximation, together with word embeddings based on fastText. Both approaches give significantly higher accuracies than the results reported in the literature: 78.28% for SVM with BoW features and 79.4% for char-LC.
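
The title indicates that the "fast loss approximation" is a hierarchical softmax over a Huffman coding tree. The sketch below is not the authors' implementation; it only illustrates the general idea under stated assumptions. Class labels (authors) are placed at the leaves of a Huffman tree built from their frequencies, and the probability of a class is the product of binary sigmoid decisions along the path from the root to its leaf. The author labels, the frequencies, the hidden vector, and the function names `build_huffman_paths` and `class_probability` are all hypothetical.

```python
import heapq
import itertools
import math
import random


def build_huffman_paths(freqs):
    """Build a Huffman tree over labels.

    Returns (paths, n_internal), where paths maps each label to a list of
    (internal_node_id, bit) pairs describing the route from root to leaf.
    """
    tie = itertools.count()  # unique tie-breaker so nodes are never compared
    heap = [(f, next(tie), ("leaf", label)) for label, f in freqs.items()]
    heapq.heapify(heap)
    n_internal = 0
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        node = ("internal", n_internal, left, right)
        n_internal += 1
        heapq.heappush(heap, (f1 + f2, next(tie), node))
    root = heap[0][2]

    paths = {}

    def walk(node, prefix):
        if node[0] == "leaf":
            paths[node[1]] = prefix
        else:
            _, node_id, left, right = node
            walk(left, prefix + [(node_id, 0)])
            walk(right, prefix + [(node_id, 1)])

    walk(root, [])
    return paths, n_internal


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def class_probability(hidden, node_vectors, path):
    """Multiply the binary branch probabilities along one root-to-leaf path."""
    p = 1.0
    for node_id, bit in path:
        score = sum(h * w for h, w in zip(hidden, node_vectors[node_id]))
        s = sigmoid(score)
        p *= s if bit == 1 else 1.0 - s
    return p


# Hypothetical data: four authors with made-up tweet counts.
freqs = {"author_a": 50, "author_b": 30, "author_c": 15, "author_d": 5}
paths, n_internal = build_huffman_paths(freqs)

rng = random.Random(0)
dim = 8  # assumed size of the hidden (text) representation
hidden = [rng.uniform(-1, 1) for _ in range(dim)]
node_vectors = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_internal)]

probs = {label: class_probability(hidden, node_vectors, path)
         for label, path in paths.items()}
print({label: len(path) for label, path in paths.items()})
print(sum(probs.values()))
```

Frequent labels end up close to the root, so scoring one class touches only a few internal nodes instead of every class, and the leaf probabilities still sum to one because each internal node splits its probability mass between its two children.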

Original language: English
Title of host publication: 16th ACS/IEEE International Conference on Computer Systems and Applications, AICCSA 2019
Publisher: IEEE Computer Society
ISBN (Electronic): 9781728150529
State: Published - Nov 2019
Externally published: Yes
Event: 16th ACS/IEEE International Conference on Computer Systems and Applications, AICCSA 2019 - Abu Dhabi, United Arab Emirates
Duration: 3 Nov 2019 to 7 Nov 2019

Publication series

Name: Proceedings of IEEE/ACS International Conference on Computer Systems and Applications, AICCSA
Volume: 2019-November
ISSN (Print): 2161-5322
ISSN (Electronic): 2161-5330

Conference

Conference: 16th ACS/IEEE International Conference on Computer Systems and Applications, AICCSA 2019
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 3/11/19 to 7/11/19

Keywords

  • Authorship authentication of Arabic tweets
  • Character-level linear classifier
  • Fasttext embeddings
