
Building a standard dataset for Arabic sentiment analysis: Identifying potential annotation pitfalls

  • Mohammed N. Al-Kabi
  • Areej A. Al-Qwaqenah
  • Amal H. Gigieh
  • Kholoud Alsmearat
  • Mahmoud Al-Ayyoub
  • Izzat M. Alsmadi
  • Al Buraimi University College
  • Al-Balqa Applied University
  • Jordan University of Science and Technology
  • Texas A&M University-San Antonio

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

10 Scopus citations

Abstract

Sentiment Analysis (SA) is one of the hottest research fields nowadays. It is concerned with identifying the sentiment conveyed in a piece of text. Current efforts in SA require standard datasets for training and testing purposes. Such datasets already exist for some languages such as English. Unfortunately, the same cannot be said about other languages such as Arabic. Existing Arabic SA datasets are restricted (in their domain, size, dialects covered, etc.) and/or have limited availability. Moreover, the annotation process has not received the attention it deserves. Some of the existing datasets relied on the author's point of view for annotation, while others employed annotators but did not take into account the personal variations between the annotators and how these would affect their agreement. This study presents our efforts to build a standard Arabic dataset with the above concerns in mind. The constructed dataset is intended for generic use, as it contains reviews from different domains written in Modern Standard Arabic (MSA) as well as several dialects. The annotation process is given close attention by studying the inter-annotator agreement and investigating the potential factors affecting it.
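
The record lists Cohen's Kappa among its keywords as a measure of inter-annotator agreement. As a rough illustration only, the standard two-annotator form of that measure can be sketched as below. The function name `cohens_kappa`, the label names, and the sample annotations are all hypothetical and are not taken from the paper or its dataset.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    own label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal frequencies, summed over all labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    all_labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in all_labels) / (n * n)

    # Degenerate case: both annotators used one and the same label throughout.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations of eight reviews (illustrative data only).
annotator_1 = ["pos", "pos", "neg", "neg", "neu", "pos", "neg", "neu"]
annotator_2 = ["pos", "neg", "neg", "neg", "neu", "pos", "pos", "neu"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

Kappa discounts the agreement two annotators would reach by chance, which is why it is usually preferred over raw percentage agreement when label distributions are skewed.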

Original language: English
Title of host publication: 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications, AICCSA 2016 - Proceedings
Publisher: IEEE Computer Society
ISBN (Electronic): 9781509043200
State: Published - 2 Jul 2016
Externally published: Yes
Event: 13th IEEE/ACS International Conference of Computer Systems and Applications, AICCSA 2016 - Agadir, Morocco
Duration: 29 Nov 2016 - 2 Dec 2016

Publication series

Name: Proceedings of IEEE/ACS International Conference on Computer Systems and Applications, AICCSA
Volume: 0
ISSN (Print): 2161-5322
ISSN (Electronic): 2161-5330

Conference

Conference: 13th IEEE/ACS International Conference of Computer Systems and Applications, AICCSA 2016
Country/Territory: Morocco
City: Agadir
Period: 29/11/16 - 2/12/16

Keywords

  • Arabic sentiment analysis
  • Cohen's Kappa measure
  • Dataset preparation
  • Inter-annotator agreement
