
Team JUST at the MADAR Shared Task on Arabic Fine-Grained Dialect Identification

  • Bashar Talafha
  • Ali Fadel
  • Mahmoud Al-Ayyoub
  • Yaser Jararweh
  • Mohammad AL-Smadi
  • Patrick Juola
  • Jordan University of Science and Technology
  • Duquesne University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

In this paper, we describe our team's effort on the MADAR Shared Task on Arabic Fine-Grained Dialect Identification. The task requires building a system capable of differentiating between 25 Arabic dialects in addition to Modern Standard Arabic (MSA). Our approach is simple. After preprocessing the data, we use Data Augmentation (DA) to enlarge the training data six times. We then build a language model, extract word-level and character-level n-gram TF-IDF features, and feed them into a Multinomial Naive Bayes (MNB) classifier. Despite its simplicity, the resulting model performs well, achieving the 4th highest F-measure and region-level accuracy and the 5th highest precision, recall, city-level accuracy, and country-level accuracy among the participating teams.
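
The feature-and-classifier step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, omits the preprocessing, data augmentation, and language-model stages, and the n-gram ranges, the smoothing constant `alpha`, the class name `TfidfMNB`, and the toy training strings and labels are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def ngram_features(text, word_n=(1, 2), char_n=(1, 3)):
    """Count word-level and character-level n-grams (ranges are assumed)."""
    feats = Counter()
    words = text.split()
    for n in range(word_n[0], word_n[1] + 1):
        for i in range(len(words) - n + 1):
            feats["w:" + " ".join(words[i:i + n])] += 1
    for n in range(char_n[0], char_n[1] + 1):
        for i in range(len(text) - n + 1):
            feats["c:" + text[i:i + n]] += 1
    return feats


class TfidfMNB:
    """Hypothetical helper: multinomial Naive Bayes over TF-IDF weights."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing, an assumed value

    def fit(self, texts, labels):
        docs = [ngram_features(t) for t in texts]
        self.n_docs = len(docs)
        # Document frequency of every feature seen in training.
        df = Counter()
        for d in docs:
            df.update(d.keys())
        # Smoothed inverse document frequency.
        self.idf = {f: math.log((1 + self.n_docs) / (1 + c)) + 1.0
                    for f, c in df.items()}
        self.class_counts = Counter(labels)
        self.feat_weight = defaultdict(lambda: defaultdict(float))
        self.class_total = defaultdict(float)
        # Accumulate TF-IDF mass per (class, feature) and per class.
        for d, y in zip(docs, labels):
            for f, tf in d.items():
                w = tf * self.idf[f]
                self.feat_weight[y][f] += w
                self.class_total[y] += w
        return self

    def predict(self, text):
        feats = ngram_features(text)
        vocab_size = len(self.idf)
        best, best_score = None, -math.inf
        for y, n_y in self.class_counts.items():
            score = math.log(n_y / self.n_docs)  # log prior
            denom = self.class_total[y] + self.alpha * vocab_size
            for f, tf in feats.items():
                if f not in self.idf:
                    continue  # ignore features unseen in training
                w = tf * self.idf[f]
                seen = self.feat_weight[y].get(f, 0.0)
                score += w * math.log((seen + self.alpha) / denom)
            if score > best_score:
                best, best_score = y, score
        return best


# Toy, artificial data: placeholder strings and labels, not real dialect text.
train_texts = ["aaa bbb aaa", "aaa ccc", "xxx yyy xxx", "yyy zzz"]
train_labels = ["dialect_1", "dialect_1", "dialect_2", "dialect_2"]
model = TfidfMNB().fit(train_texts, train_labels)
prediction_1 = model.predict("aaa bbb")
prediction_2 = model.predict("xxx zzz")
```

Combining word and character n-grams in one feature space lets the classifier use both whole-word cues and sub-word spelling patterns. In practice a library vectorizer and classifier would likely replace the hand-written parts of this sketch.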

Original language: English
Title of host publication: ACL 2019 - 4th Arabic Natural Language Processing Workshop, WANLP 2019 - Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 285-289
Number of pages: 5
ISBN (Electronic): 9781950737321
State: Published - 2019
Externally published: Yes
Event: 4th Arabic Natural Language Processing Workshop, WANLP 2019, held at ACL 2019 - Florence, Italy
Duration: 1 Aug 2019 → …

Publication series

Name: ACL 2019 - 4th Arabic Natural Language Processing Workshop, WANLP 2019 - Proceedings of the Workshop

Conference

Conference: 4th Arabic Natural Language Processing Workshop, WANLP 2019, held at ACL 2019
Country/Territory: Italy
City: Florence
Period: 1/08/19 → …
