TY - GEN
T1 - Team JUST at the MADAR Shared Task on Arabic Fine-Grained Dialect Identification
AU - Talafha, Bashar
AU - Fadel, Ali
AU - Al-Ayyoub, Mahmoud
AU - Jararweh, Yaser
AU - AL-Smadi, Mohammad
AU - Juola, Patrick
N1 - Publisher Copyright: © ACL 2019. All rights reserved.
PY - 2019
Y1 - 2019
AB - In this paper, we describe our team's effort on the MADAR Shared Task on Arabic Fine-Grained Dialect Identification. The task requires building a system capable of differentiating between 25 Arabic dialects in addition to Modern Standard Arabic (MSA). Our approach is simple. After preprocessing the data, we use Data Augmentation (DA) to enlarge the training data sixfold. We then build a language model, extract word-level and character-level n-gram TF-IDF features, and feed them into a multinomial naive Bayes (MNB) classifier. Despite its simplicity, the resulting model performs well, achieving the 4th highest F-measure and region-level accuracy and the 5th highest precision, recall, city-level accuracy, and country-level accuracy among the participating teams.
UR - https://www.scopus.com/pages/publications/85095538726
M3 - Conference contribution
AN - SCOPUS:85095538726
T3 - ACL 2019 - 4th Arabic Natural Language Processing Workshop, WANLP 2019 - Proceedings of the Workshop
SP - 285
EP - 289
BT - ACL 2019 - 4th Arabic Natural Language Processing Workshop, WANLP 2019 - Proceedings of the Workshop
PB - Association for Computational Linguistics (ACL)
T2 - 4th Arabic Natural Language Processing Workshop, WANLP 2019, held at ACL 2019
Y2 - 1 August 2019
ER -