TY - GEN
T1 - Neural Arabic text diacritization
T2 - 6th Workshop on Asian Translation, WAT@EMNLP-IJCNLP 2019
AU - Fadel, Ali
AU - Tuffaha, Ibraheem
AU - Al-Jawarneh, Bara
AU - Al-Ayyoub, Mahmoud
N1 - Publisher Copyright: © 2019 Association for Computational Linguistics
PY - 2019
Y1 - 2019
AB - In this work, we present several deep learning models for the automatic diacritization of Arabic text. Our models are built using two main approaches, viz. Feed-Forward Neural Network (FFNN) and Recurrent Neural Network (RNN), with several enhancements such as 100-hot encoding, embeddings, Conditional Random Field (CRF) and Block-Normalized Gradient (BNG). The models are tested on the only freely available benchmark dataset and the results show that our models are either better or on par with other models, which require language-dependent post-processing steps, unlike ours. Moreover, we show that diacritics in Arabic can be used to enhance the models of NLP tasks such as Machine Translation (MT) by proposing the Translation over Diacritization (ToD) approach.
UR - https://www.scopus.com/pages/publications/85119411554
M3 - Conference contribution
AN - SCOPUS:85119411554
T3 - WAT@EMNLP-IJCNLP 2019 - 6th Workshop on Asian Translation, Proceedings
SP - 215
EP - 225
BT - WAT@EMNLP-IJCNLP 2019 - 6th Workshop on Asian Translation, Proceedings
PB - Association for Computational Linguistics (ACL)
Y2 - 4 November 2019
ER -