
Question to question similarity analysis using morphological, syntactic, semantic, and lexical features

  • Muntaha Al-Asa'd
  • Nour Al-Khdour
  • Mutaz Bni Younes
  • Enas Khwaileh
  • Mahmoud Hammad
  • Mohammad Al-Smadi
  • Jordan University of Science and Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

In today's digitally connected world, people expect instant answers to their questions. This expectation has increased the burden on question-answering platforms such as Stack Overflow. A promising solution is to detect whether a newly asked question is similar to a question already in the database and, if so, present the existing answer to the user. To address this challenge, we propose a novel Natural Language Processing (NLP) approach that detects whether two Arabic questions are similar using morphological, syntactic, semantic, and lexical features extracted from them. Our approach involves several phases, including Arabic text processing, novel feature extraction, and text classification. To conduct our experiments, we used a real-world dataset of 4,000 pairs of Arabic questions, on which our approach achieved 78.2% accuracy using an XGBoost model trained on the best features selected by the Random Forest feature selection technique. This high accuracy shows the ability of our approach to correctly detect the similarity between two Arabic questions.
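To make the feature-extraction phase more concrete, the following is a minimal sketch of how a few lexical features for a question pair might be computed. It is not the authors' feature set: the function names (`tokens`, `char_ngrams`, `jaccard`, `cosine`, `lexical_features`), the choice of features, and the whitespace tokenizer are all assumptions made for illustration, and the morphological, syntactic, and semantic features mentioned in the abstract are not covered. The resulting feature dictionary would then be passed to a classifier, which is not shown here.

```python
import math
from collections import Counter


def tokens(text):
    # Assumption: plain whitespace tokenization. Real Arabic preprocessing
    # (normalization, stemming, stop-word removal) would be richer.
    return text.split()


def char_ngrams(text, n=3):
    # Character n-grams over the text with whitespace removed.
    s = "".join(text.split())
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def jaccard(a, b):
    # Set overlap: |A ∩ B| / |A ∪ B|, defined as 0.0 when both are empty.
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    # Cosine similarity between bag-of-items count vectors.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[k] * cb[k] for k in ca.keys() & cb.keys())
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def lexical_features(q1, q2):
    # Hypothetical feature vector for one question pair.
    t1, t2 = tokens(q1), tokens(q2)
    return {
        "token_jaccard": jaccard(t1, t2),
        "token_cosine": cosine(t1, t2),
        "char3_jaccard": jaccard(char_ngrams(q1), char_ngrams(q2)),
        "length_diff": abs(len(t1) - len(t2)),
    }


# Two made-up example questions that differ only in the last word.
pair = lexical_features("ما هي عاصمة الأردن", "ما هي عاصمة مصر")
```

Each question pair yields one such feature dictionary; stacking them row by row gives the kind of feature matrix on which feature selection and a classifier can operate.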

Original language: English
Title of host publication: 16th ACS/IEEE International Conference on Computer Systems and Applications, AICCSA 2019
Publisher: IEEE Computer Society
ISBN (Electronic): 9781728150529
State: Published - Nov 2019
Externally published: Yes
Event: 16th ACS/IEEE International Conference on Computer Systems and Applications, AICCSA 2019 - Abu Dhabi, United Arab Emirates
Duration: 3 Nov 2019 to 7 Nov 2019

Publication series

Name: Proceedings of IEEE/ACS International Conference on Computer Systems and Applications, AICCSA
Volume: 2019-November
ISSN (Print): 2161-5322
ISSN (Electronic): 2161-5330

Conference

Conference: 16th ACS/IEEE International Conference on Computer Systems and Applications, AICCSA 2019
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 3/11/19 to 7/11/19

Keywords

  • Arabic Language
  • Lexical Features
  • ML
  • NLP
  • Random Forest
  • SVM
  • Semantic Text Similarity (STS)
  • Text Classification
  • XGBoost
