
Extracting the roots of Arabic words without removing affixes

  • Yarmouk University
  • Jordan University of Science and Technology

Research output: Contribution to journal (Article, peer-reviewed)

20 Scopus citations

Abstract

Most research on Arabic root extraction focuses on removing affixes from Arabic words. This process adds processing overhead and may remove non-affix letters, which leads to the extraction of incorrect roots. This paper proposes a new approach to this problem by introducing a new algorithm for extracting the roots of Arabic words. The proposed algorithm, called the Word Substring Stemming (WSS) Algorithm, does not remove affixes during the extraction process. Rather, it produces the set of all substrings of an Arabic word and uses the Arabic roots file, the Arabic patterns file and a concrete set of rules to extract correct roots from the substrings. The experiments show that the proposed approach is competitive, with an accuracy of 83.9%. Furthermore, its accuracy can be enhanced further, since for about 9.9% of the tested words the WSS algorithm retrieves two candidates (in most cases) for the correct root.
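
The core idea of generating substrings and checking them against a roots list can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the restriction to contiguous substrings of 3 or 4 letters, and the tiny in-memory TOY_ROOTS set (standing in for the Arabic roots file) are all assumptions made for illustration, and the patterns file and rule set mentioned in the abstract are not modelled at all. Input is assumed to be undiacritized.

```python
def substrings(word, min_len=3, max_len=4):
    """Return all contiguous substrings whose length is in [min_len, max_len].

    The length bounds are an assumption (most Arabic roots have 3 or 4 letters).
    """
    result = []
    for start in range(len(word)):
        last_end = min(start + max_len, len(word))
        for end in range(start + min_len, last_end + 1):
            result.append(word[start:end])
    return result


def candidate_roots(word, known_roots):
    """Return substrings of word that appear in known_roots, in order, without duplicates."""
    candidates = []
    for piece in substrings(word):
        if piece in known_roots and piece not in candidates:
            candidates.append(piece)
    return candidates


# Hypothetical stand-in for the Arabic roots file.
TOY_ROOTS = {"كتب", "درس", "علم"}

if __name__ == "__main__":
    print(candidate_roots("مكتبة", TOY_ROOTS))
```

For the word مكتبة the sketch finds the single candidate كتب without stripping the prefix or suffix. Returning a list rather than a single string leaves room for the case, noted in the abstract, where more than one candidate root is retrieved.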

Original language: English
Pages (from-to): 376-385
Number of pages: 10
Journal: Journal of Information Science
Volume: 40
Issue number: 3
State: Published - Jun 2014
Externally published: Yes

Keywords

  • Arabic roots
  • Information retrieval
  • Search engines
  • Stemming
