Abstract
Most research on Arabic root extraction focuses on removing affixes from Arabic words. This process adds processing overhead and may remove non-affix letters, which leads to the extraction of incorrect roots. This paper proposes a new approach to this issue by introducing a new algorithm for extracting the roots of Arabic words. The proposed algorithm, called the Word Substring Stemming (WSS) algorithm, does not remove affixes during the extraction process. Rather, it produces the set of all substrings of an Arabic word and uses the Arabic roots file, the Arabic patterns file and a concrete set of rules to extract correct roots from those substrings. Experiments show that the proposed approach is competitive, with an accuracy of 83.9%. Furthermore, its accuracy can be improved, since for about 9.9% of the tested words the WSS algorithm retrieves two candidates (in most cases) for the correct root.
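The substring idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy root set, and the restriction to contiguous substrings of three or four letters are all assumptions made for illustration, and the patterns file and the rule set mentioned in the abstract are not modelled at all.

```python
def substrings(word, min_len=3, max_len=4):
    """Return the distinct contiguous substrings of `word` whose length
    lies in [min_len, max_len], shortest first, in left-to-right order.

    Assumption: only 3- and 4-letter substrings are considered, since
    Arabic roots are typically triliteral or quadriliteral.
    """
    found = []
    for length in range(min_len, max_len + 1):
        for start in range(len(word) - length + 1):
            piece = word[start:start + length]
            if piece not in found:
                found.append(piece)
    return found


def candidate_roots(word, known_roots):
    """Keep only the substrings that appear in a set of known roots.

    `known_roots` stands in for the roots file named in the abstract;
    no pattern matching or disambiguation rules are applied here.
    """
    return [piece for piece in substrings(word) if piece in known_roots]


# Hypothetical toy root list, for illustration only.
TOY_ROOTS = {"كتب", "درس", "علم"}

if __name__ == "__main__":
    for word in ("مكتبة", "المدرسة"):
        print(word, candidate_roots(word, TOY_ROOTS))
```

No letters are stripped from the word at any point: the lookup alone decides which substrings survive. A word can yield more than one surviving substring, which is why further rules would be needed to choose among candidates.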
| Original language | English |
|---|---|
| Pages (from-to) | 376-385 |
| Number of pages | 10 |
| Journal | Journal of Information Science |
| Volume | 40 |
| Issue number | 3 |
| State | Published - Jun 2014 |
| Externally published | Yes |
Keywords
- Arabic roots
- Information retrieval
- Search engines
- Stemming
Title
Extracting the roots of Arabic words without removing affixes