Abstract
Automatic document classification has become increasingly important and difficult because of the large volume of electronic documents in use in recent years. Traditional information retrieval systems are based on the extraction of keywords from documents; these keywords serve as the basis for document classification. This paper proposes a new semantic approach to document classification. Specifically, in addition to keyword frequency, our approach captures the meaning of these keywords in documents using a domain ontology. The main idea is to represent documents by concepts rather than keywords, and to calculate weights for these concepts that reflect their importance in the documents where they appear. The joint presence of concepts in the same paragraph, section, document, or document set provides important information for extracting and understanding the semantic content of a document, and therefore improves its classification. The experimental evaluation is carried out using the Reuters document collection RCV1-v2 and the GALEN medical ontology. The documents are classified using an SVM classifier. The experimental results demonstrate that the proposed approach yields higher accuracy, precision, and recall than traditional keyword-based information retrieval approaches.
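As a rough illustration of the idea of representing a document by weighted concepts rather than keywords, the following Python sketch maps terms to concepts and adds a bonus when concepts appear in the same paragraph. It is not the weighting scheme of the paper, which the abstract does not specify: the `TERM_TO_CONCEPT` dictionary is a hypothetical toy stand-in for an ontology lookup (it is not taken from GALEN), and the `cooccurrence_bonus` parameter and the normalisation are assumptions made for this example only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy ontology: surface term -> concept label (NOT taken from GALEN).
TERM_TO_CONCEPT = {
    "heart": "Heart",
    "cardiac": "Heart",
    "artery": "Artery",
    "arterial": "Artery",
    "aspirin": "Aspirin",
}


def concepts_in(paragraph):
    """Return the concepts mentioned in one paragraph, with repeats."""
    found = []
    for token in paragraph.lower().split():
        term = token.strip(".,;:")
        if term in TERM_TO_CONCEPT:
            found.append(TERM_TO_CONCEPT[term])
    return found


def concept_weights(document, cooccurrence_bonus=0.5):
    """Weight each concept by its frequency plus an assumed co-occurrence bonus.

    Paragraphs are separated by blank lines. For every pair of distinct
    concepts found in the same paragraph, both concepts receive
    `cooccurrence_bonus`. Weights are normalised to sum to 1.
    """
    paragraphs = [p for p in document.split("\n\n") if p.strip()]
    frequency = Counter()
    bonus = Counter()
    for paragraph in paragraphs:
        found = concepts_in(paragraph)
        frequency.update(found)
        for a, b in combinations(sorted(set(found)), 2):
            bonus[a] += cooccurrence_bonus
            bonus[b] += cooccurrence_bonus
    raw = {concept: frequency[concept] + bonus[concept] for concept in frequency}
    total = sum(raw.values())
    return {c: w / total for c, w in raw.items()} if total else {}


if __name__ == "__main__":
    doc = "The heart and cardiac artery.\n\nAspirin helps."
    print(concept_weights(doc))
```

In this sketch, "heart" and "cardiac" collapse into a single concept, which a purely keyword-based representation would count separately. The resulting concept-weight vector could then be passed to a classifier such as an SVM, as the abstract describes.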
| Original language | English |
|---|---|
| Pages (from-to) | 519-531 |
| Number of pages | 13 |
| Journal | International Journal of Innovative Computing, Information and Control |
| Volume | 12 |
| Issue number | 2 |
| State | Published - 2016 |
| Externally published | Yes |
Keywords
- Concept semantic weighting
- Documents classification
- Domain ontology
- Information extraction
- Information retrieval
Fingerprint
Dive into the research topics of 'Ontology-concepts weighting for enhanced semantic classification of documents'. Together they form a unique fingerprint.