Header menu link for other important links
A Grey Wolf Optimizer for Text Document Clustering
H. Rashaideh, A. Sawaie, , L.M. Abualigah, M.M. Al-Laham, R.M. Al-Khatib, M. Braik
Published in De Gruyter
Volume: 29
Issue: 1
Pages: 814 - 830
Text clustering problem (TCP) is a leading process in many key areas such as information retrieval, text mining, and natural language processing. This presents the need for a potent document clustering algorithm that can be used effectively to navigate, summarize, and arrange information to congregate large data sets. This paper encompasses an adaptation of the grey wolf optimizer (GWO) for TCP, referred to as TCP-GWO. The TCP demands a degree of accuracy beyond that which is possible with metaheuristic swarm-based algorithms. The main issue to be addressed is how to split text documents on the basis of GWO into homogeneous clusters that are sufficiently precise and functional. Specifically, TCP-GWO, or referred to as the document clustering algorithm, used the average distance of documents to the cluster centroid (ADDC) as an objective function to repeatedly optimize the distance between the clusters of the documents. The accuracy and efficiency of the proposed TCP-GWO was demonstrated on a sufficiently large number of documents of variable sizes, documents that were randomly selected from a set of six publicly available data sets. Documents of high complexity were also included in the evaluation process to assess the recall detection rate of the document clustering algorithm. The experimental results for a test set of over a part of 1300 documents showed that failure to correctly cluster a document occurred in less than 20% of cases with a recall rate of more than 65% for a highly complex data set. The high F-measure rate and ability to cluster documents in an effective manner are important advances resulting from this research. The proposed TCP-GWO method was compared to the other well-established text clustering methods using randomly selected data sets. Interestingly, TCP-GWO outperforms the comparative methods in terms of precision, recall, and F-measure rates. In a nutshell, the results illustrate that the proposed TCP-GWO is able to excel compared to the other comparative clustering methods in terms of measurement criteria, whereby more than 55% of the documents were correctly clustered with a high level of accuracy. © 2020 Walter de Gruyter GmbH, Berlin/Boston.
About the journal
JournalData powered by TypesetJournal of Intelligent Systems
PublisherData powered by TypesetDe Gruyter