Text feature selection with a robust weight scheme and dynamic dimension reduction to text document clustering

  • Universiti Sains Malaysia
  • Al-Balqa Applied University

Research output: Contribution to journal › Article › peer-review

191 Scopus citations

Abstract

This paper proposes three feature selection algorithms, each combined with a feature weighting scheme and dynamic dimension reduction, for the text document clustering problem. Text document clustering is a new trend in text mining in which text documents are partitioned into several coherent clusters according to carefully selected informative features, using a suitable evaluation function that usually depends on term frequency. The informative features in each document are chosen by feature selection methods. Genetic algorithm (GA), harmony search (HS), and particle swarm optimization (PSO), which are among the most successful feature selection methods, are applied with a novel weighting scheme, length feature weight (LFW), which depends on term frequency and on the appearance of features in other documents. A new dynamic dimension reduction (DDR) method is also provided to reduce the number of features used in clustering and thus improve the performance of the algorithms. Finally, k-means, a popular clustering method, is used to cluster the set of text documents based on the terms (or features) obtained by dynamic reduction. The methods are evaluated on seven benchmark text mining datasets of different sizes and complexities. Analysis with k-means shows that particle swarm optimization with length feature weight and dynamic reduction produces the best results for almost all datasets tested. This paper provides new alternatives for the text mining community to cluster text documents by using cohesive and informative features.
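
The pipeline described in the abstract (weight the terms, keep a reduced set of informative features, then cluster with k-means) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not give the LFW formula, so plain TF-IDF stands in for it; a simple "keep the top-weighted terms" rule stands in for both the metaheuristic feature selection (GA, HS, PSO) and DDR; and the function names `tfidf_matrix`, `reduce_features`, and `kmeans` are hypothetical. It is not the authors' method.

```python
import math
import random
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, rows); rows[i][j] is the TF-IDF weight of term j in doc i.

    TF-IDF is a stand-in here: the paper's LFW scheme is not specified
    in the abstract.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[t] * math.log(n_docs / doc_freq[t]) for t in vocab])
    return vocab, rows


def reduce_features(vocab, rows, keep):
    """Keep the `keep` terms with the largest total weight.

    Assumption: a simplistic replacement for metaheuristic selection and DDR.
    """
    totals = [sum(row[j] for row in rows) for j in range(len(vocab))]
    top = sorted(range(len(vocab)), key=lambda j: totals[j], reverse=True)[:keep]
    top.sort()  # preserve the original column order
    return [vocab[j] for j in top], [[row[j] for j in top] for row in rows]


def kmeans(rows, k, iters=20, seed=0):
    """Plain k-means with squared Euclidean distance; returns one label per row."""
    rng = random.Random(seed)
    centroids = [list(row) for row in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest centroid for every document vector.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centroids[c])))
            for row in rows
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    "the cat chased the mouse",
    "the cat and the kitten",
    "the stock market fell",
    "the stock market rallied",
]
vocab, rows = tfidf_matrix(docs)
selected, reduced = reduce_features(vocab, rows, keep=4)
labels = kmeans(reduced, k=2)
print(selected, labels)
```

In this toy corpus the word "the" occurs in every document, so its weight is zero and it is never among the retained features, which mirrors the abstract's point that a weight should depend on how a feature appears in other documents as well as on its frequency.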

Original language: English
Pages (from-to): 24-36
Number of pages: 13
Journal: Expert Systems with Applications
Volume: 84
State: Published - 30 Oct 2017
Externally published: Yes

Keywords

  • Dynamic dimension reduction
  • Feature selection
  • Metaheuristics
  • Text document clustering
  • Weight score
