TY - GEN
T1 - Unsupervised feature selection technique based on genetic algorithm for improving the Text Clustering
AU - Abualigah, Laith Mohammad
AU - Khader, Ahamad Tajudin
AU - Al-Betar, Mohammed Azmi
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/8/23
Y1 - 2016/8/23
N2 - The increasing volume of text documents in digital form affects text analysis techniques. Text clustering (TC) is an important technique for organizing a massive number of text documents into clusters. However, the main problem affecting text clustering is the presence of sparse and uninformative features in the text documents. Feature selection (FS) is an essential unsupervised learning technique used to select informative features in order to improve the performance of the text clustering algorithm. Recently, meta-heuristic algorithms have been successfully applied to solve several hard optimization problems. In this paper, we propose a genetic algorithm (GA) to solve the unsupervised feature selection problem; the resulting method is named FSGATC. This method creates a new subset of informative features in order to obtain more accurate clusters. Experiments were conducted on four benchmark text datasets with varying characteristics. The results show that the proposed FSGATC improves the performance of the text clustering algorithm and obtains better results than standalone k-means clustering. The proposed FSGATC method was evaluated using F-measure and accuracy, which are common measures in the text clustering domain.
AB - The increasing volume of text documents in digital form affects text analysis techniques. Text clustering (TC) is an important technique for organizing a massive number of text documents into clusters. However, the main problem affecting text clustering is the presence of sparse and uninformative features in the text documents. Feature selection (FS) is an essential unsupervised learning technique used to select informative features in order to improve the performance of the text clustering algorithm. Recently, meta-heuristic algorithms have been successfully applied to solve several hard optimization problems. In this paper, we propose a genetic algorithm (GA) to solve the unsupervised feature selection problem; the resulting method is named FSGATC. This method creates a new subset of informative features in order to obtain more accurate clusters. Experiments were conducted on four benchmark text datasets with varying characteristics. The results show that the proposed FSGATC improves the performance of the text clustering algorithm and obtains better results than standalone k-means clustering. The proposed FSGATC method was evaluated using F-measure and accuracy, which are common measures in the text clustering domain.
KW - Genetic Algorithm
KW - Informative features
KW - K-mean Text Clustering
KW - Sparse features
KW - Unsupervised Feature Selection
UR - https://www.scopus.com/pages/publications/84987678289
U2 - 10.1109/CSIT.2016.7549453
DO - 10.1109/CSIT.2016.7549453
M3 - Conference contribution
AN - SCOPUS:84987678289
T3 - Proceedings - CSIT 2016: 2016 7th International Conference on Computer Science and Information Technology
BT - Proceedings - CSIT 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 7th International Conference on Computer Science and Information Technology, CSIT 2016
Y2 - 13 July 2016 through 14 July 2016
ER -