[Journal article][Full-length article]


Novel term weighting schemes for document representation based on ranking of terms and Fuzzy logic with semantic relationship of terms

Authors:
R. Lakshmi; S. Baskar

Publication year: 2019

Pages: 493 - 503
Publisher: Elsevier BV


Abstract:

Weighting and normalization are the most important factors that can significantly affect text representation. This paper presents two novel term weighting schemes for representing text documents: (i) a term weighting scheme based on Term Frequency - Ranking of Term Frequency (TF-RTF), and (ii) a term weighting scheme based on Term Frequency - Ranking of fuzzy logic with semantic relationship of terms (TF-RFST). In TF-RTF, the rank of each term in a document gives its priority within that document, and these priorities are used for document representation. In TF-RFST, each term is represented by its own frequency together with the frequency of its semantically related terms. Hence, the rank of each term is based on the combined frequencies of the term and its semantically related terms under a specific weighting scheme. With the TF-RTF and TF-RFST weighting schemes, the proposed methods provide better clustering performance in terms of accuracy, entropy, recall and F-Measure than previously suggested methods such as word count, Term Frequency-Inverse Document Frequency (TF-IDF), Term Frequency-Inverse Corpus Frequency (TF-ICF), Multi Aspect TF (MATF), BM25 and BM25F. Experiments on the Reuters-8, Reuters-52 and WebKB data sets with the K-means and K-means++ clustering algorithms demonstrate the effectiveness of the proposed term weighting schemes.
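
The abstract does not give the TF-RTF formula, so the following is only a minimal Python sketch of the general idea of combining a term's frequency with its frequency rank inside one document. The function name `tf_rank_weights` and the rank score `(n_ranks - rank + 1) / n_ranks` are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def tf_rank_weights(tokens):
    """Illustrative rank-based term weights for a single document.

    Assumption (not from the paper): weight = tf * rank_score, where
    rank_score falls linearly from 1.0 for the most frequent terms to
    1 / n_ranks for the least frequent ones.
    """
    tf = Counter(tokens)
    # Dense ranking of the distinct frequencies: rank 1 = highest frequency.
    distinct = sorted(set(tf.values()), reverse=True)
    rank = {freq: i + 1 for i, freq in enumerate(distinct)}
    n_ranks = len(distinct)
    return {
        term: freq * (n_ranks - rank[freq] + 1) / n_ranks
        for term, freq in tf.items()
    }


weights = tf_rank_weights("a a a b b c".split())
```

Under this hypothetical score, terms with equal frequency share a rank and therefore receive equal weight, and lower-ranked terms are damped more strongly than raw term frequency alone would damp them.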



Keywords:

Document representation; Document clustering; Term weighting; F-Measure; Entropy; K-means


Journal:
Expert Systems with Applications
ISSN: 0957-4174
Source: Elsevier BV