Multiple Hypergraph Clustering of Web Images by Mining Word2Image Correlations
Abstract
In this paper, we consider the problem of clustering Web images by mining correlations between images and their associated words. Since Web images typically come with surrounding text, we use their textual tags to enrich the image description. However, each word contributes differently to the interpretation of image semantics. To evaluate the importance of each word associated with a Web image, we propose a novel visibility model that computes the extent to which a word can be perceived visually in images, and we then infer the word-to-image correlation by integrating visibility with tf-idf. Furthermore, Latent Dirichlet Allocation (LDA) is used to discover the topic information inherent in the surrounding text, from which topic correlations between images are defined for image clustering. To integrate visibility and latent topic information into an image clustering framework, we first represent textually correlated and latent-topic-correlated images as two hypergraph views, and then apply the proposed Spectral Multiple Hypergraph Clustering (SMHC) algorithm to cluster the images into categories. SMHC can be regarded as a new unsupervised learning process that uses two hypergraphs to categorize Web images. Experimental results show that the SMHC algorithm has better clustering performance and that the proposed SMHC-based image clustering framework is effective.
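To make the two-hypergraph idea concrete, the following is a minimal sketch of spectral clustering over two hypergraph views. It is not the SMHC algorithm itself, whose details the abstract does not give. It assumes the standard normalized hypergraph Laplacian, a convex combination of the two views with a hypothetical mixing weight `alpha`, and a two-way split by the sign of the second-smallest eigenvector instead of a full k-way clustering step. The function names and the toy incidence matrices are illustrative only.

```python
import numpy as np


def hypergraph_laplacian(H, w=None):
    """Normalized hypergraph Laplacian I - Dv^-1/2 H W De^-1 H^T Dv^-1/2.

    H is an (images x hyperedges) 0/1 incidence matrix; w holds optional
    hyperedge weights (assumed here to be 1 when not given).
    """
    H = np.asarray(H, dtype=float)
    n_vertices, n_edges = H.shape
    w = np.ones(n_edges) if w is None else np.asarray(w, dtype=float)
    edge_degree = H.sum(axis=0)      # number of images in each hyperedge
    vertex_degree = H @ w            # weighted number of hyperedges per image
    inv_sqrt_dv = 1.0 / np.sqrt(vertex_degree)
    theta = (H * (w / edge_degree)) @ H.T
    theta = inv_sqrt_dv[:, None] * theta * inv_sqrt_dv[None, :]
    return np.eye(n_vertices) - theta


def two_view_bipartition(H_text, H_topic, alpha=0.5):
    """Split images into two groups using both hypergraph views.

    Assumption: the views are fused by a convex combination of their
    Laplacians; alpha is a hypothetical mixing weight.
    """
    L = alpha * hypergraph_laplacian(H_text) \
        + (1.0 - alpha) * hypergraph_laplacian(H_topic)
    _, eigenvectors = np.linalg.eigh(L)   # eigenvalues in ascending order
    second_smallest = eigenvectors[:, 1]
    return (second_smallest > 0).astype(int)


# Toy data: 6 images. In the text view, hyperedges {0,1,2} and {3,4,5}
# stand for shared words, and {2,3} is a word shared across both groups.
H_text = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
])
# In the topic view, each hyperedge stands for one latent topic.
H_topic = np.array([
    [1, 0],
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
    [0, 1],
])

labels = two_view_bipartition(H_text, H_topic)
print(labels)
```

On this toy input, images 0 to 2 land in one group and images 3 to 5 in the other; which group is labeled 0 or 1 depends on the arbitrary sign of the eigenvector. For more than two clusters, a common choice is to run k-means on the first k eigenvectors.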