A Novel Approach to Clustering Merchandise Records
-
Abstract
Object identification is one of the major challenges in integrating data from multiple information sources. Because integrated databases lack global identifiers, it is hard to find all records that refer to the same object. Traditional object identification techniques typically judge records by character-based or vector space model-based similarity, but these measures do not work well on merchandise databases. This paper proposes a new approach to object identification. First, we use merchandise images to judge whether two records refer to the same object; then, we use a naïve Bayesian model to judge whether two merchandise names have similar meanings. We run experiments on data downloaded from shopping websites, and the results show that the approach performs well.
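To make the two-stage idea concrete, the following is a minimal sketch and not the implementation evaluated in this paper. It rests on several assumptions that the abstract does not state: each record carries a precomputed numeric image signature (`image_sig`), images are compared by mean absolute difference against a hypothetical `tolerance`, name pairs are reduced to two hand-picked categorical features, and two records count as the same object only when both stages agree. All function names, field names, and the toy training pairs are illustrative.

```python
import math
from collections import Counter, defaultdict


def images_match(sig_a, sig_b, tolerance=10.0):
    """Stage 1 (assumed form): compare two equal-length image signatures."""
    if not sig_a or len(sig_a) != len(sig_b):
        return False
    mean_diff = sum(abs(x - y) for x, y in zip(sig_a, sig_b)) / len(sig_a)
    return mean_diff <= tolerance


def pair_features(name_a, name_b):
    """Reduce a pair of names to categorical features (hypothetical choice)."""
    words_a, words_b = name_a.lower().split(), name_b.lower().split()
    tokens_a, tokens_b = set(words_a), set(words_b)
    union = tokens_a | tokens_b
    jaccard = len(tokens_a & tokens_b) / len(union) if union else 0.0
    return {
        "overlap": "high" if jaccard >= 0.5 else "low",
        "same_first": words_a[:1] == words_b[:1],
    }


class NaiveBayes:
    """Categorical naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.values = defaultdict(set)              # feature -> values seen in training

    def fit(self, samples):
        for features, label in samples:
            self.class_counts[label] += 1
            for key, value in features.items():
                self.feature_counts[(label, key)][value] += 1
                self.values[key].add(value)

    def predict(self, features):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / total)
            for key, value in features.items():
                numerator = self.feature_counts[(label, key)][value] + 1
                denominator = count + len(self.values[key])
                score += math.log(numerator / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def same_object(rec_a, rec_b, model):
    """Assumed combination rule: the image check and the name check must both pass."""
    if not images_match(rec_a["image_sig"], rec_b["image_sig"]):
        return False
    features = pair_features(rec_a["name"], rec_b["name"])
    return model.predict(features) == "same"


# Toy training pairs, invented purely for illustration.
model = NaiveBayes()
model.fit([
    (pair_features("Canon EOS 600D camera", "Canon EOS 600D digital camera"), "same"),
    (pair_features("Nike Air Max shoes", "Nike Air Max running shoes"), "same"),
    (pair_features("Canon EOS 600D camera", "Sony Bravia TV"), "different"),
    (pair_features("Nike Air Max shoes", "Adidas football"), "different"),
])
```

Running the cheap image comparison first means the name classifier is consulted only for pairs whose images already agree; whether the paper combines the two judgments this way is an assumption of the sketch.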
-
-