• Articles • Previous Articles     Next Articles

Composite Distance Transformation for Indexing and k-Nearest-Neighbor Searching in High-Dimensional Spaces

Yi Zhuang, Yue-Ting Zhuang, and Fei Wu   

  1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
  • Received:2006-05-01 Revised:2007-01-04 Online:2007-03-10 Published:2007-03-10

Due to the famous dimensionality curse problem, search in a high-dimensional space is considered as a ``hard'' problem. In this paper, a novel {\it composite distance transformation} method, which is called CDT, is proposed to support a fast $k$-nearest-neighbor ($k$-NN) search in high-dimensional spaces. In CDT, all ($n$) data points are first grouped into some clusters by a $k$-Means clustering algorithm. Then a composite distance key of each data point is computed. Finally, these index keys of such $n$ data points are inserted by a partition-based B^+-tree. Thus, given a query point, its $k$-NN search in high-dimensional spaces is transformed into the search in the single dimensional space with the aid of CDT index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of the proposed scheme. Our results show that this method outperforms the state-of-the-art high-dimensional search techniques, such as the X-Tree, VA-file, iDistance and NB-Tree.

Key words: Process algebra; semantics of concurrency; testing; conformance.;



[1] Christian B\"ohm, Stefan Berchtold, Daniel Keim. Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. -\it ACM Computing Surveys}, 2001, 33(3): 322--373.

[2] Guttman A. R-tree: A dynamic index structure for spatial searching. In -\it Proc. the ACM SIGMOD Int. Conf. Management Data}, Boston, USA, 1984, pp.47--54.

[3] Beckmann N, Kriegel H-P, Schneider R, Seeger B. The $R*$-tree: An efficient and robust access method for points and rectangles. In -\it Proc. ACM SIGMOD Int. Conf. Management Data}, Atlantic, USA, 1990, pp.322--331.

[4] Berchtold S, Keim D A, Kriegel H P. The X-tree: An index structure for high-dimensional data. In -\it Proc. 22nd Int. Conf. Very Large Data Bases}, India, 1996, pp.28--37.

[5] Katamaya N, Satoh S. The SR-tree: An index structure for high-dimensional nearest neighbor queries. In -\it Proc. ACM SIGMOD Int. Conf. Management of Data}, Arizona, USA, 1997, pp.32--42.

[6] Weber R, Schek H, Blott S. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In -\it Proc. 24th Int. Conf. Very Large Data Bases}, New York, USA, 1998, pp.194--205.

[7] Berchtold S, Bohm C, Kriegel H P \it et al. \rm Independent quantization: An index compression technique for high-dimensional data spaces. In -\it Proc. 16th Int. Conf. Data Engineering}, San Diego, USA, 2000, pp.577--588.

[8] Fonseca M J, Jorge J A. Indexing high-dimensional data for content-based retrieval in large databases. In -\it Proc. the 8th Int. Conf. Database Systems for Advanced Applications}, Kyoto, Japan, 2003, pp.267--274.

[9] Jagadish H V, Ooi B C, Tan K L \it et al. \rm iDistance: An adaptive B$+$-tree based indexing method for nearest neighbor search. -\it ACM Trans. Data Base Systems}, 2005, 30(2): 364--397.

[10] The UCI KDD Archive. http:// www.kdd.ics.uci.edu, 2002.
[1] Ze-Lin Zhao, Di Huang, and Xiao-Xing Ma. TOAST: Automated Testing of Object Transformers in Dynamic Software Updates [J]. Journal of Computer Science and Technology, 2022, 37(1): 50-66.
[2] Jin-Chi Chen, Yi Qin, Hui-Yan Wang, and Chang Xu. Simulation Might Change Your Results: A Comparison of Context-Aware System Input Validation in Simulated and Physical Environments [J]. Journal of Computer Science and Technology, 2022, 37(1): 83-105.
[3] Jia-Ming Zhang, Zhan-Qi Cui, Xiang Chen, Huan-Huan Wu, Li-Wei Zheng, and Jian-Bin Liu. DeltaFuzz: Historical Version Information Guided Fuzz Testing [J]. Journal of Computer Science and Technology, 2022, 37(1): 29-49.
[4] Yi-Sen Xu, Xiang-Yang Jia, Fan Wu, Lingbo Li, Ji-Feng Xuan. Automatically Identifying Calling-Prone Higher-Order Functions of Scala Programs to Assist Testers [J]. Journal of Computer Science and Technology, 2020, 35(6): 1278-1294.
[5] Ming-Zhe Zhang, Yun-Zhan Gong, Ya-Wen Wang, Da-Hai Jin. Unit Test Data Generation for C Using Rule-Directed Symbolic Execution [J]. Journal of Computer Science and Technology, 2019, 34(3): 670-689.
[6] Xu-Zhou Zhang, Yun-Zhan Gong, Ya-Wen Wang, Ying Xing, Ming-Zhe Zhang. Automated String Constraints Solving for Programs Containing String Manipulation Functions [J]. , 2017, 32(6): 1125-1135.
[7] Xiao-Fang Qi, Zi-Yuan Wang, Jun-Qiang Mao, Peng Wang. Automated Testing of Web Applications Using Combinatorial Strategies [J]. , 2017, 32(1): 199-210.
[8] Rong-Zhi Qi, Zhi-Jian Wang, Shui-Yan Li. A Parallel Genetic Algorithm Based on Spark for Pairwise Test Suite Generation [J]. , 2016, 31(2): 417-427.
[9] Xiangyu Zhang, Dongmei Zhang, Yves Le Traon, Qing Wang, Lu Zhang. Roundtable: Research Opportunities and Challenges for Emerging Software Systems [J]. , 2015, 30(5): 935-941.
[10] Shi-Wei Gao, Jiang-Hua Lv, Bing-Lei Du, Charles J. Colbourn, Shi-Long Ma. Balancing Frequencies and Fault Detection in the In-Parameter-Order Algorithm [J]. , 2015, 30(5): 957-968.
[11] Tao Xie, Lu Zhang, Xusheng Xiao, Ying-Fei Xiong, and Dan Hao. Cooperative Software Testing and Analysis:Advances and Challenges [J]. , 2014, 29(4): 713-723.
[12] Yan-Fang Ma and Min Zhang. The Infinite Evolution Mechanism of ε-Bisimilarity [J]. , 2013, 28(6): 1097-1105.
[13] You-Ming Qiao (乔友明), Jayalal Sarma M.N., and Bang-Sheng Tang (唐邦晟). On Isomorphism Testing of Groups with Normal Hall Subgroups [J]. , 2012, 27(4): 687-701.
[14] Xu-Tao Du (杜旭涛), Chun-Xiao Xing (邢春晓), Member, CCF, IEEE and Li-Zhu Zhou (周立柱), Member, ACM. Modeling and Verifying Concurrent Programs with Finite Chu Spaces [J]. , 2010, 25(6): 1168-1183.
[15] Qing Zhang, Wei Wei, and Ting Yu, Member, ACM. On the Modeling of Honest Players in Reputation Systems [J]. , 2009, 24(5): 808-819.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] Gao Qingshi; Zhang Xiang; Yang Shufan; Chen Shuqing;. Vector Computer 757[J]. , 1986, 1(3): 1 -14 .
[2] Xu Xiaoshu;. Simplification of Multivalued Sequential SULM Network by Using Cascade Decomposition[J]. , 1986, 1(4): 84 -95 .
[3] Adelino Santos;. Cooperative Hypermedia Editing with CoMEdiA[J]. , 1993, 8(3): 67 -79 .
[4] Jin Zhi;. The Structure and Semantics of an Object-Oriented Logic Programming Language: SCKE[J]. , 1995, 10(1): 74 -84 .
[5] Tan Qingping;. A Higher-Order Unification Algorithm for Inductive Types and Dependent Types[J]. , 1997, 12(3): 231 -243 .
[6] ZHAN Yongzhao; SONG Snunlin; XIE Li;. Demand Priority Protocol Simulation and Evaluation[J]. , 1999, 14(6): 599 -605 .
[7] CHEN Yisong(陈毅松),LU Jian(卢坚),SUN Zhengxing(孙正兴)and ZHANG Fuyan(张福炎). Greylevel Difference Classification Algorithm in Fractal Image Compression[J]. , 2002, 17(2): 0 .
[8] Yongxi Cheng. Generating Combinations by Three Basic Operations[J]. , 2007, 22(6): 909 -913 .
[9] Yu-Ping Shen (沈榆平) and Xi-Shun Zhao (赵希顺). NP-Logic Systems and Model-Equivalence Reductions[J]. , 2010, 25(6): 1321 -1326 .
[10] Xin-Hai Xu (徐新海), Student Member, CCF, ACM Xue-Jun Yang (杨学军), Senior Member, CCF, Member, ACM, IEEE Jing-Ling Xue (薛京灵), Senior Member, IEEE, Member, ACM Yu-Fei Lin (林宇斐), Student Member, CCF, ACM, and Yi-Song Lin (林一松). PartialRC: A Partial Recomputing Method for Efficient Fault Recovery on GPGPUs[J]. , 2012, (2): 240 -255 .

ISSN 1000-9000(Print)

         1860-4749(Online)
CN 11-2296/TP

Home
Editorial Board
Author Guidelines
Subscription
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved