2018, Vol. 33, Issue (4): 807-822. doi: 10.1007/s11390-018-1857-9

Special Topic: Data Management and Data Mining

Artificial Intelligence and Pattern Recognition

Hierarchical Clustering of Complex Symbolic Data and Application for Emitter Identification

Xin Xu¹, Jiaheng Lu², Wei Wang³, Member, CCF, ACM

  1 Laboratory of Science and Technology on Information System Engineering, Nanjing Research Institute of Electronics Engineering, Nanjing 210007, China;
    2 Department of Computer Science, University of Helsinki, Helsinki 00014, Finland;
    3 State Key Laboratory for Novel Software and Technology, Nanjing University, Nanjing 210046, China
  • Received: 2017-03-03; Revised: 2018-05-15; Online: 2018-07-05; Published: 2018-07-05
  • About the author: Xin Xu received her Ph.D. degree in computer science from the School of Computing, National University of Singapore, Singapore, in 2006. She is currently a senior research engineer at the Science and Technology on Information System Engineering Laboratory, Nanjing Research Institute of Electronics Engineering, Nanjing. Her research interests include artificial intelligence, data mining, and pattern recognition.
  • Supported by:

    This work was supported by the National Natural Science Foundation of China under Grant Nos. 61771177 and 61701454, the Natural Science Foundation of Jiangsu Province of China under Grant Nos. BK20160147 and BK20160148, and the Academy Project of Finland under Grant No. 310321.



Abstract: It is well known that symbolic variables may take various forms, such as intervals, sets of stochastic measurements of some underlying patterns, or qualitative multi-values. However, most existing work in symbolic data analysis still focuses on interval values. Although some pioneering work on stochastic-pattern-based symbolic data and on mixtures of symbolic variables has been carried out, existing methods still lack the flexibility and computational efficiency needed to make full use of the distinctive individual symbolic variables. We therefore propose a novel hierarchical clustering method for complex symbolic data that uses a weighted general Jaccard distance and an effective global pruning strategy, and we apply it to emitter identification. Extensive experiments indicate that our method outperforms its peers in both computational efficiency and emitter identification accuracy.
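The abstract does not spell out the distance formula or the pruning strategy, so the following is only a minimal sketch under our own assumptions of how a weighted Jaccard-style distance over mixed symbolic variables could drive hierarchical clustering. Assumed per-variable similarities: intervals use overlap length over union length; qualitative multi-values use the classic set Jaccard index; stochastic measurements are assumed to be summarized as histograms and compared with the generalized Jaccard index (sum of minima over sum of maxima). The weighted distance is the weighted mean of (1 - similarity). Clustering is plain single-linkage agglomeration with no pruning. The function names, the variable names `rf`, `pri`, `mod`, and the toy emitter values are all hypothetical.

```python
"""Sketch: weighted generalized Jaccard distance over mixed symbolic variables,
plus naive single-linkage agglomerative clustering.
All definitions are illustrative assumptions, not the paper's exact method."""
from itertools import combinations
from typing import Dict, FrozenSet, List, Mapping, Tuple, Union

Interval = Tuple[float, float]    # closed interval [low, high]
MultiValue = FrozenSet[str]       # qualitative multi-valued variable
Histogram = Dict[str, float]      # bin -> relative frequency of stochastic measurements
SymbolicValue = Union[Interval, MultiValue, Histogram]
SymbolicObject = Mapping[str, SymbolicValue]


def jaccard_similarity(a: SymbolicValue, b: SymbolicValue) -> float:
    """Per-variable Jaccard similarity in [0, 1], dispatched on the value type."""
    if isinstance(a, tuple) and isinstance(b, tuple):
        # Intervals: overlap length divided by union length.
        inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
        union = (a[1] - a[0]) + (b[1] - b[0]) - inter
        if union == 0.0:  # both are degenerate (point) intervals
            return 1.0 if a == b else 0.0
        return inter / union
    if isinstance(a, frozenset) and isinstance(b, frozenset):
        # Multi-valued sets: |A & B| / |A | B|.
        union_size = len(a | b)
        return 1.0 if union_size == 0 else len(a & b) / union_size
    if isinstance(a, dict) and isinstance(b, dict):
        # Histograms: generalized Jaccard, sum of minima over sum of maxima.
        num = den = 0.0
        for key in set(a) | set(b):
            x, y = a.get(key, 0.0), b.get(key, 0.0)
            num += min(x, y)
            den += max(x, y)
        return 1.0 if den == 0.0 else num / den
    raise TypeError(f"incompatible symbolic values: {a!r}, {b!r}")


def weighted_jaccard_distance(u: SymbolicObject, v: SymbolicObject,
                              weights: Mapping[str, float]) -> float:
    """Weighted mean of per-variable Jaccard distances (1 - similarity)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w * (1.0 - jaccard_similarity(u[name], v[name]))
               for name, w in weights.items()) / total


def single_linkage(objects: List[SymbolicObject], weights: Mapping[str, float],
                   k: int) -> List[List[int]]:
    """Merge the closest pair of clusters until k remain (O(n^3), no pruning)."""
    n = len(objects)
    dist = {(i, j): weighted_jaccard_distance(objects[i], objects[j], weights)
            for i, j in combinations(range(n), 2)}
    clusters = [[i] for i in range(n)]

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Single linkage: distance between the closest members of two clusters.
        return min(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)

    while len(clusters) > max(k, 1):
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return clusters


# Hypothetical toy emitters: RF band (MHz interval), PRI histogram, modulation set.
EMITTERS: List[SymbolicObject] = [
    {"rf": (9000.0, 9100.0), "pri": {"short": 0.7, "long": 0.3}, "mod": frozenset({"LFM"})},
    {"rf": (9020.0, 9120.0), "pri": {"short": 0.6, "long": 0.4}, "mod": frozenset({"LFM", "PSK"})},
    {"rf": (3000.0, 3050.0), "pri": {"short": 0.1, "long": 0.9}, "mod": frozenset({"NLFM"})},
    {"rf": (3010.0, 3060.0), "pri": {"short": 0.2, "long": 0.8}, "mod": frozenset({"NLFM"})},
]
WEIGHTS = {"rf": 1.0, "pri": 1.0, "mod": 1.0}

if __name__ == "__main__":
    print(single_linkage(EMITTERS, WEIGHTS, k=2))  # [[0, 1], [2, 3]]
```

With equal weights, the two X-band emitters and the two S-band emitters fall into separate clusters. The closest-pair search in `single_linkage` is the naive part; a pruning strategy such as the one the abstract mentions would presumably replace it, but that step is not reproduced here.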

Copyright © Editorial Office of Journal of Computer Science and Technology