2018, Vol. 33, Issue (4): 807-822. doi: 10.1007/s11390-018-1857-9

Special Issue: Data Management and Data Mining

Artificial Intelligence and Pattern Recognition

Hierarchical Clustering of Complex Symbolic Data and Application for Emitter Identification

Xin Xu¹, Jiaheng Lu², Wei Wang³, Member, CCF, ACM

  1. Laboratory of Science and Technology on Information System Engineering, Nanjing Research Institute of Electronics Engineering, Nanjing 210007, China;
  2. Department of Computer Science, University of Helsinki, Helsinki 00014, Finland;
  3. State Key Laboratory for Novel Software and Technology, Nanjing University, Nanjing 210046, China
  • Received: 2017-03-03; Revised: 2018-05-15; Online: 2018-07-05; Published: 2018-07-05
  • About the author: Xin Xu received her Ph.D. degree in computer science from the School of Computing, National University of Singapore, Singapore, in 2006. She is currently a senior research engineer at the Science and Technology on Information System Engineering Laboratory, Nanjing Research Institute of Electronic Engineering, Nanjing. Her research interests include artificial intelligence, data mining, and pattern recognition.
  • Supported by:

    This work was supported by the National Natural Science Foundation of China under Grant Nos. 61771177 and 61701454, the Natural Science Foundation of Jiangsu Province of China under Grant Nos. BK20160147 and BK20160148, and the Academy Project of Finland under Grant No. 310321.

It is well known that the values of symbolic variables may take various forms, such as intervals, sets of stochastic measurements of some underlying patterns, or qualitative multi-values. However, most existing work in symbolic data analysis still focuses on interval values. Although some pioneering work has explored stochastic-pattern-based symbolic data and mixtures of symbolic variables, it still lacks the flexibility and computational efficiency needed to make full use of the distinctive characteristics of individual symbolic variables. We therefore propose a novel hierarchical clustering method for complex symbolic data that combines a weighted general Jaccard distance with an effective global pruning strategy, and we apply it to emitter identification. Extensive experiments indicate that our method outperforms its peers in both computational efficiency and emitter identification accuracy.
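The abstract does not define the weighted general Jaccard distance, so the following Python sketch is only one plausible reading under stated assumptions. It assumes each record is a sequence of symbolic values (an interval, a set of qualitative values, or a histogram summarizing stochastic measurements), scores each variable with a Jaccard-style similarity, and averages the per-variable distances with user-supplied weights. All function names, the example emitter records, and the weights are hypothetical. The naive single-linkage loop merely stands in for hierarchical clustering; it does not reproduce the paper's method or its global pruning strategy.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple, Union

# Hypothetical encodings for three kinds of symbolic variables.
Interval = Tuple[float, float]   # e.g. a carrier-frequency range (lo, hi)
MultiValue = FrozenSet[str]      # e.g. a set of qualitative modulation types
Histogram = Dict[str, float]     # e.g. bin label -> relative frequency
SymbolicValue = Union[Interval, MultiValue, Histogram]


def jaccard_interval(a: Interval, b: Interval) -> float:
    """Overlap length divided by union length of two closed intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    if union == 0.0:  # both intervals collapse to single points
        return 1.0 if a == b else 0.0
    return inter / union


def jaccard_set(a: MultiValue, b: MultiValue) -> float:
    """Classic set Jaccard similarity; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def jaccard_hist(a: Histogram, b: Histogram) -> float:
    """Weighted (Ruzicka) Jaccard: sum of per-bin minima over sum of maxima."""
    bins = a.keys() | b.keys()
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in bins)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in bins)
    return 1.0 if den == 0.0 else num / den


def jaccard(a: SymbolicValue, b: SymbolicValue) -> float:
    """Dispatch on the Python type used to encode each symbolic variable."""
    if isinstance(a, tuple):
        return jaccard_interval(a, b)
    if isinstance(a, (set, frozenset)):
        return jaccard_set(a, b)
    return jaccard_hist(a, b)


def weighted_jaccard_distance(x: Sequence[SymbolicValue],
                              y: Sequence[SymbolicValue],
                              weights: Sequence[float]) -> float:
    """Weighted mean of per-variable Jaccard distances (1 - similarity)."""
    total = sum(weights)
    return sum(w * (1.0 - jaccard(u, v)) for w, u, v in zip(weights, x, y)) / total


def single_linkage(records: Sequence[Sequence[SymbolicValue]],
                   weights: Sequence[float],
                   n_clusters: int) -> List[List[int]]:
    """Naive agglomerative single-linkage clustering with no pruning."""
    n = len(records)
    # Precompute every pairwise distance once.
    d = [[weighted_jaccard_distance(records[i], records[j], weights)
          for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # Single linkage: distance of the closest cross-cluster pair.
                link = min(d[i][j] for i in clusters[p] for j in clusters[q])
                if best is None or link < best[0]:
                    best = (link, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return clusters


# Hypothetical emitter records: (frequency interval, modulation set, PRI histogram).
records = [
    ((9.0, 9.5), frozenset({"LFM"}), {"short": 0.8, "long": 0.2}),
    ((9.1, 9.6), frozenset({"LFM"}), {"short": 0.7, "long": 0.3}),
    ((2.0, 2.4), frozenset({"CW", "PSK"}), {"short": 0.1, "long": 0.9}),
    ((2.1, 2.5), frozenset({"PSK"}), {"short": 0.2, "long": 0.8}),
]
weights = [0.5, 0.25, 0.25]  # hypothetical per-variable weights
print(single_linkage(records, weights, n_clusters=2))  # prints [[0, 1], [2, 3]]
```

Precomputing the full distance matrix keeps the sketch short but costs O(n²) distance evaluations. This is the kind of overhead a pruning strategy such as the one the abstract mentions would aim to reduce.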


ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: jcst@ict.ac.cn
  Copyright ©2015 JCST, All Rights Reserved