Citation: Cao YH, Wu JX. Random subspace sampling for classification with missing data. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 39(2): 472−486 Mar. 2024. DOI: 10.1007/s11390-023-1611-9.

Random Subspace Sampling for Classification with Missing Data

Funds: This work was supported by the National Natural Science Foundation of China under Grant Nos. 61772256 and 61921006.
More Information
  • Author Bios:

    Yun-Hao Cao is currently a Ph.D. candidate in the Department of Computer Science and Technology at Nanjing University, Nanjing. He received his B.S. degree in computer science and technology from Nanjing University, Nanjing, in 2018. His research interests are computer vision and machine learning.

    Jian-Xin Wu is currently a professor in the School of Artificial Intelligence at Nanjing University, Nanjing, and is associated with the State Key Laboratory for Novel Software Technology, Nanjing. He received his B.S. and M.S. degrees from Nanjing University, Nanjing, in 1999 and 2002, respectively, and his Ph.D. degree from the Georgia Institute of Technology, Atlanta, in 2009, all in computer science. He has served as a senior area chair for CVPR, ICCV, ECCV, AAAI, and IJCAI, and as an associate editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence. His research interests are computer vision and machine learning.

  • Corresponding author:

    wujx2001@nju.edu.cn

  • Received Date: May 25, 2021
  • Accepted Date: February 03, 2023
  • Abstract: Many real-world datasets suffer from the unavoidable issue of missing values, and classification with missing data therefore has to be handled carefully, since inadequate treatment of missing values causes large errors. In this paper, we propose a random subspace sampling method, RSS, which samples missing items from the corresponding feature histogram distributions in random subspaces and is effective and efficient at different levels of missing data. Unlike most established approaches, RSS does not train on fixed imputed datasets. Instead, we design a dynamic training strategy in which the filled values change dynamically by resampling during training. Moreover, thanks to the sampling strategy, we design an ensemble testing strategy that combines the results of multiple runs of a single model, which is more efficient and resource-saving than previous ensemble methods. Finally, we combine these two strategies with the random subspace method, which makes our estimations more robust and accurate. The effectiveness of the proposed RSS method is well validated by experimental studies.
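    The abstract describes three components: drawing missing values from per-feature histogram distributions, resampling those draws throughout training (dynamic training), and averaging multiple test-time runs of a single trained model (ensemble testing). The NumPy sketch below is purely illustrative and is not the authors' implementation: the function names, the scikit-learn-style predict_proba/partial_fit interface, and the omission of the random-subspace step are all our assumptions, and it further assumes numerical features with NaN marking missing entries and at least one observed value per feature.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def fit_histograms(X, n_bins=10):
        """Build one histogram per feature from its observed (non-NaN) values."""
        hists = []
        for j in range(X.shape[1]):
            observed = X[~np.isnan(X[:, j]), j]
            counts, edges = np.histogram(observed, bins=n_bins)
            hists.append((edges, counts / counts.sum()))  # assumes counts.sum() > 0
        return hists

    def sample_fill(X, hists):
        """Fill each missing entry with a fresh draw from its feature's histogram.

        Re-invoked every epoch (dynamic training) and every test-time run
        (ensemble testing), so the filled values keep changing.
        """
        X_filled = X.copy()
        for j, (edges, probs) in enumerate(hists):
            miss = np.isnan(X[:, j])
            if miss.any():
                # Pick a bin with its empirical probability, then draw
                # uniformly within that bin.
                b = rng.choice(len(probs), size=miss.sum(), p=probs)
                X_filled[miss, j] = rng.uniform(edges[b], edges[b + 1])
        return X_filled

    def predict_ensemble(model, X_test, hists, n_runs=10):
        """Ensemble testing: average the class probabilities of one trained
        model over several independently resampled imputations of the test set."""
        runs = [model.predict_proba(sample_fill(X_test, hists))
                for _ in range(n_runs)]
        return np.mean(runs, axis=0)

    # Dynamic-training sketch with a hypothetical incremental learner, e.g.
    # sklearn.linear_model.SGDClassifier(loss="log_loss"):
    #
    #   hists = fit_histograms(X_train)
    #   for epoch in range(50):  # resample the filled values every epoch
    #       model.partial_fit(sample_fill(X_train, hists), y_train,
    #                         classes=np.unique(y_train))
    ```

    A full implementation would additionally restrict the model's inputs to random feature subspaces, as in the paper's title; that bookkeeping is omitted here for brevity.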

  • [1]
    García-Laencina P J, Sancho-Gómez J L, Figueiras-Vidal A R. Pattern classification with missing data: A review. Neural Computing and Applications, 2010, 19(2): 263–282. DOI: 10.1007/s00521-009-0295-6.
    [2]
    White I R, Royston P, Wood A M. Multiple imputation using chained equations: Issues and guidance for practice. Statistics in Medicine, 2011, 30(4): 377–399. DOI: 10.1002/ sim.4067.
    [3]
    Farhangfar A, Kurgan L A, Pedrycz W. A novel framework for imputation of missing values in databases. IEEE Trans. Systems, Man, and Cybernetics—Part A: Systems and Humans, 2007, 37(5): 692–709. DOI: 10.1109/TSMCA.2007.902631.
    [4]
    Juszczak P, Duin R P W. Combining one-class classifiers to classify missing data. In Proc. the 5th International Workshop on Multiple Classifier Systems, Jun. 2004, pp.92–101. DOI: 10.1007/978-3-540-25966-4_9.
    [5]
    Krause S, Polikar R. An ensemble of classifiers approach for the missing feature problem. In Proc. the 2003 International Joint Conference on Neural Networks, Jul. 2003, pp.553–558. DOI: 10.1109/IJCNN.2003.1223406.
    [6]
    Polikar R, DePasquale J, Syed Mohammed H, Brown G, Kuncheva L I. Learn++. MF: A random subspace approach for the missing feature problem. Pattern Recognition, 2010, 43(11): 3817–3832. DOI: 10.1016/j.patcog.2010.05.028.
    [7]
    Ghahramani Z, Jordan M I. Supervised learning from incomplete data via an EM approach. In Proc. the 6th International Conference on Neural Information Processing Systems, Nov. 1993, pp.120–127.
    [8]
    Ahmad S, Tresp V. Some solutions to the missing feature problem in vision. In Proc. the 5th International Conference on Neural Information Processing Systems, Nov. 1992, pp.393–400.
    [9]
    Salzberg S L. Bookreview: C4.5: Programs for machine learning by J. Ross Quinlan. Morgan Kaufmann Publishers, Inc., 1993. Machine Learning, 1994, 16(3): 235–240. DOI: 10.1007/BF00993309.
    [10]
    Batista G E, Monard M C. A study of k-nearest neighbour as an imputation method. Hybrid Intelligent Systems, 2002, 87(48): 251–260. DOI: 10.1109/METRIC.2004.1357895.
    [11]
    Schafer J L. Analysis of Incomplete Multivariate Data (1st edition). CRC Press, 1997. DOI: 10.1201/9780367803025.
    [12]
    Zhao Y X, Udell M. Missing value imputation for mixed data via Gaussian copula. In Proc. the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2020, pp.636–646. DOI: 10.1145/3394486.3403106.
    [13]
    Rubin D B. Multiple Imputation for Nonresponse in Surveys (1st edition). John Wiley & Sons, Inc., 2004.
    [14]
    Houari R, Bounceur A, Tari A K, Kecha M T. Handling missing data problems with sampling methods. In Proc. the 2014 International Conference on Advanced Networking Distributed Systems and Applications, Jun. 2014, pp.99–104. DOI: 10.1109/INDS.2014.25.
    [15]
    Stekhoven D J, Bühlmann P. MissForest—Non-parametric missing value imputation for mixed-type data. Bioinformatics, 2012, 28(1): 112–118. DOI: 10.1093/bioinformatics/btr597.
    [16]
    Zhou Z H. Ensemble Methods: Foundations and Algorithms (1st edition). CRC Press, 2012. DOI: 10.1201/b12207.
    [17]
    Ho T K. The random subspace method for constructing decision forests. IEEE Trans. Pattern Analysis and Machine Intelligence, 1998, 20(8): 832–844. DOI: 10.1109/34.709601.
    [18]
    Breiman L. Random forests. Machine Learning, 2001, 45(1): 5–32. DOI: 10.1023/A:1010933404324.
    [19]
    Sharpe P K, Solly R J. Dealing with missing values in neural network-based diagnostic systems. Neural Computing & Applications, 1995, 3(2): 73–77. DOI: 10.1007/BF 01421959.
    [20]
    Jiang K, Chen H X, Yuan S M. Classification for incomplete data using classifier ensembles. In Proc. the 2005 International Conference on Neural Networks and Brain, Apr. 2005, pp.559–563. DOI: 10.1109/ICNNB.2005.1614675.
    [21]
    Cao Y H, Wu J X, Wang H C, Lasenby J. Neural random subspace. Pattern Recognition, 2021, 112: Article No. 107801. DOI: 10.1016/j.patcog.2020.107801.
    [22]
    Little R J A, Rubin D B. Statistical Analysis with Missing Data (3rd edition). John Wiley & Sons, Inc., 2019.
    [23]
    Mazumder R, Hastie T, Tibshirani R. Spectral regularization algorithms for learning large incomplete matrices. The Journal of Machine Learning Research, 2010, 11(80): 2287–2322.
    [24]
    Huang S J, Xu M, Xie M K, Sugiyama M, Niu G, Chen S C. Active feature acquisition with supervised matrix completion. In Proc. the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Jul. 2018, pp.1571–1579. DOI: 10.1145/3219819.3220084.
    [25]
    Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proc. the 32nd International Conference on Machine Learning, Jul. 2015, pp.448–456.
    [26]
    Kingma D P, Ba J. Adam: A method for stochastic optimization. In Proc. the 3rd International Conference on Learning Representations, May 2015.