›› 2017, Vol. 32 ›› Issue (1): 68-77.doi: 10.1007/s11390-017-1706-2

Special Issue: Artificial Intelligence and Pattern Recognition

• Data Management and Data Mining • Previous Articles     Next Articles

Sparse Support Vector Machine with Lp Penalty for Feature Selection

Lan Yao1, Feng Zeng2,*, Member, CCF, Dong-Hui Li3, and Zhi-Gang Chen2, Senior Member, CCF   

  1. 1 College of Mathematics and Econometrics, Hunan University, Changsha 410082, China;
    2 School of Software, Central South University, Changsha 410083, China;
    3 School of Mathematical Sciences, South China Normal University, Guangzhou 510631, China
  • Received:2016-02-28 Revised:2016-09-07 Online:2017-01-05 Published:2017-01-05
  • Contact: Feng Zeng E-mail:fengzeng@csu.edu.cn
  • About author:Lan Yao is an assistant professor of the College of Mathematics and Econometrics, Hunan University, Changsha. She got her B.S. degree in computer science, M.S. and Ph.D. degrees in applied mathematics from Hunan University, Changsha, in 2000, 2006 and 2014 respectively. Her research interests include data mining, numerical methods in optimization and network optimization.
  • Supported by:

    This work is supported in part by the National Natural Science Foundation of China under Grant Nos. 61502159, 61379057, 11101081, and 11271069, and the Research Foundation of Central South University of China under Grant No. 2014JSJJ019.

We study the strategies in feature selection with sparse support vector machine (SVM). Recently, the socalled L p-SVM (0< p< 1) has attracted much attention because it can encourage better sparsity than the widely used L1-SVM. However, Lp-SVM is a non-convex and non-Lipschitz optimization problem. Solving this problem numerically is challenging. In this paper, we reformulate the Lp-SVM into an optimization model with linear objective function and smooth constraints (LOSC-SVM) so that it can be solved by numerical methods for smooth constrained optimization. Our numerical experiments on artificial datasets show that LOSC-SVM (0< p< 1) can improve the classification performance in both feature selection and classification by choosing a suitable parameter p. We also apply it to some real-life datasets and experimental results show that it is superior to L1-SVM.

[1] Vapnik V N. The Nature of Statistical Learning Theory (2nd edition). Springer, 2000.

[2] Guyon I, Gunn S, Nikravesh M, Zadeh L A. Feature Extraction:Foundations and Applications (1st edition). Springer, 2006.

[3] Saeys Y, Inza I, Larranagal P. A review of feature selection techniques in bioinformatics. Bioinformatics, 2007, 23(19):2507-2517.

[4] Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Machine Learning, 2002, 46(1/2/3):389-422.

[5] Rakotomamonjy A. Variable selection using SVM based criteria. The Journal of Machine Learning Research, 2003, 3:1357-1370.

[6] Weston J, Mukherjee S, Chapelle O, Pontil M, Poggio T, Vapnik V. Feature selection for SVMs. In Advances in Neural Information Processing Systems 13, Leen T K, Dietterich T G, Tresp V (eds.), Massachusetts Institute of Technology, 2001, pp.668-674.

[7] Peleg D, Meir R. A feature selection algorithm based on the global minimization of a generalization error bound. In Advances in Neural Information Processing Systems 17, Saul L K, Weiss Y, Bottou L (eds.), Massachusetts Institute of Technology, 2005, pp.1065-1072.

[8] Bradley P S, Mangasarian O L. Feature selection via concave minimization and support vector machines. In Proc. the 5th International Conference on Machine Learning, July 1998, pp.82-90.

[9] Weston J, Elisseeff A, Schölkopf B, Tipping M. Use of the zero norm with linear models and kernel methods. The Journal of Machine Learning Research, 2003, 3:1439-1461.

[10] Amaldi E, Kann V. On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems. Theoretical Computer Science, 1998, 209(1/2):237-260.

[11] Chan A B, Vasconcelos N, Lanckriet G R G. Direct convex relaxations of sparse SVM. In Proc. the 24th International Conference on Machine Learning, June 2007, pp.145-153.

[12] Fung G M, Mangasarian O L. A feature selection newton method for support vector machine classification. Computational Optimization and Applications, 2004, 28(2):185-202.

[13] Bi J B, Bennett K, Embrechts M, Breneman C, Song M H. Dimensionality reduction via sparse support vector machines. The Journal of Machine Learning Research, 2003, 3:1229-1243.

[14] Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B (Methodological), 1996, 58(1):267-288.

[15] Neumann J, Schnörr C, Steidl G. Combined SVM-based feature selection and classification. Machine Learning, 2005, 61(1/2/3):129-150.

[16] Chartrand R. Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Processing Letters, 2007, 14(10):707-710.

[17] Chartrand R. Nonconvex regularization for shape preservation. In Proc. the IEEE International Conference on Image Processing, September 16-October 19, 2007, pp.293-296.

[18] Xu Z B, Zhang H, Wang Y, Chang X Y, Liang Y. L1/2 regularization. Science China Information Sciences, 2010, 53(6):1159-1169.

[19] Liu J L, Li J P, Xu W X, Shi Y. A weighted Lq adaptive least squares support vector machine classifiers-Robust and sparse approximation. Expert Systems with Applications, 2011, 38(3):2253-2259.

[20] Chen W J, Tian Y J. Lp-norm proximal support vector machine and its applications. Procedia Computer Science, 2010, 1(1):2417-2423.

[21] Rakotomamonjy A, Flamary R, Gasso G, Canu S. lp-lq penalty for sparse linear and sparse multiple kernel multitask learning. IEEE Transactions on Neural Networks, 2011, 22(8):1307-1320.

[22] Liu Y F, Zhang H H, Park C, Ahn J. Support vector machines with adaptive Lq penalty. Computational Statistics and Data Analysis, 2007, 51(12):6380-6394.

[23] Liu Z Q, Lin S L, Tan M. Sparse support vector machines with L p penalty for biomarker identification. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2010, 7(1):100-107.

[24] Tan J Y, Zhang Z Q, Zhen L, Zhang C H, Deng N Y. Adaptive feature selection via a new version of support vector machine. Neural Computing and Applications, 2013, 23(3/4):937-945.

[25] Tian Y J, Yu J, Chen W J. lp-norm support vector machine with CCCP. In Proc. the 7th International Conference on Fuzzy Systems and Knowledge Discovery, August 2010, pp.1560-1564.

[26] Liu J W, Liu Y. Non-integer norm regularization SVM via Legendre-Fenchel duality. Neurocomputing, 2014, 144:537-545.

[27] Chen X J, Xu F M, Ye Y Y. Lower bound theory of nonzero entries in solutions of l2-lp minimization. SIAM J. Sci. Comput., 2010, 32(5):2832-2852.

[28] Zhang C H, Shao Y H, Tan J Y, Deng N Y. Mixed-norm linear support vector machine. Neural Computing and Applications, 2013, 23(7):2159-2166.

[29] Li D H, Wu L, Sun Z, Zhang X J. A constrained optimization reformulation and a feasible descent direction method for L1/2 regularization. Computational Optimization and Applications, 2014, 59(1/2):263-284.

[30] Newman D J, Hettich S, Blake C L, Merz C J. UCI repository of machine learning databases. Technical Report 9702, Department of Information and Computer Science, University of California, Irvine, 1998. http://archive.ics.uci.edu/ml/, Nov. 2016
No related articles found!
Full text



[1] Liu Mingye; Hong Enyu;. Some Covering Problems and Their Solutions in Automatic Logic Synthesis Systems[J]. , 1986, 1(2): 83 -92 .
[2] Chen Shihua;. On the Structure of (Weak) Inverses of an (Weakly) Invertible Finite Automaton[J]. , 1986, 1(3): 92 -100 .
[3] Gao Qingshi; Zhang Xiang; Yang Shufan; Chen Shuqing;. Vector Computer 757[J]. , 1986, 1(3): 1 -14 .
[4] Chen Zhaoxiong; Gao Qingshi;. A Substitution Based Model for the Implementation of PROLOG——The Design and Implementation of LPROLOG[J]. , 1986, 1(4): 17 -26 .
[5] Huang Heyan;. A Parallel Implementation Model of HPARLOG[J]. , 1986, 1(4): 27 -38 .
[6] Min Yinghua; Han Zhide;. A Built-in Test Pattern Generator[J]. , 1986, 1(4): 62 -74 .
[7] Tang Tonggao; Zhao Zhaokeng;. Stack Method in Program Semantics[J]. , 1987, 2(1): 51 -63 .
[8] Min Yinghua;. Easy Test Generation PLAs[J]. , 1987, 2(1): 72 -80 .
[9] Zhu Hong;. Some Mathematical Properties of the Functional Programming Language FP[J]. , 1987, 2(3): 202 -216 .
[10] Li Minghui;. CAD System of Microprogrammed Digital Systems[J]. , 1987, 2(3): 226 -235 .

ISSN 1000-9000(Print)

CN 11-2296/TP

Editorial Board
Author Guidelines
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: jcst@ict.ac.cn
  Copyright ©2015 JCST, All Rights Reserved