2012, Vol. 27, Issue (6): 1289-1301. doi: 10.1007/s11390-012-1305-1

Special Issue: Artificial Intelligence and Pattern Recognition

• Machine Learning and Data Mining •

A Kernel Approach to Multi-Task Learning with Task-Specific Kernels

Wei Wu1,2 (武威), Hang Li3 (李航), Member, CCF, ACM, IEEE, Yun-Hua Hu4 (胡云华), Member, ACM, and Rong Jin5 (金榕), Member, IEEE   

    1. MOE-Microsoft Key Laboratory of Statistics and Information Technology, Department of Probability and Statistics, School of Mathematical Sciences, Peking University, Beijing 100871, China;
    2. Beijing International Center for Mathematical Research, Peking University, Beijing 100871, China;
    3. Noah's Ark Lab, Huawei, Hong Kong Science Park, Hong Kong, China;
    4. Microsoft Research Asia, Beijing 100080, China;
    5. Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, U.S.A.
  • Received: 2011-09-15; Revised: 2012-04-27; Online: 2012-11-05; Published: 2012-11-05

Several kernel-based methods for multi-task learning have been proposed, which leverage relations among tasks as regularization to enhance the overall learning accuracy. These methods assume that the tasks share the same kernel, which can limit their applicability because in practice different tasks may need different kernels. The main challenge of introducing multiple kernels into multiple tasks is that models from different reproducing kernel Hilbert spaces (RKHSs) are not comparable, making it difficult to exploit relations among tasks. This paper addresses the challenge by formalizing the problem in the square integrable space (SIS). Specifically, it proposes a kernel-based method which makes use of a regularization term defined in SIS to represent task relations. We prove a new representer theorem for the proposed approach in SIS. We further derive a practical method for solving the learning problem and conduct a consistency analysis of the method. We discuss the relationship between our method and an existing method. We also give an SVM (support vector machine)-based implementation of our method for multi-label classification. Experiments on an artificial example and two real-world datasets show that the proposed method performs better than the existing method.
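The core idea can be illustrated with a small sketch. It is not the authors' algorithm (the abstract describes an SVM-based implementation backed by a representer theorem in SIS); it is a simplified kernel ridge regression analogue under stated assumptions. Two tasks each use their own Gaussian kernel, so their RKHS norms are not comparable, but both models are ordinary functions whose squared L2 distance can be estimated on a shared sample. The function names (`rbf`, `fit_two_tasks`, `l2_gap`), the bandwidths, the penalty weights `lam` and `mu`, and the choice of the pooled inputs as the sample for the L2 integral are all hypothetical choices made for illustration.

```python
import numpy as np

def rbf(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def fit_two_tasks(X1, y1, X2, y2, gamma1, gamma2, lam=1e-2, mu=1.0):
    """Fit f_t(x) = sum_i a_t[i] * k_t(X_t[i], x) for two tasks.

    Objective (an assumption of this sketch, quadratic in a1 and a2):
      sum_t (1/n_t) ||K_t a_t - y_t||^2 + lam * a_t' K_t a_t
      + (mu/m) * sum_{z in Z} (f_1(z) - f_2(z))^2
    where Z is the pooled input sample standing in for the L2 integral.
    """
    Z = np.vstack([X1, X2])
    n1, n2, m = len(X1), len(X2), len(Z)
    K1, K2 = rbf(X1, X1, gamma1), rbf(X2, X2, gamma2)
    E1, E2 = rbf(Z, X1, gamma1), rbf(Z, X2, gamma2)  # f_t(Z) = E_t @ a_t
    # Setting the gradient to zero gives a symmetric block linear system.
    A11 = K1 @ K1 / n1 + lam * K1 + mu * E1.T @ E1 / m
    A22 = K2 @ K2 / n2 + lam * K2 + mu * E2.T @ E2 / m
    A12 = -mu * E1.T @ E2 / m
    A = np.block([[A11, A12], [A12.T, A22]])
    b = np.concatenate([K1 @ y1 / n1, K2 @ y2 / n2])
    # lstsq tolerates the near-singular systems that Gaussian kernels produce.
    a = np.linalg.lstsq(A, b, rcond=None)[0]
    return a[:n1], a[n1:]

def l2_gap(a1, a2, X1, X2, gamma1, gamma2):
    """Empirical squared L2 distance between the two fitted functions."""
    Z = np.vstack([X1, X2])
    f1 = rbf(Z, X1, gamma1) @ a1
    f2 = rbf(Z, X2, gamma2) @ a2
    return float(np.mean((f1 - f2) ** 2))

# Synthetic related tasks: the same curve, shifted by a constant offset.
rng = np.random.default_rng(0)
X1 = rng.uniform(-3, 3, size=(20, 1))
X2 = rng.uniform(-3, 3, size=(20, 1))
y1 = np.sin(X1[:, 0])
y2 = np.sin(X2[:, 0]) + 0.5
g1, g2 = 0.5, 2.0  # deliberately different kernels per task

a1, a2 = fit_two_tasks(X1, y1, X2, y2, g1, g2, mu=0.0)
gap_independent = l2_gap(a1, a2, X1, X2, g1, g2)
c1, c2 = fit_two_tasks(X1, y1, X2, y2, g1, g2, mu=10.0)
gap_coupled = l2_gap(c1, c2, X1, X2, g1, g2)
print("L2 gap without coupling:", gap_independent)
print("L2 gap with coupling:   ", gap_coupled)
```

Setting `mu=0.0` recovers two independent kernel ridge regressions, while a positive `mu` pulls the two functions toward each other in the empirical L2 sense even though their coefficient vectors live in different RKHSs.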


ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology