Citation: Feng Q, Yao JY, Xie MK et al. Sequential cooperative distillation for imbalanced multi-task learning. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 39(5): 1094−1106 Sept. 2024. DOI: 10.1007/s11390-024-2264-z.

Sequential Cooperative Distillation for Imbalanced Multi-Task Learning

Funds: This work was supported by the National Science and Technology Major Project of China under Grant No. J2019-IV-0018-0086 and the National Natural Science Foundation of China under Grant No. 62076124.
More Information
  • Author Bio:

    Quan Feng received his B.S. degree in information and computing science from Nanchang Hangkong University, Nanchang, in 2008. In 2011, he completed his M.Sc. degree in computer science and technology at the same university. He is a Ph.D. candidate at the MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing University of Aeronautics and Astronautics, Nanjing. His research interests are mainly in machine learning, especially multi-task learning.

    Jia-Yu Yao is currently a master's student at the MIIT Key Laboratory of Pattern Analysis and Machine Intelligence at Nanjing University of Aeronautics and Astronautics, Nanjing. His main research interests include representation learning and self-supervised learning.

    Ming-Kun Xie received his B.Sc. degree in 2018. He is currently a master's student at the MIIT Key Laboratory of Pattern Analysis and Machine Intelligence at Nanjing University of Aeronautics and Astronautics, Nanjing. His research interests are mainly in machine learning, particularly multi-label learning and weakly-supervised learning.

    Sheng-Jun Huang received his B.Sc. and Ph.D. degrees in computer science from Nanjing University, Nanjing, in 2008 and 2014, respectively. He is now a professor at the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing. His main research interests include machine learning and data mining.

    Song-Can Chen received his B.S. degree in mathematics from Hangzhou University (now merged into Zhejiang University), Hangzhou, in 1983. In 1985, he completed his M.S. degree in computer applications at Shanghai Jiaotong University, Shanghai, and then joined Nanjing University of Aeronautics and Astronautics (NUAA), Nanjing, in January 1986, where he received his Ph.D. degree in communication and information systems in 1997. Since 1998, he has been a full professor with the College of Computer Science and Technology at NUAA, Nanjing. His research interests include pattern recognition, machine learning, and neural computing.

  • Corresponding author:

    s.chen@nuaa.edu.cn

  • † Co-First Authors (Quan Feng wrote the methodological implications of the article, Jia-Yu Yao wrote the related work, and Ming-Kun Xie wrote the learning algorithm framework section; these three authors contributed equally to the paper.) Sheng-Jun Huang revised the introduction section.

  • Received Date: February 23, 2022
  • Accepted Date: April 07, 2024
  • Abstract: Multi-task learning (MTL) can boost the performance of individual tasks through mutual learning among multiple related tasks. However, when these tasks differ in complexity, their corresponding losses in the MTL objective inevitably compete with one another and ultimately bias the learning towards simple tasks rather than complex ones. To address this imbalanced learning problem, we propose a novel MTL method that equips multiple existing deep MTL model architectures with a sequential cooperative distillation (SCD) module. Specifically, we first introduce an efficient mechanism to measure the similarity between tasks, and group similar tasks into the same block so that they can learn cooperatively from each other. The grouped task blocks are then sorted into a queue that determines the learning sequence of the tasks according to their complexities, estimated with a defined performance indicator. Finally, distillation between the individual task-specific models and the MTL model is performed block by block, from complex to simple, achieving a balance between competition and cooperation in learning multiple tasks. Extensive experiments demonstrate that our method is significantly more competitive than state-of-the-art methods, ranking first in average performance across multiple datasets with improvements of 12.95% and 3.72% over OMTL and MTLKD, respectively.
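
    The abstract outlines a three-step pipeline: group similar tasks into blocks, sort the blocks into a queue by estimated complexity, and distill block by block from complex to simple. The minimal Python sketch below illustrates that control flow only; the helpers similarity, complexity, and distill_block, the greedy grouping rule, and the threshold value are hypothetical placeholders, not the authors' implementation.

    def group_tasks(tasks, similarity, threshold=0.5):
        """Greedily place each task into the first block whose members are all
        sufficiently similar to it; otherwise start a new block (hypothetical
        grouping rule standing in for the paper's similarity mechanism)."""
        blocks = []
        for task in tasks:
            for block in blocks:
                if all(similarity(task, other) >= threshold for other in block):
                    block.append(task)
                    break
            else:
                blocks.append([task])
        return blocks

    def sequential_cooperative_distillation(tasks, similarity, complexity, distill_block):
        """Run distillation block by block, from the most complex block to the simplest."""
        blocks = group_tasks(tasks, similarity)
        # Sort the grouped blocks into a queue by estimated task complexity
        # (e.g., a performance indicator on held-out data), complex blocks first.
        queue = sorted(blocks, key=lambda b: max(complexity(t) for t in b), reverse=True)
        for block in queue:
            # Transfer knowledge between the task-specific teacher models in
            # this block and the shared multi-task student model.
            distill_block(block)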

  • [1]
    Sun G, Probst T, Paudel D P, Popović N, Kanakis M, Patel J, Dai D, Van Gool L. Task switching network for multi-task learning. In Proc. the 2021 IEEE/CVF International Conference on Computer Vision, Oct. 2021, pp.8271–8280. DOI: 10.1109/ICCV48922.2021.00818.
    [2]
    Brüggemann D, Kanakis M, Obukhov A, Georgoulis S, Van Gool L. Exploring relational context for multi-task dense prediction. In Proc. the 2021 IEEE/CVF International Conference on Computer Vision, Oct. 2021, pp.15849–15858. DOI: 10.1109/ICCV48922.2021.01557.
    [3]
    Qing L, Li L, Xu S, Huang Y, Liu M, Jin R, Liu B, Niu T, Wen H, Wang Y, Jiang X, Peng Y. Public life in public space (PLPS): A multi-task, multi-group video dataset for public life research. In Proc. the 2021 IEEE/CVF International Conference on Computer Vision Workshops, Oct. 2021, pp.3611–3620. DOI: 10.1109/ICCVW54120.2021.00404.
    [4]
    Kumar V R, Yogamani S, Rashed H, Sitsu G, Witt C, Leang I, Milz S, Mäder P. OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters, 2021, 6(2): 2830–2837. DOI: 10.1109/LRA.2021.3062324.
    [5]
    Tseng K K, Lin J, Chen C M, Hassan M M. A fast instance segmentation with one-stage multi-task deep neural network for autonomous driving. Computers & Electrical Engineering, 2021, 93: 107194. DOI: 10.1016/j.compeleceng.2021.107194.
    [6]
    Zhou M, Zhou L, Wang S, Cheng Y, Li L, Yu Z, Liu J. UC2: Universal cross-lingual cross-modal vision-and-language pre-training. In Proc. the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2021, pp.4153–4163. DOI: 10.1109/CVPR46437.2021.00414.
    [7]
    Domingo O, Costa-Jussà M R, Escolano C. A multi-task semi-supervised framework for Text2Graph & Graph2Text. arXiv: 2202.06041, 2022. https://arxiv.org/abs/2202.06041, Sept. 2024.
    [8]
    Saon G, Tüske Z, Bolanos D, Kingsbury B. Advancing RNN transducer technology for speech recognition. In Proc. the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing, Jun. 2021, pp.5654–5658. DOI: 10.1109/ICASSP39728.2021.9414716.
    [9]
    Tang Y, Pino J, Wang C, Ma X, Genzel D. A general multi-task learning framework to leverage text data for speech to text tasks. In Proc. the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing, Jun. 2021, pp.6209–6213. DOI: 10.1109/ICASSP39728.2021.9415058.
    [10]
    Kalashnikov D, Varley J, Chebotar Y, Swanson B, Jonschkowski R, Finn C, Levine S, Hausman K. MT-Opt: Continuous multi-task robotic reinforcement learning at scale. arXiv: 2104.08212, 2021. https://arxiv.org/abs/2104.08212, Sept. 2024.
    [11]
    Liu C, Li X, Li Q, Xue Y, Liu H, Gao Y. Robot recognizing humans intention and interacting with humans based on a multi-task model combining ST-GCN-LSTM model and YOLO model. Neurocomputing, 2021, 430: 174–184. DOI: 10.1016/j.neucom.2020.10.016.
    [12]
    Lu Y, Kumar A, Zhai S, Cheng Y, Javidi T, Feris R. Fully-adaptive feature sharing in multi-task networks with applications in person attribute classification. In Proc. the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Jul. 2017, pp.1131–1140. DOI: 10.1109/CVPR.2017.126.
    [13]
    Long M, Cao Z, Wang J, Yu P S. Learning multiple tasks with multilinear relationship networks. In Proc. the 31st International Conference on Neural Information Processing Systems, Dec. 2017, pp.1593–1602.
    [14]
    Cipolla R, Gal Y, Kendall A. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proc. the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp.7482–7491. DOI: 10.1109/CVPR.2018.00781.
    [15]
    Guo M, Haque A, Huang D A, Yeung S, Fei-Fei L. Dynamic task prioritization for multitask learning. In Proc. the 15th European Conference on Computer Vision, Sept. 2018, pp.282–299. DOI: 10.1007/978-3-030-01270-0_17.
    [16]
    Li S Y, Huang S J, Chen S. Crowdsourcing aggregation with deep Bayesian learning. Science China Information Sciences, 2021, 64(3): 130104. DOI: 10.1007/s11432-020-3118-7.
    [17]
    Sener O, Koltun V. Multi-task learning as multi-objective optimization. In Proc. the 32nd International Conference on Neural Information Processing Systems, Dec. 2018, pp.525–536.
    [18]
    Lin X, Zhen H L, Li Z, Zhang Q, Kwong S. Pareto multi-task learning. In Proc. the 33rd International Conference on Neural Information Processing Systems, Dec. 2019, Article No. 1080.
    [19]
    Ranjan R, Patel V M, Chellappa R. HyperFace: A deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition. IEEE Trans. Pattern Analysis and Machine Intelligence, 2019, 41(1): 121–135. DOI: 10.1109/TPAMI.2017.2781233.
    [20]
    Liu S, Johns E, Davison A J. End-to-end multi-task learning with attention. In Proc. the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2019, pp.1871–1880. DOI: 10.1109/CVPR.2019.00197.
    [21]
    Raychaudhuri D S, Suh Y, Schulter S, Yu X, Faraki M, Roy-Chowdhury A K, Chandraker M. Controllable dynamic multi-task architectures. In Proc. the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2022, pp.10945–10954. DOI: 10.1109/CVPR52688.2022.01068.
    [22]
    Caruana R A. Multitask learning: A knowledge-based source of inductive bias. In Proc. the 10th International Conference on International Conference on Machine Learning, Jun. 1993, pp.41–48. DOI: 10.1016/B978-1-55860-307-3.50012-5.
    [23]
    Baxter J. A model of inductive bias learning. Journal of Artificial Intelligence Research, 2000, 12: 149–198. DOI: 10.1613/jair.731.
    [24]
    Chen Z, Badrinarayanan V, Lee C Y, Rabinovich A. GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks. In Proc. the 35th International Conference on Machine Learning, Jul. 2018, pp.793–802.
    [25]
    Yu T, Kumar S, Gupta A, Levine S, Hausman K, Finn C. Gradient surgery for multi-task learning. In Proc. the 34th International Conference on Neural Information Processing Systems, Dec. 2020, Article No. 489.
    [26]
    Chung I, Park S, Kim J, Kwak N. Feature-map-level online adversarial knowledge distillation. In Proc. the 37th International Conference on Machine Learning, Jul. 2020, Article No. 187.
    [27]
    Mirzadeh S I, Farajtabar M, Li A, Levine N, Matsukawa A, Ghasemzadeh H. Improved knowledge distillation via teacher assistant. In Proc. the 34th AAAI Conference on Artificial Intelligence, Apr. 2020, pp.5191–5198. DOI: 10.1609/aaai.v34i04.5963.
    [28]
    Li W H, Bilen H. Knowledge distillation for multi-task learning. In Proc. the European Conference on Computer Vision, Aug. 2020, pp.163–176. DOI: 10.1007/978-3-030-65414-6_13.
    [29]
    Masana M, Liu X, Twardowski B, Menta M, Bagdanov A D, Van De Weijer J. Class-incremental learning: Survey and performance evaluation on image classification. IEEE Trans. Pattern Analysis and Machine Intelligence, 2023, 45(5): 5513–5533. DOI: 10.1109/TPAMI.2022.3213473.
    [30]
    De Lange M, Aljundi R, Masana M, Parisot S, Jia X, Leonardis A, Slabaugh G, Tuytelaars T. A continual learning survey: Defying forgetting in classification tasks. IEEE Trans. Pattern Analysis and Machine Intelligence, 2022, 44(7): 3366–3385. DOI: 10.1109/TPAMI.2021.3057446.
    [31]
    Gou J, Yu B, Maybank S J, Tao D. Knowledge distillation: A survey. International Journal of Computer Vision, 2021, 129(6): 1789–1819. DOI: 10.1007/s11263-021-01453-z.
    [32]
    Vandenhende S, Georgoulis S, Van Gansbeke W, Proesmans M, Dai D, Van Gool L. Multi-task learning for dense prediction tasks: A survey. IEEE Trans. Pattern Analysis and Machine Intelligence, 2022, 44(7): 3614–3633. DOI: 10.1109/TPAMI.2021.3054719.
    [33]
    Huang G, Liu Z, Van Der Maaten L, Weinberger K Q. Densely connected convolutional networks. In Proc. the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Jul. 2017, pp.2261–2269. DOI: 10.1109/CVPR.2017.243.
    [34]
    Shui C, Abbasi M, Robitaille L É, Wang B, Gagné C. A principled approach for learning task similarity in multitask learning. In Proc. the 28th International Joint Conference on Artificial Intelligence, Aug. 2019, pp.3446–3452. DOI: 10.24963/ijcai.2019/478.
    [35]
    He Y, Liu P, Zhu L, Yang Y. Filter pruning by switching to neighboring CNNs with good attributes. IEEE Trans. Neural Networks and Learning Systems, 2023, 34(10): 8044–8056. DOI: 10.1109/TNNLS.2022.3149332.
    [36]
    Feng Q, Yao J, Zhong Y, Li P, Pan Z. Learning twofold heterogeneous multi-task by sharing similar convolution kernel pairs. Knowledge-Based Systems, 2022, 252: 109396. DOI: 10.1016/j.knosys.2022.109396.
    [37]
    Bellemare M G, Srinivasan S, Ostrovski G, Schaul T, Saxton D, Munos R. Unifying count-based exploration and intrinsic motivation. In Proc. the 30th International Conference on Neural Information Processing Systems, Dec. 2016, pp.1479–1487.
    [38]
    Graves A, Bellemare M G, Menick J, Munos R, Kavukcuoglu K. Automated curriculum learning for neural networks. In Proc. the 34th International Conference on Machine Learning, Aug. 2017, pp.1311–1320.
    [39]
    David Eigen, Rob Fergus. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proc. the IEEE International Conference on Computer Vision, Dec. 2015, pp.2650–2658. DOI: 10.1109/ICCV.2015.304.
    [40]
    Levine S, Finn C, Darrell T, Abbeel P. End-to-end training of deep visuomotor policies. The Journal of Machine Learning Research, 2016, 17(1): 1334–1373. DOI: 10.5555/2946645.2946684.
    [41]
    Gao Y, Ma J, Zhao M, Liu W, Yuille A L. NDDR-CNN: Layerwise feature fusing in multi-task CNNs by neural discriminative dimensionality reduction. In Proc. the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2019, pp.3200–3209. DOI: 10.1109/CVPR.2019.00332.
    [42]
    Lee S, Son Y. Multitask learning with single gradient step update for task balancing. Neurocomputing, 2022, 467: 442–453. DOI: 10.1016/j.neucom.2021.10.025.
    [43]
    He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In Proc. the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2016, pp.770–778. DOI: 10.1109/CVPR.2016.90.
    [44]
    Eigen D, Fergus R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proc. the 2015 IEEE International Conference on Computer Vision, Dec. 2015, pp.2650–2658. DOI: 10.1109/ICCV.2015.304.
