Citation: Jin LB, Lei N, Luo ZX et al. Semi-discrete optimal transport for long-tailed classification. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 40(1): 252−266, Jan. 2025. DOI: 10.1007/s11390-023-3086-0

Semi-Discrete Optimal Transport for Long-Tailed Classification

Funds: This work was partially supported by the National Key Research and Development Program of China under Grant No. 2021YFA1003003 and the National Natural Science Foundation of China under Grant Nos. 61936002 and T2225012.
More Information
  • Author Bio:

    Lian-Bao Jin received his B.Sc. degree in mathematics and applied mathematics from Jining University, Jining, in 2013, and his M.Sc. degree in computational mathematics from Nanchang Hangkong University, Nanchang, in 2016. He is working toward his Ph.D. degree in computational mathematics at the School of Mathematical Sciences, Dalian University of Technology, Dalian. His research interests include computer vision and interpretable artificial intelligence.

    Na Lei received her Ph.D. degree in computational mathematics from Jilin University, Changchun, in 2002. She is a professor at the International School of Information Science and Engineering, Dalian University of Technology, Dalian. Her research interests include interpretable artificial intelligence and structural hexahedral mesh generation.

    Zhong-Xuan Luo received his Ph.D. degree in computational mathematics from Dalian University of Technology, Dalian, in 1991. He is a professor at the School of Software Technology, Dalian University of Technology, Dalian. His research interests include computational geometry, computer vision/graphics, image processing, and underwater (nimble) robots.

    Jin Wu received his Ph.D. degree in automotive engineering from South China University of Technology, Guangzhou, in 2006. He is currently a senior analyst at the Institute of Strategic Research of Huawei.

    Chao Ai received his B.Sc. degree from Huazhong University of Science and Technology, Wuhan, in 1996, and his M.Sc. degree from Sun Yat-Sen University, Guangzhou, in 1999. He is currently a vice president of Global Technical Cooperation at Huawei.

    Xianfeng Gu received his Ph.D. degree in computer science from Harvard University, Cambridge, in 2003. He is a professor of computer science and the director of the 3D Scanning Laboratory with the Department of Computer Science, Stony Brook University, Stony Brook, New York. His research interests include computer vision, graphics, geometric modeling, and medical imaging. He won the US National Science Foundation CAREER Award in 2004.

  • Corresponding author:

    nalei@dlut.edu.cn

  • Received Date: January 10, 2023
  • Accepted Date: September 24, 2023
  • Abstract: The long-tailed data distribution poses an enormous challenge for training neural networks for classification. A classification network can be decoupled into a feature extractor and a classifier. This paper takes a semi-discrete optimal transport (OT) perspective on the long-tailed classification problem: the feature space is viewed as a continuous source domain, and the classifier weights are viewed as a discrete target domain. The classifier in effect finds a cell decomposition of the feature space, with each cell corresponding to one class. An imbalanced training set causes the more frequent classes to occupy larger-volume cells, which means that the classifier's decision boundary is biased towards the less frequent classes, degrading classification performance in the inference phase. We therefore propose a novel OT-dynamic softmax loss, which dynamically adjusts the decision boundary during training to avoid overfitting the tail classes. In addition, our method incorporates the supervised contrastive loss so that the feature space satisfies the uniform distribution condition. Extensive and comprehensive experiments demonstrate that our method achieves state-of-the-art performance on multiple long-tailed recognition benchmarks, including CIFAR-LT, ImageNet-LT, iNaturalist 2018, and Places-LT.
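
The semi-discrete OT picture in the abstract can be stated concretely. For a linear classifier with weight vectors w_i and offsets h_i, the feature space decomposes into cells W_i(h) = { x : <x, w_i> + h_i ≥ <x, w_j> + h_j for all j }, and semi-discrete OT asks for the offsets h at which each cell captures a prescribed mass, μ(W_i(h)) = ν_i, where μ is the feature distribution and ν_i is the desired class proportion. The classical dual ascent step is h_i ← h_i + η(ν_i − μ(W_i(h))): offsets of under-covered classes grow, enlarging their cells.

The following PyTorch sketch illustrates that update as a training loss. It is a minimal illustration of the semi-discrete OT dual step under the view above, not the paper's OT-dynamic softmax loss; the class name DynamicOffsetSoftmaxLoss, the step size lr_h, and the batch-based mass estimate are assumptions made for the example.

    import torch
    import torch.nn.functional as F

    class DynamicOffsetSoftmaxLoss(torch.nn.Module):
        # Hypothetical sketch: softmax with per-class logit offsets h that are
        # updated by the semi-discrete OT dual ascent step, so that each
        # class's decision cell captures a prescribed target mass.
        def __init__(self, num_classes, lr_h=0.1, target=None):
            super().__init__()
            # h plays the role of the OT dual variables (one offset per class).
            self.register_buffer("h", torch.zeros(num_classes))
            self.lr_h = lr_h
            if target is None:
                # Default: balanced cells, i.e., a uniform target measure.
                target = torch.full((num_classes,), 1.0 / num_classes)
            self.register_buffer("target", target)

        def forward(self, logits, labels):
            adjusted = logits + self.h  # shift the decision boundaries by h
            loss = F.cross_entropy(adjusted, labels)
            with torch.no_grad():
                # Monte Carlo estimate of each cell's mass from the batch.
                cells = adjusted.argmax(dim=1)
                mass = torch.bincount(cells, minlength=logits.size(1)).float()
                mass = mass / mass.sum()
                # Dual ascent: enlarge the cells of under-covered classes.
                self.h += self.lr_h * (self.target - mass)
            return loss

Used as, e.g., loss = DynamicOffsetSoftmaxLoss(num_classes=100)(model(images), labels); following the abstract, such a term would be combined with a supervised contrastive loss that pushes the features towards the uniform distribution condition.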
