Citation: Bai GR, Liu QB, He SZ et al. Unsupervised domain adaptation on sentence matching through self-supervision. Journal of Computer Science and Technology, 2023, 38(6): 1237–1249. DOI: 10.1007/s11390-022-1479-0.
Although neural approaches have achieved state-of-the-art results on the sentence matching task, their performance drops dramatically when they are applied to unseen domains. To tackle this cross-domain challenge, we address unsupervised domain adaptation for sentence matching, where the goal is to perform well on a target domain given only labeled source-domain data and unlabeled target-domain data. Specifically, we propose to achieve this through self-supervised tasks. Unlike previous unsupervised domain adaptation methods, self-supervision can not only be specially designed to suit the characteristics of sentence matching, but is also much easier to optimize. During training, each self-supervised task is performed on both domains simultaneously in an easy-to-hard curriculum, which gradually brings the two domains closer along the direction relevant to the task. As a result, a classifier trained on the source domain is able to generalize to the unlabeled target domain. In total, we present three types of self-supervised tasks, and the results demonstrate their superiority. In addition, we further study the performance of different usages of self-supervised tasks, which should inform how to effectively utilize self-supervision in cross-domain scenarios.
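To make the training scheme concrete, below is a minimal PyTorch sketch of the kind of objective the abstract describes, not the authors' actual implementation: a shared encoder is optimized jointly on the labeled source-domain matching task and on a self-supervised task applied to sentences from both domains, with self-supervised examples revealed in an easy-to-hard curriculum. The toy encoder, the binary self-supervised task, the difficulty scores, and the linear pacing schedule are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB, DIM, SEQ = 1000, 64, 12

# Shared sentence encoder (toy mean-of-embeddings stand-in; an assumption).
encoder = nn.EmbeddingBag(VOCAB, DIM)
match_head = nn.Linear(4 * DIM, 2)   # supervised matching head (source labels only)
ssl_head = nn.Linear(2 * DIM, 2)     # head for one binary self-supervised task
params = (list(encoder.parameters()) + list(match_head.parameters())
          + list(ssl_head.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)

def encode_pair(a, b):
    """Standard interaction features [u; v; |u-v|; u*v] over pooled embeddings."""
    u, v = encoder(a), encoder(b)
    return torch.cat([u, v, (u - v).abs(), u * v], dim=-1)

# Dummy batches of token ids standing in for real corpora (assumption).
src_a = torch.randint(0, VOCAB, (32, SEQ))
src_b = torch.randint(0, VOCAB, (32, SEQ))
src_y = torch.randint(0, 2, (32,))                 # labels exist on the source domain
ssl_a = torch.randint(0, VOCAB, (64, SEQ))         # sentences from BOTH domains
ssl_b = torch.randint(0, VOCAB, (64, SEQ))
ssl_y = torch.randint(0, 2, (64,))                 # self-supervised labels come for free
difficulty = torch.rand(64)                        # any per-example difficulty score
order = difficulty.argsort()                       # easy -> hard

EPOCHS = 5
for epoch in range(EPOCHS):
    # Curriculum pacing: reveal an increasing easy-to-hard prefix each epoch.
    k = int(len(order) * (epoch + 1) / EPOCHS)
    idx = order[:max(k, 1)]

    opt.zero_grad()
    # Supervised matching loss on labeled source-domain pairs.
    match_loss = F.cross_entropy(match_head(encode_pair(src_a, src_b)), src_y)
    # Self-supervised loss on both domains, pulling their representations together.
    u, v = encoder(ssl_a[idx]), encoder(ssl_b[idx])
    ssl_loss = F.cross_entropy(ssl_head(torch.cat([u, v], dim=-1)), ssl_y[idx])
    (match_loss + ssl_loss).backward()
    opt.step()
    print(f"epoch {epoch}: match={match_loss.item():.3f} ssl={ssl_loss.item():.3f}")
```

Because the self-supervised loss is computed on source and target sentences alike while the matching head sees only source labels, the shared encoder is pushed toward domain-invariant features, which is what lets the source-trained classifier transfer.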