Journal of Computer Science and Technology ›› 2022, Vol. 37 ›› Issue (3): 507-526. DOI: 10.1007/s11390-022-2158-x

Special Topic: Survey • Artificial Intelligence and Pattern Recognition


Connecting the Dots in Self-Supervised Learning: A Brief Survey for Beginners

Peng-Fei Fang1,2 (方鹏飞), Xian Li1 (李贤), Yang Yan1,3 (燕阳), Shuai Zhang1,3 (章帅), Qi-Yue Kang1 (康启越), Xiao-Fei Li1 (李晓飞), and Zhen-Zhong Lan1 (蓝振忠)        

  1. School of Engineering, Westlake University, Hangzhou 310030, China
    2. College of Engineering and Computer Science, Australian National University, Canberra, ACT 2601, Australia
    3. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
  • Received: 2022-01-15  Revised: 2022-05-12  Accepted: 2022-05-18  Online: 2022-05-30  Published: 2022-05-30
  • Contact: Peng-Fei Fang, E-mail: fangpengfei@westlake.edu.cn
  • About author:Peng-Fei Fang received his B.E. degree in automation from Hangzhou Dianzi University (HDU), Hangzhou, in 2014, and his M.E. degree in mechatronics from Australian National University (ANU), Canberra, in 2017. He is currently pursuing his joint Ph.D. degree with ANU and the Data61-CSIRO. He is also a visiting scholar with Westlake University, Hangzhou. His research interests include computer vision and machine learning.
  • Supported by:
    This work was supported by the Key Research and Development Program of Zhejiang Province under Grant No. 2021C03139.

1. Context: Self-supervised learning (SSL) algorithms can learn high-quality data representations from massive amounts of unlabeled data, and in recent years they have made tremendous progress both in the artificial intelligence (AI) community and in fields beyond AI, such as protein structure prediction. With the continued development of natural language processing and computer vision, new SSL algorithms emerge constantly. However, it is hard for beginners to see clearly from this large body of publications how SSL has progressed. Moreover, existing surveys of SSL tend to cover as many papers as possible, which makes it difficult for readers doing a literature survey to build connections across different fields.
2. Objective: This survey selects, from each field, a small number of papers that we consider milestones or essential to the development of SSL. We describe these papers as dots and try to connect the dots across different fields. Rather than simply listing or categorizing these papers, we connect them in order to trace how SSL algorithms have evolved and how work in different fields has inspired and advanced one another.
3. Method: Citation count within each field is the criterion for selecting the dot papers. We first pick the most influential representation learning work in each field; since deep learning techniques became popular around 2013, we only consider papers published after 2013.
4. Results & Findings: By connecting the dot works of SSL across fields, this survey gives readers a global understanding of the development of SSL and shows how SSL methods in multiple disciplines, namely natural language processing, computer vision, graph learning, audio processing, and protein learning, have influenced, inspired, and built on one another. Finally, the survey discusses the main challenges SSL will face in the future and potential solutions.
5. Conclusions: This survey charts the development path of the key SSL works on different data types such as text, images, and graphs. It not only reveals how SSL has progressed within each discipline but also clearly shows how different disciplines influence and inspire one another. For example, the Transformer architecture invented in natural language processing inspired the development of ViT in computer vision, and the contrastive learning paradigm from computer vision has shaped graph learning, audio learning, and other fields (a minimal sketch of such a contrastive objective is given below). Disciplines are therefore not isolated and independent of each other; progress in one discipline draws inspiration from others, and cross-disciplinary research is an effective way to produce influential work.
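
To make the contrastive learning paradigm mentioned in the conclusions concrete, the following is a minimal sketch of an InfoNCE-style objective over two augmented views of the same batch, in the spirit of SimCLR-like methods. It assumes PyTorch, and the function name info_nce_loss, the temperature of 0.1, and the toy shapes are illustrative choices rather than details taken from any particular surveyed paper.

```python
# Minimal, illustrative InfoNCE-style contrastive loss (SimCLR-like sketch).
# All names and hyper-parameters here are illustrative, not from the survey.
import torch
import torch.nn.functional as F

def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1, z2: (N, D) embeddings of two augmented views of the same N samples."""
    z1 = F.normalize(z1, dim=1)             # project embeddings onto the unit sphere
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature      # (N, N) scaled cosine similarities
    targets = torch.arange(z1.size(0))      # positives sit on the diagonal
    # Pull the two views of each sample together, push all other in-batch samples apart.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

if __name__ == "__main__":
    # Toy usage: embeddings of two views produced by any encoder (vision, graph, audio, ...).
    v1, v2 = torch.randn(8, 128), torch.randn(8, 128)
    print(info_nce_loss(v1, v2).item())
```

Variants of this symmetric in-batch objective, combined with domain-specific encoders and augmentations, are essentially what contrastive methods in graph and audio learning adapted from computer vision.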

Keywords: dot, multi-discipline, self-supervised learning, survey, milestone

Abstract:

The artificial intelligence (AI) community has recently made tremendous progress in developing self-supervised learning (SSL) algorithms that can learn high-quality data representations from massive amounts of unlabeled data. These methods have delivered strong results even in fields outside of AI. Thanks to the joint efforts of researchers in many areas, new SSL methods appear almost daily. However, the sheer number of publications makes it difficult for beginners to see clearly how the subject is progressing. This survey bridges that gap by carefully selecting a small set of papers that we believe are milestones or essential work. We treat these works as the "dots" of SSL and connect them through how they evolved. Hopefully, by viewing the connections between these dots, readers will gain a high-level picture of the development of SSL across multiple disciplines, including natural language processing, computer vision, graph learning, audio processing, and protein learning.

Key words: artificial intelligence (AI), dot, self-supervised learning (SSL), survey
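
To give beginners an intuition for how representations can be learned from unlabeled data alone, here is a minimal, hypothetical sketch of a masked-prediction pretext task in the spirit of masked language/image modeling: a random subset of tokens is hidden and the model is trained to recover them, so the data itself supplies the supervision. The toy vocabulary size, masking ratio, and tiny Transformer encoder below are illustrative assumptions, not the configuration of any method covered by this survey.

```python
# Minimal, illustrative masked-prediction pretext task (a BERT/MAE-flavoured sketch).
# Vocabulary size, masking ratio, and the tiny model are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB, MASK_ID, MASK_RATIO = 1000, 0, 0.15   # toy values

class TinyMaskedModel(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, dim)
        self.encoder = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.head = nn.Linear(dim, VOCAB)     # predict the original token id at each position

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(self.embed(tokens)))

def masked_prediction_loss(model: nn.Module, tokens: torch.Tensor) -> torch.Tensor:
    """tokens: (B, T) integer ids drawn from unlabeled sequences; no human labels are used."""
    mask = torch.rand(tokens.shape) < MASK_RATIO           # choose positions to hide
    corrupted = tokens.masked_fill(mask, MASK_ID)          # replace them with a mask token
    logits = model(corrupted)                              # (B, T, VOCAB)
    # The training signal comes from the data itself: recover the hidden tokens.
    return nn.functional.cross_entropy(logits[mask], tokens[mask])

if __name__ == "__main__":
    model = TinyMaskedModel()
    batch = torch.randint(1, VOCAB, (4, 32))               # a batch of unlabeled token sequences
    print(masked_prediction_loss(model, batch).item())
```

Broadly, the same recipe, with tokens swapped for image patches or speech frames, is what lets masked pre-training transfer across text, vision, and audio.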
