Journal of Computer Science and Technology, 2019, Vol. 34, Issue 4: 924-938. doi: 10.1007/s11390-019-1950-8
Special topic: Artificial Intelligence and Pattern Recognition
Robail Yasrab
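In recent years, convolutional neural networks (CNNs) have made tremendous progress and achieved outstanding performance. Since their introduction, CNNs have performed well on many classification and segmentation tasks, and the CNN family now includes many different architectures that are widely used for most vision-based recognition tasks. However, building a network by simply stacking convolutional blocks inevitably limits how well it can be optimized and leads to overfitting and vanishing gradients. Network singularities are one of the key causes of these problems, and they have recently been linked to degenerate manifolds in the loss surface. These degeneracies slow down learning and reduce performance. Skip connections have therefore become an important part of CNN design for mitigating network singularities. This paper uses skip connections in neural network (NN) architectures to improve information flow, mitigate singularities, and improve performance. The study examines skip connections at different levels and proposes alternative strategies for placing these links that can be applied to any CNN. To test the proposed hypothesis, we designed an experimental CNN architecture called Shallow Wide ResNet (SRNet), which uses the wide residual network as its base design. We conducted extensive experiments to evaluate the effectiveness of this work, training and testing the CNNs on two well-known datasets, CIFAR-10 and CIFAR-100. The empirical results show good outcomes in terms of performance, efficiency, and singularity mitigation.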
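The abstract's core mechanism is that a skip connection adds an identity path around a layer, so the layer's input-output Jacobian becomes I + ∂F/∂x instead of ∂F/∂x alone. The following is a minimal Python sketch of that effect. It is not the SRNet architecture from the paper. It assumes fully connected ReLU layers with square weight matrices in place of convolutions, and the helper names (`plain_jacobian`, `residual_jacobian`, `stack_input_gradient`) are hypothetical. An all-zero weight matrix stands in for a degenerate layer, loosely analogous to an elimination singularity.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

# Assumption: dense (fully connected) layers stand in for convolutions,
# and every weight matrix is square so that x + F(x) is well defined.


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product W @ x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def relu(z: Vector) -> Vector:
    return [max(0.0, z_i) for z_i in z]


def relu_grad(z: Vector) -> Vector:
    """Derivative of ReLU, using the common convention relu'(0) = 0."""
    return [1.0 if z_i > 0 else 0.0 for z_i in z]


def plain_jacobian(w: Matrix, x: Vector) -> Matrix:
    """Jacobian of y = relu(W x), i.e. diag(relu'(W x)) @ W."""
    d = relu_grad(matvec(w, x))
    return [[d[i] * w[i][j] for j in range(len(x))] for i in range(len(w))]


def residual_jacobian(w: Matrix, x: Vector) -> Matrix:
    """Jacobian of y = x + relu(W x), i.e. I + diag(relu'(W x)) @ W."""
    j = plain_jacobian(w, x)
    n = len(x)
    return [[j[i][k] + (1.0 if i == k else 0.0) for k in range(n)] for i in range(n)]


def backprop(jac: Matrix, grad_out: Vector) -> Vector:
    """Gradient with respect to the layer input: J^T @ grad_out."""
    n_in = len(jac[0])
    return [sum(jac[i][k] * grad_out[i] for i in range(len(jac))) for k in range(n_in)]


def stack_input_gradient(weights: List[Matrix], x: Vector,
                         grad_out: Vector, residual: bool) -> Vector:
    """Push grad_out back through a stack of plain or residual layers."""
    # Forward pass: record the input each layer sees.
    inputs = []
    h = x
    for w in weights:
        inputs.append(h)
        f = relu(matvec(w, h))
        h = [a + b for a, b in zip(h, f)] if residual else f
    # Backward pass: multiply by each layer's transposed Jacobian.
    g = grad_out
    for w, h_in in zip(reversed(weights), reversed(inputs)):
        jac = residual_jacobian(w, h_in) if residual else plain_jacobian(w, h_in)
        g = backprop(jac, g)
    return g
```

With three all-zero layers, the plain stack returns a zero input gradient, so earlier layers receive no learning signal, while the residual stack passes the gradient through unchanged. With small weights (0.1 on the diagonal), the plain stack's gradient shrinks to 0.1^3 = 0.001, whereas the residual stack's stays near 1.1^3 = 1.331.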