Journal of Computer Science and Technology ›› 2019, Vol. 34 ›› Issue (4): 924-938. doi: 10.1007/s11390-019-1950-8

Special Topic: Artificial Intelligence and Pattern Recognition

  • About the author: Robail Yasrab received his Ph.D. degree in computer vision from the School of Computer Science, University of Science and Technology of China, Hefei, in 2017. He is currently a research fellow at the Computer Vision Laboratory, School of Computer Science, University of Nottingham, Nottingham, United Kingdom. His research interests include artificial intelligence, computer vision, deep learning, plant phenotyping, and particularly the application and adaptation of modern machine learning techniques to real-world problems.

SRNET: A Shallow Skip Connection Based Convolutional Neural Network Design for Resolving Singularities

Robail Yasrab   

  1. Computer Vision Laboratory, School of Computer Science, University of Nottingham, Nottingham NG8 1BB, U.K.
  • Received: 2018-06-12   Revised: 2019-05-24   Online: 2019-07-11   Published: 2019-07-11


Abstract: Convolutional neural networks (CNNs) have made tremendous progress in recent years. Since their emergence, CNNs have exhibited excellent performance in most classification and segmentation tasks, and the CNN family now includes a variety of architectures that dominate the major vision-based recognition tasks. However, building a neural network (NN) by simply stacking convolution blocks inevitably limits its optimization ability and introduces overfitting and vanishing-gradient problems. One of the key causes of these issues is network singularities, which have recently been shown to produce degenerate manifolds in the loss landscape, leading to a slow learning process and lower performance. In this scenario, skip connections have turned out to be an essential unit of CNN design for mitigating network singularities. The idea of this research is to introduce skip connections into the NN architecture to augment the information flow, mitigate singularities, and improve performance. This research experiments with skip connections at different levels and proposes a placement strategy for these links that applies to any CNN. To test the proposed hypothesis, we designed an experimental CNN architecture, named Shallow Wide ResNet (SRNet), which uses the wide residual network as its base design. We performed numerous experiments to assess the validity of the proposed idea, using two well-known datasets, CIFAR-10 and CIFAR-100, to train and test the CNNs. The final empirical results show promising outcomes in terms of performance, efficiency, and the reduction of network singularity issues.
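The abstract does not include code; purely as an illustrative sketch, the block below shows what a wide residual block with an additive (identity) skip connection looks like in PyTorch, in the spirit of the WRN-style design the paper builds on. The class name, layer widths, and dropout rate are assumptions made for this example and are not taken from SRNet itself.

```python
# A minimal sketch (assumption: PyTorch) of a WRN-style residual block with an
# additive skip connection. Names and hyper-parameters are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class WideResidualBlock(nn.Module):
    """BN -> ReLU -> Conv, twice, with dropout in between and an additive skip path."""

    def __init__(self, in_channels, out_channels, stride=1, dropout=0.3):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, stride=1, padding=1, bias=False)
        self.dropout = nn.Dropout(p=dropout)
        # 1x1 projection when the skip path must change shape; plain identity otherwise.
        self.shortcut = (
            nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False)
            if stride != 1 or in_channels != out_channels
            else nn.Identity()
        )

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.dropout(out)
        out = self.conv2(F.relu(self.bn2(out)))
        # The skip connection: adding the input keeps an identity path open,
        # which is the mechanism the paper credits with mitigating singularities.
        return out + self.shortcut(x)


if __name__ == "__main__":
    block = WideResidualBlock(16, 32, stride=2)
    x = torch.randn(4, 16, 32, 32)   # a CIFAR-sized feature map
    print(block(x).shape)            # torch.Size([4, 32, 16, 16])
```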

Key words: convolutional neural network (CNN), wide residual network (WRN), dropout, skip connection, deep neural network (DNN)
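Since the abstract states that CIFAR-10 and CIFAR-100 are used to train and test the networks, the following hedged sketch shows how such an evaluation is commonly set up with torchvision. The batch size and normalization statistics are conventional defaults, not values reported in the paper, and the model passed to `test_accuracy` could be any CNN, e.g., a stack of the residual blocks sketched above.

```python
# Illustrative only: loading CIFAR-10 with torchvision and measuring the test
# accuracy of an arbitrary CNN. Defaults here are common choices, not the paper's.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)
test_loader = DataLoader(test_set, batch_size=128, shuffle=False)


@torch.no_grad()
def test_accuracy(model, loader, device="cpu"):
    """Return top-1 accuracy of `model` over `loader`."""
    model.eval()
    correct = total = 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total
```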
