[1] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In Proc. the 25th Int. Conf. Neural Information Processing Systems, December 2012, pp.10971105.
[2] Zeiler M D, Fergus R. Visualizing and understanding convolutional networks. In Proc. European Conference on Computer Vision, September 2014, pp.818833.
[3] Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 2014, pp.580587.
[4] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 2015, pp.34313440.
[5] Hinton G, Deng L, Yu D, Dahl G E, Mohamed A R, Jaitly N, Senior A, Vanhoucke V, Nguyen P, Sainath T N, Kingsbury B. Deep neural networks for acoustic modeling in speech recognition:The shared views of four research groups. IEEE Signal Processing Magazine, 2012, 29(6):8297.
[6] Graves A, Mohamed A R, Hinton G E. Speech recognition with deep recurrent neural networks. In Proc. IEEE Int. Conf. Acoustics Speech and Signal Processing (ICASSP), May 2013, pp.66456649.
[7] Mikolov T, Sutskever I, Chen K, Corrado G S, Dean J. Distributed representations of words and phrases and their compositionality. In Proc. Advances in Neural Information Processing Systems, December 2013, pp.31113119.
[8] Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks. In Proc. Advances in Neural Information Processing Systems, December 2014, pp.31043112.
[9] Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv:1409.0473, 2014. http://arxiv.org/abs/1409.0473, May 2017.
[10] Mnih V, Kavukcuoglu K, Silver D, Rusu A A, Veness J, Bellemare M G, Graves A, Riedmiller M, Fidjeland A K, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Wierstra D K D, Legg S, Hassabis D. Humanlevel control through deep reinforcement learning. Nature, 2015, 518(7540):529533.
[11] Silver D, Huang A, Maddison C J, Guez A, Sifre L, van den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, Dieleman S, Grewe D, Nham J, Kalchbrenner N, Sutskever I, Lillicrap T, Leach M, Kavukcuoglu K, Graepel T, Hassabis D. Mastering the game of Go with deep neural networks and tree search. Nature, 2016, 529(7587):484489.
[12] He K M, Zhang X Y, Ren S Q Sun J. Identity mappings in deep residual networks. In Proc. the 14th European Conf. Computer Vision (ECCV), October 2016, pp.630645.
[13] Simonyan K, Zisserman A. Very deep convolutional networks for largescale image recognition. arXiv:1409.1556, 2014. http://arxiv.org/abs/1409.1556, May 2017.
[14] Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S E, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 2015.
[15] He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 2016, pp.770778.
[16] Galal S, Horowitz M. Energyefficient floatingpoint unit design. IEEE Trans. Computers, 2011, 60(7):913922.
[17] Hochreiter S, Schmidhuber J. Long shortterm memory. Neural Computation, 1997, 9(8):17351780.
[18] Chung J, Gulcehre C, Cho K, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555, 2014. http://arxiv.org/abs/1412.3555, May 2017.
[19] Pham P H, Jelaca D, Farabet C, Martini B, LeCun Y, Culurciello E. NeuFlow:Dataflow vision processing systemonachip. In Proc. the 55th IEEE Int. Midwest Symp. Circuits and Systems (MWSCAS), August 2012, pp.10441047.
[20] Chen T S, Du Z D, Sun N H, Wang J, Wu C Y, Chen Y J, Temam O. DianNao:A smallfootprint highthroughput accelerator for ubiquitous machinelearning. In Proc. the 9th Int. Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2014, pp.269284.
[21] Luo T, Liu S L, Li L, Wang Y Q, Zhang S J, Chen T S, Xu Z W, Temam O, Chen Y J. DaDianNao:A neural network supercomputer. IEEE Trans. Computers, 2017, 66(1):7388.
[22] Denton E L, Zaremba W, Bruna J, LeCun Y, Fergus R. Exploiting linear structure within convolutional networks for efficient evaluation. In Proc. the 27th Int. Conf. Neural Information Processing Systems, December 2014, pp.12691277.
[23] Jaderberg M, Vedaldi A, Zisserman A. Speeding up convolutional neural networks with low rank expansions. In Proc. British Machine Vision Conference (BMVC), September 2014.
[24] Tai C, Xiao T, Zhang Y, Wang X G, E W N. Convolutional neural networks with lowrank regularization. arXiv:1511.06067, 2015. http://arxiv.org/abs/1511.06067, May 2017.
[25] Zhou S C, Wu J N, Wu Y X, Zhou X Y. Exploiting local structures with the Kronecker layer in convolutional networks. arXiv:1512.09194, 2015. https://arxiv.org/abs/1512.09194, May 2017.
[26] Novikov A, Podoprikhin D, Osokin A, Vetrov D. Tensorizing neural networks. In Proc. Advances in Neural Information Processing Systems, December 2015, pp.442450.
[27] Zhang X Y, Zou J H, He K M, Sun J. Accelerating very deep convolutional networks for classification and detection. IEEE Trans. Pattern Analysis and Machine Intelligence, 2016, 38(10):19431955.
[28] Anwar S, Hwang K, Sung W. Structured pruning of deep convolutional neural networks. arXiv:1512.08571, 2015. http://arxiv.org/abs/1512.08571, May 2017.
[29] Han S, Pool J, Tran J, Dally W J. Learning both weights and connections for efficient neural network. In Proc. Advances in Neural Information Processing Systems, December 2015, pp.11351143.
[30] Han S, Mao H, Dally W J. Deep compression:Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv:1510.00149, 2015. https://arxiv.org/abs/1510.00149, May 2017.
[31] Liu B Y,Wang M, Foroosh H, Tappen M, Penksy M. Sparse convolutional neural networks. In Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), June 2015, pp.806814.
[32] Cheng Y, Yu F X, Feris R S, Kumar S, Choudhary A, Chang S F. An exploration of parameter redundancy in deep networks with circulant projections. In Proc. IEEE Int. Conf. Computer Vision, December 2015, pp.28572865.
[33] Chen W L, Wilson J T, Tyree S, Weinberger K Q, Chen Y X. Compressing neural networks with the hashing trick. In Proc. the 32nd Int. Conf. Int. Machine Learning, July 2015, pp.22852294.
[34] Chen W L, Wilson J, Tyree S, Weinberger K Q, Chen Y X. Compressing convolutional neural networks in the frequency domain. In Proc. the 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, August 2016, pp.14751484.
[35] Anguita D, Carlino L, Ghio A, Ridella S. A FPGA core generator for embedded classification systems. Journal of Circuits Systems and Computers, 2011, 20(2):263282.
[36] Vanhoucke V, Senior A, Mao M Z. Improving the speed of neural networks on CPUs. In Proc. Deep Learning and Unsupervised Feature Learning Workshop, December 2011.
[37] Alvarez R, Prabhavalkar R, Bakhtin A. On the efficient representation and execution of deep acoustic models. In Proc. the 17th Annual Conf. the Int. Speech Communication Association, September 2016, pp.27462750.
[38] Zen H, Agiomyrgiannakis Y, Egberts N, Henderson F, Szczepaniak P. Fast, compact, and high quality LSTMRNN based statistical parametric speech synthesizers for mobile devices. In Proc. the 17th Annual Conf. the Int. Speech Communication Association, September 2016, pp.22732277.
[39] Gong Y C, Liu L, Yang M, Bourdev L. Compressing deep convolutional networks using vector quantization. arXiv:1412.6115, 2014. https://arxiv.org/abs/1412.6115, May 2017.
[40] Merolla P, Appuswamy R, Arthur J, Esser S K, Modha D. Deep neural networks are robust to weight binarization and other nonlinear distortions. arXiv:1606.01981, 2016. https://arxiv.org/abs/1606.01981, May 2017.
[41] Gupta S, Agrawal A, Gopalakrishnan K, Narayanan P. Deep learning with limited numerical precision. arXiv:1502.02551, 2015. http://arxiv.org/abs/1502.02551, May 2017.
[42] Courbariaux M, Bengio Y. BinaryNet:Training deep neural networks with weights and activations constrained to +1 or 1. arXiv:1602.02830v1, 2016. http://arxiv.org/abs/1602.02830v1, May 2017.
[43] Wu J X, Leng C, Wang Y H, Hu Q H, Cheng J. Quantized convolutional neural networks for mobile devices. arXiv:1512.06473, 2016. https://www.arxiv.org/abs/1512.06473, May 2017.
[44] Kim M, Smaragdis P. Bitwise neural networks. arXiv:1601.06071, 2016. https://arxiv.org/abs/1601.06071, May 2017.
[45] Hubara I, Courbariaux M, Soudry D, ElYaniv R, Bengio Y. Binarized neural networks. In Proc. the 30th Conf. Neural Information Processing Systems, December 2016, pp.41074115.
[46] Rastegari M, Ordonez V, Redmon J, Farhadi A. XNORNet:ImageNet classification using binary convolutional neural networks. In Proc. the 14th European Conf. Computer Vision, October 2016, pp.525542.
[47] Hinton G, Srivastava N, Swersky K. Coursera:Neural networks for machine learning. 2012. https://www.classcentral. com/mooc/398/courseraneuralnetworksformach inelearning, May 2017.
[48] Bengio Y, Léonard N, Courville A C. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv:1308.3432, 2013. http://adsabs.harvard. edu/abs/2013arXiv1308.3432B, May 2017.
[49] Hwang K, Sung W. Fixedpoint feedforward deep neural network design using weights +1, 0, and 1. In Proc. IEEE Workshop on Signal Processing Systems, October 2014.
[50] Shin S, Hwang K, Sung W. Fixedpoint performance analysis of recurrent neural networks. In Proc. IEEE Int. Conf. Acoustics Speech and Signal Processing (ICASSP), March 2016, pp.976980.
[51] Hubara I, Courbariaux M, Soudry D, ElYaniv R, Bengio Y. Quantized neural networks:Training neural networks with low precision weights and activations. arXiv:1609.07061, 2016. http://arxiv.org/abs/1609.07061, May 2017.
[52] Miyashita D, Lee E H, Murmann B. Convolutional neural networks using logarithmic data representation. arXiv:1603.01025, 2016. https://arxiv.org/abs/1603.01025, May 2017.
[53] Zhou S C, Wu Y X, Ni Z K, Zhou X Y, Wen H, Zou Y H. DoReFaNet:Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv:1606.06160, 2016. https://www.arxiv.org/abs/1606.06160, May 2017.
[54] Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z F, Citro C, Corrado G S, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y Q, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mane D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viegas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X Q. TensorFlow:Largescale machine learning on heterogeneous distributed systems. arXiv:1603.04467, 2015. https://arxiv.org/abs/1603.04467, May 2017.
[55] Andri R, Cavigelli L, Rossi D, Benini L. YodaNN:An ultralow power convolutional neural network accelerator based on binary weights. In Proc. IEEE Computer Society Annual Symposium on VLSI, July 2016, pp.236241.
[56] Lee M, Hwang K, Park J, Choi S, Shin S, Sung W. FPGAbased lowpower speech recognition with recurrent neural networks. In Proc. IEEE Int. Workshop on Signal Processing Systems, October 2016, pp.230235.
[57] Courbariaux M, Bengio Y, David J P. BinaryConnect:Training deep neural networks with binary weights during propagations. In Proc. the 28th Int. Conf. Neural Information Processing Systems, December 2015, pp.31233131.
[58] Saxe A M, Koh P W, Chen Z H, Bhand M, Suresh B, Ng A Y. On random weights and unsupervised feature learning. In Proc. the 28th Int. Conf. Machine Learning, June 2011, pp.10891096.
[59] Giryes R, Sapiro G, Bronstein A M. Deep neural networks with random gaussian weights:A universal classification strategy? IEEE Trans. Signal Processing, 2016, 64(13):34443457.
[60] Heckbert P. Color image quantization for frame buffer display. In Proc. the 9th Annual Conf. Computer Graphics and Interactive Techniques, July 1982, pp.297307.
[61] Mallows C. Another comment on o'cinneide. The American Statistician, 1991, 45(3):257.
[62] Ioffe S, Szegedy C. Batch normalization:Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167, 2015. https://arxiv.org/abs/1502.03167, May 2017.
[63] Netzer Y, Wang T, Coates A, Bissacco A, Wu B, Ng A Y. Reading digits in natural images with unsupervised feature learning. In Proc. Workshop on Deep Learning and Unsupervised Feature Learning, Dec. 2011.
[64] Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z H, Karpathy A, Khosla A, Bernstein M, Berg A C, Li F F. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 2015, 115(3):211252.
[65] Gysel P, Motamedi M, Ghiasi S. Hardwareoriented approximation of convolutional neural networks. arXiv:1604.03168, 2016. http://arxiv.org/abs/1604.03168, May 2017.
[66] Taylor A, Marcus M, Santorini B. The Penn Treebank:An overview. In Treebanks, Abeillé A(ed.), Springer, 2003, pp.522.
