Journal of Computer Science and Technology, 2021, Vol. 36, Issue (5): 1002-1021. DOI: 10.1007/s11390-021-1217-z

Special Issue: Artificial Intelligence and Pattern Recognition

• Special Section of APPT 2021 (Part 1) •

Robustness Assessment of Asynchronous Advantage Actor-Critic Based on Dynamic Skewness and Sparseness Computation: A Parallel Computing View

Tong Chen1, Ji-Qiang Liu1, He Li1, Shuo-Ru Wang1, Wen-Jia Niu1,*, Member, CCF, En-Dong Tong1,*, Member, CCF, Liang Chang2, Qi Alfred Chen3, and Gang Li4, Member, IEEE

  1 Beijing Key Laboratory of Security and Privacy in Intelligent Transportation, Beijing Jiaotong University, Beijing 100044, China;
  2 Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China;
  3 Donald Bren School of Information and Computer Sciences, University of California, Irvine 92697, U.S.A.;
  4 Centre for Cyber Security Research and Innovation, Deakin University, Geelong, VIC 3216, Australia
  • Received: 2020-12-12; Revised: 2021-07-26; Online: 2021-09-30; Published: 2021-09-30
  • About author: Tong Chen received her M.S. degree in cyber security from Beijing Jiaotong University, Beijing, in 2018. She is currently a Ph.D. candidate in cyber security at Beijing Jiaotong University, Beijing. Her main research interests are cyber security and reinforcement learning security.
  • Supported by:
    The work was supported by the National Natural Science Foundation of China under Grant Nos. 61972025, 61802389, 61672092, U1811264, and 61966009, the National Key Research and Development Program of China under Grant Nos. 2020YFB1005604 and 2020YFB2103802, and Guangxi Key Laboratory of Trusted Software under Grant No. KX201902.

Reinforcement learning, as a form of autonomous learning, is driving artificial intelligence (AI) toward practical applications. Having demonstrated the potential to significantly improve synchronous parallel learning, the parallel-computing-based asynchronous advantage actor-critic (A3C) opens a new door for reinforcement learning. Unfortunately, the influence of this acceleration on the robustness of A3C has been largely overlooked. In this paper, we perform the first robustness assessment of A3C based on parallel computing. By perceiving the policy's actions, we construct a global matrix of action probability deviation and define two novel measures, skewness and sparseness, which together form an integral robustness measure. Building on this static assessment, we then develop a dynamic robustness assessment algorithm through situational whole-space state sampling over changing episodes. Extensive experiments with different combinations of agent number and learning rate are conducted on an A3C-based pathfinding application, demonstrating that the proposed assessment can effectively measure the robustness of A3C, achieving an accuracy of 83.3%.
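As a rough illustration of the idea of an action probability deviation matrix, the sketch below compares two tabular policies state by state and summarizes the deviations with two scalar statistics. This is only a sketch under stated assumptions: the abstract does not give the paper's definitions, so the skewness here is the standard sample skewness and the sparseness is simply the fraction of near-zero entries. The function names, the tolerance `tol`, and the toy policies are all hypothetical.

```python
import math


def deviation_matrix(reference, observed):
    """Entry [s][a] is the observed minus the reference probability of action a in state s."""
    return [[o - r for r, o in zip(ref_row, obs_row)]
            for ref_row, obs_row in zip(reference, observed)]


def skewness(matrix):
    """Standard sample skewness of all entries (assumed stand-in, not the paper's measure)."""
    values = [v for row in matrix for v in row]
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def sparseness(matrix, tol=1e-3):
    """Fraction of entries with magnitude at most tol (assumed stand-in, not the paper's measure)."""
    values = [v for row in matrix for v in row]
    return sum(abs(v) <= tol for v in values) / len(values)


# Toy policies over 3 states and 2 actions (hypothetical data).
reference = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
observed = [[0.9, 0.1], [0.6, 0.4], [0.5, 0.5]]

dev = deviation_matrix(reference, observed)
print("skewness:", skewness(dev))
print("sparseness:", sparseness(dev))
```

In this toy case only the second state deviates, so most entries of the matrix are zero. How the two measures are combined into an integral robustness measure, and how states are sampled across episodes, is specified in the paper itself and is not reproduced here.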

Key words: robustness assessment; skewness; sparseness; asynchronous advantage actor-critic; reinforcement learning


ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP
