Citation: Tong Chen, Ji-Qiang Liu, He Li, Shuo-Ru Wang, Wen-Jia Niu, En-Dong Tong, Liang Chang, Qi Alfred Chen, Gang Li. Robustness Assessment of Asynchronous Advantage Actor-Critic Based on Dynamic Skewness and Sparseness Computation: A Parallel Computing View[J]. Journal of Computer Science and Technology, 2021, 36(5): 1002-1021. DOI: 10.1007/s11390-021-1217-z

Robustness Assessment of Asynchronous Advantage Actor-Critic Based on Dynamic Skewness and Sparseness Computation: A Parallel Computing View

Funds: The work was supported by the National Natural Science Foundation of China under Grant Nos. 61972025, 61802389, 61672092, U1811264, and 61966009, the National Key Research and Development Program of China under Grant Nos. 2020YFB1005604 and 2020YFB2103802, and Guangxi Key Laboratory of Trusted Software under Grant No. KX201902.
More Information
  • Author Bio:

    Tong Chen received her M.S. degree in cyber security from Beijing Jiaotong University, Beijing, in 2018. She is currently a Ph.D. candidate in cyber security at Beijing Jiaotong University, Beijing. Her main research interests are cyber security and reinforcement learning security.

  • Corresponding authors:

    Wen-Jia Niu E-mail: niuwj@bjtu.edu.cn

    En-Dong Tong E-mail: edtong@bjtu.edu.cn

  • Received Date: December 11, 2020
  • Revised Date: July 25, 2021
  • Published Date: September 29, 2021
  • Abstract: Reinforcement learning, as a form of autonomous learning, is driving artificial intelligence (AI) toward practical applications. Having demonstrated the potential to significantly outperform synchronous parallel learning, the parallel-computing-based asynchronous advantage actor-critic (A3C) opens a new door for reinforcement learning. Unfortunately, the influence of this acceleration on A3C robustness has been largely overlooked. In this paper, we perform the first robustness assessment of A3C from a parallel computing view. By perceiving the policy's actions, we construct a global matrix of action probability deviations and define two novel measures, skewness and sparseness, which together form an integral robustness measure. Building on this static assessment, we then develop a dynamic robustness assessment algorithm based on situational whole-space state sampling over changing episodes. Extensive experiments with different combinations of agent number and learning rate are conducted on an A3C-based pathfinding application, demonstrating that the proposed assessment effectively measures the robustness of A3C and achieves an accuracy of 83.3%.
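As a rough illustration of the static measures described in the abstract, the following minimal Python/NumPy sketch scores a global matrix of action-probability deviations for skewness and sparseness and combines them into a single value. The matrix construction, the standard third-moment skewness, the near-zero-count sparseness proxy, and the weighting in robustness_score are illustrative assumptions made here for exposition; they are not the exact definitions used in the paper.

```python
import numpy as np

def action_probability_deviation(policy_probs, baseline_probs):
    # Hypothetical construction: element-wise deviation between the learned
    # policy's action probabilities and a reference policy, one row per
    # sampled state. The paper's exact matrix definition may differ.
    return np.asarray(policy_probs) - np.asarray(baseline_probs)

def skewness(matrix):
    # Standard sample skewness of all deviations (placeholder for the
    # paper's skewness measure).
    x = matrix.ravel()
    mu, sigma = x.mean(), x.std()
    if sigma == 0:
        return 0.0
    return float(np.mean(((x - mu) / sigma) ** 3))

def sparseness(matrix, eps=1e-3):
    # Fraction of near-zero deviations (placeholder for the paper's
    # sparseness measure).
    x = matrix.ravel()
    return float(np.mean(np.abs(x) < eps))

def robustness_score(matrix, w_skew=0.5, w_sparse=0.5):
    # Illustrative combination of the two measures into one score; the
    # weights and the combination rule are assumptions, not the paper's.
    return w_sparse * sparseness(matrix) - w_skew * abs(skewness(matrix))

# Toy usage: 5 sampled states, 4 actions.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(4), size=5)
reference = rng.dirichlet(np.ones(4), size=5)
dev = action_probability_deviation(probs, reference)
print(skewness(dev), sparseness(dev), robustness_score(dev))
```

In the paper's dynamic setting, such scoring would be repeated under whole-space state sampling as episodes change; the snippet above only sketches the static, single-matrix case.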
  • Related Articles

    [1] Chang-Ai Sun, Ming-Jun Xiao, He-Peng Dai, Huai Liu. A Reinforcement Learning Based Approach to Partition Testing[J]. Journal of Computer Science and Technology, 2025, 40(1): 99-118. DOI: 10.1007/s11390-024-2900-7
    [2] Zhong Qian, Pei-Feng Li, Qiao-Ming Zhu, Guo-Dong Zhou. Document-Level Event Factuality Identification via Reinforced Semantic Learning Network[J]. Journal of Computer Science and Technology, 2024, 39(6): 1248-1268. DOI: 10.1007/s11390-024-2655-1
    [3] Qing-Bin Liu, Shi-Zhu He, Cao Liu, Kang Liu, Jun Zhao. Unsupervised Dialogue State Tracking for End-to-End Task-Oriented Dialogue with a Multi-Span Prediction Network[J]. Journal of Computer Science and Technology, 2023, 38(4): 834-852. DOI: 10.1007/s11390-021-1064-y
    [4] Tong Ding, Ning Liu, Zhong-Min Yan, Lei Liu, Li-Zhen Cui. An Efficient Reinforcement Learning Game Framework for UAV-Enabled Wireless Sensor Network Data Collection[J]. Journal of Computer Science and Technology, 2022, 37(6): 1356-1368. DOI: 10.1007/s11390-022-2419-8
    [5] Tian-Yu Zhao, Man Zeng, Jian-Hua Feng. An Exercise Collection Auto-Assembling Framework with Knowledge Tracing and Reinforcement Learning[J]. Journal of Computer Science and Technology, 2022, 37(5): 1105-1117. DOI: 10.1007/s11390-022-2412-2
    [6] Qing-Bin Liu, Shi-Zhu He, Kang Liu, Sheng-Ping Liu, Jun Zhao. A Unified Shared-Private Network with Denoising for Dialogue State Tracking[J]. Journal of Computer Science and Technology, 2021, 36(6): 1407-1419. DOI: 10.1007/s11390-020-0338-0
    [7] Jia-Ke Ge, Yan-Feng Chai, Yun-Peng Chai. WATuning: A Workload-Aware Tuning System with Attention-Based Deep Reinforcement Learning[J]. Journal of Computer Science and Technology, 2021, 36(4): 741-761. DOI: 10.1007/s11390-021-1350-8
    [8] Lei Cui, Youyang Qu, Mohammad Reza Nosouhi, Shui Yu, Jian-Wei Niu, Gang Xie. Improving Data Utility Through Game Theory in Personalized Differential Privacy[J]. Journal of Computer Science and Technology, 2019, 34(2): 272-286. DOI: 10.1007/s11390-019-1910-3
    [9] Yao Shu, Zhang Bo. Situated Learning of a Behavior-Based Mobile Robot Path Planner[J]. Journal of Computer Science and Technology, 1995, 10(4): 375-379.
    [10] Ma Zhifang. DKBLM——Deep Knowledge Based Learning Methodology[J]. Journal of Computer Science and Technology, 1993, 8(4): 93-98.
  • Cited by

    Periodical citations (7)

    1. Lei Gao, Xuechao Wang. Intelligent Control of the Air Compressor (AC) and Back Pressure Valve (BPV) to Improve PEMFC System Dynamic Response and Efficiency in High Altitude Regions. Eng, 2025, 6(1): 19. DOI:10.3390/eng6010019
    2. Recep Ozalp, Aysegul Ucar, Cuneyt Guzelis. Advancements in Deep Reinforcement Learning and Inverse Reinforcement Learning for Robotic Manipulation: Toward Trustworthy, Interpretable, and Explainable Artificial Intelligence. IEEE Access, 2024, 12: 51840. DOI:10.1109/ACCESS.2024.3385426
    3. Hongjian Wang, Wei Gao, Zhao Wang, et al. Research on Obstacle Avoidance Planning for UUV Based on A3C Algorithm. Journal of Marine Science and Engineering, 2023, 12(1): 63. DOI:10.3390/jmse12010063
    4. Jingru Chang, Dong Yu, Yi Hu, et al. Deep Reinforcement Learning for Dynamic Flexible Job Shop Scheduling with Random Job Arrival. Processes, 2022, 10(4): 760. DOI:10.3390/pr10040760
    5. Hui Wang, Yifeng Wang, Yuanbo Guo. Unknown network attack detection method based on reinforcement zero-shot learning. Journal of Physics: Conference Series, 2022, 2303(1): 012008. DOI:10.1088/1742-6596/2303/1/012008
    6. Jingru Chang, Dong Yu, Zheng Zhou, et al. Hierarchical Reinforcement Learning for Multi-Objective Real-Time Flexible Scheduling in a Smart Shop Floor. Machines, 2022, 10(12): 1195. DOI:10.3390/machines10121195
    7. Mikko Kiviharju. Artificial Intelligence for Security. DOI:10.1007/978-3-031-57452-8_9

    Other citation types (0)
