Journal of Computer Science and Technology ›› 2020, Vol. 35 ›› Issue (2): 268-280. doi: 10.1007/s11390-020-9967-6

• Special Section on Learning and Mining in Dynamic Environments •

Efficient Multiagent Policy Optimization Based on Weighted Estimators in Stochastic Cooperative Environments

Yan Zheng1, Jian-Ye Hao1,*, Member, CCF, Zong-Zhang Zhang2, Member, CCF, IEEE, Zhao-Peng Meng1, Xiao-Tian Hao1        

  1 College of Intelligence and Computing, Tianjin University, Tianjin 300350, China;
    2 National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
  • Received: 2019-08-20 Revised: 2020-01-23 Online: 2020-03-05 Published: 2020-03-18
  • Contact: Jian-Ye Hao E-mail: jianye.hao@tju.edu.cn
  • About author: Yan Zheng received his Ph.D. degree in software engineering from Tianjin University, Tianjin. He is now a research fellow at Nanyang Technological University, Singapore, and also a member of the Deep Reinforcement Learning Laboratory at Tianjin University, Tianjin. His research interests include deep reinforcement learning and multiagent systems.
  • Supported by:
    The work was supported by the National Natural Science Foundation of China under Grant Nos. 61702362, U1836214, and 61876119, the Special Program of Artificial Intelligence of Tianjin Research Program of Application Foundation and Advanced Technology under Grant No. 16JCQNJC00100, the Special Program of Artificial Intelligence of Tianjin Municipal Science and Technology Commission of China under Grant No. 56917ZXRGGX00150, the Science and Technology Program of Tianjin of China under Grant Nos. 15PTCYSY00030 and 16ZXHLGX00170, and the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20181432.

Multiagent deep reinforcement learning (MA-DRL) has received increasingly wide attention. Most existing MA-DRL algorithms, however, are still inefficient when faced with the non-stationarity caused by agents constantly changing their behaviors in stochastic environments. This paper extends the weighted double estimator to multiagent domains and proposes an MA-DRL framework named Weighted Double Deep Q-Network (WDDQN). By leveraging the weighted double estimator and deep neural networks, WDDQN not only reduces estimation bias effectively but also handles scenarios with raw visual inputs. To achieve efficient cooperation in multiagent domains, we further introduce a lenient reward network and a scheduled replay strategy. Empirical results show that WDDQN outperforms an existing DRL algorithm (double DQN) and an MA-DRL algorithm (lenient Q-learning) in terms of average reward and convergence speed, and is more likely to converge to the Pareto-optimal Nash equilibrium in stochastic cooperative environments.
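For readers unfamiliar with the weighted double estimator that WDDQN builds on, the sketch below illustrates the tabular rule of weighted double Q-learning [19], which interpolates between the single (max) estimator of Q-learning [21] and the double estimator of double Q-learning [24]. It is a minimal illustration only: the function name and the constant c are assumptions made here for exposition, and the deep, multiagent WDDQN additionally relies on neural-network function approximation, the lenient reward network, and the scheduled replay strategy described in the paper.

import numpy as np

def weighted_double_target(q_u, q_v, next_state, reward, gamma, c=1.0):
    # q_u, q_v: two independently updated Q-tables of shape [n_states, n_actions].
    # c: positive constant controlling the interpolation weight (illustrative name).
    a_star = int(np.argmax(q_u[next_state]))  # greedy action under Q^U
    a_low = int(np.argmin(q_u[next_state]))   # lowest-valued action under Q^U
    # The weight approaches 1 (single estimator) when the other table Q^V
    # sees a large value gap at s', and 0 (double estimator) otherwise.
    gap = abs(q_v[next_state, a_star] - q_v[next_state, a_low])
    beta = gap / (c + gap)
    # Interpolate between Q^U(s', a*) (single estimator) and Q^V(s', a*)
    # (double estimator), then form the one-step bootstrapped target.
    value = beta * q_u[next_state, a_star] + (1.0 - beta) * q_v[next_state, a_star]
    return reward + gamma * value

As in double Q-learning, one of the two tables is picked at random on each step and updated toward this target, with the roles of Q^U and Q^V swapped when the other table is updated.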

Key words: deep reinforcement learning; multiagent system; weighted double estimator; lenient reinforcement learning; cooperative Markov game

[1] Sutton R S, Barto A G. Reinforcement Learning: An Introduction. MIT Press, 1998.
[2] Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M. Playing Atari with deep reinforcement learning. arXiv:1312.5602, 2013. https://arxiv.org/abs/1312.5602, Nov. 2019.
[3] Mnih V, Kavukcuoglu K, Silver D et al. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540):529-533.
[4] Mnih V, Badia A P, Mirza M, Graves A, Lillicrap T, Harley T, Silver D, Kavukcuoglu K. Asynchronous methods for deep reinforcement learning. In Proc. the 33rd International Conference on Machine Learning, June 2016, pp.1928-1937.
[5] Schaul T, Quan J, Antonoglou I, Silver D. Prioritized experience replay. In Proc. the 4th International Conference on Learning Representations, May 2016.
[6] van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning. In Proc. the 30th AAAI Conference on Artificial Intelligence, February 2016, pp.2094-2100.
[7] Wang Z, Schaul T, Hessel M, van Hasselt H, Lanctot M, de Freitas N. Dueling network architectures for deep reinforcement learning. In Proc. the 33rd International Conference on Machine Learning, June 2016, pp.1995-2003.
[8] Bloembergen D, Kaisers M, Tuyls K. Empirical and theoretical support for lenient learning. In Proc. the 10th International Conference on Autonomous Agents and Multiagent Systems, May 2011, pp.1105-1106.
[9] Matignon L, Laurent G J, le Fort-Piat N. Hysteretic Q-learning: An algorithm for decentralized reinforcement learning in cooperative multi-agent teams. In Proc. the 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, October 2007, pp.64-69.
[10] Matignon L, Laurent G J, le Fort-Piat N. Independent reinforcement learners in cooperative Markov games: A survey regarding coordination problems. Knowledge Engineering Review, 2012, 27(1):1-31.
[11] Panait L, Sullivan K, Luke S. Lenient learners in cooperative multiagent systems. In Proc. the 5th International Conference on Autonomous Agents and Multiagent Systems, May 2006, pp.801-803.
[12] Wei E, Luke S. Lenient learning in independent-learner stochastic cooperative games. Journal of Machine Learning Research, 2016, 17:Article No. 84.
[13] Yang T, Hao J, Meng Z, Zheng Y, Zhang C, Zheng Z. BayesToMoP: A fast detection and best response algorithm towards sophisticated opponents. In Proc. the 18th International Conference on Autonomous Agents and Multiagent Systems, May 2019, pp.2282-2284.
[14] Yang T, Hao J, Meng Z, Zhang C, Zheng Y, Zheng Z. Towards efficient detection and optimal response against sophisticated opponents. In Proc. the 28th International Joint Conference on Artificial Intelligence, August 2019, pp.623-629.
[15] Zheng Y, Meng Z P, Hao J Y, Zhang Z Z, Yang T P, Fan C J. A deep Bayesian policy reuse approach against nonstationary agents. In Proc. the 2018 Annual Conference on Neural Information Processing Systems, December 2018, pp.962-972.
[16] Gupta J K, Egorov M, Kochenderfer M. Cooperative multiagent control using deep reinforcement learning. In Proc. the 2017 International Conference on Autonomous Agents and Multiagent Systems Workshops, May 2017, pp.66-83.
[17] Lanctot M, Zambaldi V, Gruslys A et al. A unified game-theoretic approach to multiagent reinforcement learning. In Proc. the 2017 Annual Conference on Neural Information Processing Systems, December 2017, pp.4190-4203.
[18] Claus C, Boutilier C. The dynamics of reinforcement learning in cooperative multiagent systems. In Proc. the 15th AAAI Conference on Artificial Intelligence, July 1998, pp.746-752.
[19] Zhang Z, Pan Z, Kochenderfer M J. Weighted double Q-learning. In Proc. the 26th International Joint Conference on Artificial Intelligence, August 2017, pp.3455-3461.
[20] Zheng Y, Meng Z, Hao J, Zhang Z. Weighted double deep multiagent reinforcement learning in stochastic cooperative environments. In Proc. the 15th Pacific Rim International Conference on Artificial Intelligence, August 2018, pp.421-429.
[21] Watkins C. Learning from delayed rewards [Ph.D. Thesis]. King's College, University of Cambridge, 1989.
[22] Sutton R S. Learning to predict by the methods of temporal differences. Machine Learning, 1988, 3:9-44.
[23] Smith J E, Winkler R L. The optimizer's curse: Skepticism and postdecision surprise in decision analysis. Management Science, 2006, 52(3):311-322.
[24] van Hasselt H. Double Q-learning. In Proc. the 24th Annual Conference on Neural Information Processing Systems, December 2010, pp.2613-2621.
[25] Potter M A, de Jong K A. A cooperative coevolutionary approach to function optimization. In Proc. the 3rd International Conference on Parallel Problem Solving from Nature, October 1994, pp.249-257.
[26] Tang H, Houthooft R, Foote D, Stooke A, Chen O X, Duan Y, Schulman J, de Turck F, Abbeel P. #Exploration: A study of count-based exploration for deep reinforcement learning. In Proc. the 2017 Annual Conference on Neural Information Processing Systems, December 2017, pp.2753-2762.
[27] Benda M, Jagannathan V, Dodhiawala R. On optimal cooperation of knowledge sources-An empirical investigation. Technical Report, Boeing Advanced Technology Center, Boeing Computing Services, 1986.
[28] Lowe R, Wu Y, Tamar A, Harb J, Abbeel P, Mordatch I. Multi-agent actor-critic for mixed cooperative-competitive environments. In Proc. the 2017 Annual Conference on Neural Information Processing Systems, December 2017, pp.6379-6390.
[29] Palmer G, Tuyls K, Bloembergen D, Savani R. Lenient multi-agent deep reinforcement learning. In Proc. the 17th International Conference on Autonomous Agents and Multiagent Systems, July 2018, pp.443-451.
[30] Buşoniu L, Babuška R, de Schutter B. Multi-agent reinforcement learning: An overview. In Innovations in Multi-Agent Systems and Applications - 1, Srinivasan D, Jain L C (eds.), Springer, 2010, pp.183-221.
[31] Chou P, Maturana D, Scherer S. Improving stochastic policy gradients in continuous control with deep reinforcement learning using the beta distribution. In Proc. the 34th International Conference on Machine Learning, August 2017, pp.834-843.