Abstract
Count-based exploration algorithms are known to perform near-optimally when used in conjunction with tabular reinforcement learning (RL) methods for solving small discrete Markov decision processes (MDPs). It is generally thought that count-based methods cannot be applied in high-dimensional state spaces, since most states will only occur once. Recent deep RL exploration strategies are able to deal with high-dimensional continuous state spaces through complex heuristics, often relying on optimism in the face of uncertainty or intrinsic motivation. In this work, we describe a surprising finding: a simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks. States are mapped to hash codes, which allows their occurrences to be counted with a hash table. These counts are then used to compute a reward bonus according to the classic count-based exploration theory. We find that simple hash functions can achieve surprisingly good results on many challenging tasks. Furthermore, we show that a domain-dependent learned hash code may further improve these results. Detailed analysis reveals important aspects of a good hash function: 1) having appropriate granularity and 2) encoding information relevant to solving the MDP. This exploration strategy achieves near state-of-the-art performance on both continuous control tasks and Atari 2600 games, hence providing a simple yet powerful baseline for solving MDPs that require considerable exploration.
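The mechanism the abstract describes can be sketched in a few lines: a static hash function maps each state to a short binary code, a hash table counts code occurrences, and a count-based bonus of the form β/√n(φ(s)) is added to the reward. The sketch below uses SimHash (a sign-of-random-projection code, one of the simple static hash functions the paper studies); the class name, parameter defaults, and interface are illustrative, not the authors' implementation.

```python
import numpy as np
from collections import defaultdict

class CountBasedBonus:
    """Hash-based state counting for exploration.

    States are mapped to k-bit SimHash codes (sign of a fixed random
    projection), code occurrences are counted in a hash table, and the
    bonus beta / sqrt(n(phi(s))) follows classic count-based exploration
    theory. The code length k controls the granularity of the counting.
    """

    def __init__(self, state_dim, k=32, beta=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.standard_normal((k, state_dim))  # fixed random projection
        self.beta = beta
        self.counts = defaultdict(int)  # hash table: code -> visit count

    def _hash(self, state):
        # SimHash: sign bits of the projection, packed into a hashable tuple
        return tuple((self.A @ np.asarray(state) > 0).astype(np.int8))

    def bonus(self, state):
        # Record the visit and return the count-based exploration bonus
        code = self._hash(state)
        self.counts[code] += 1
        return self.beta / np.sqrt(self.counts[code])
```

In use, the bonus is simply added to the environment reward at each step (`r + bonus(s)`), so states whose hash codes have been seen rarely are rewarded more, and the bonus decays as their counts grow.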

Downloads

  • (...).pdf: full text | UGent only | PDF | 2.59 MB
  • 7357 i.pdf: full text | open access | PDF | 2.53 MB

Citation

Please use this URL to cite or link to this publication:

Chicago
Tang, Haoran, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, and Pieter Abbeel. 2017. “#Exploration: A Study of Count-based Exploration for Deep Reinforcement Learning.” In ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 30:1–18.
APA
Tang, Haoran, Houthooft, R., Foote, D., Stooke, A., Chen, X., Duan, Y., Schulman, J., et al. (2017). #Exploration: A study of count-based exploration for deep reinforcement learning. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017) (Vol. 30, pp. 1–18). Presented at the 31st Conference on Neural Information Processing Systems (NIPS), 2017.
Vancouver
1.
Tang H, Houthooft R, Foote D, Stooke A, Chen X, Duan Y, et al. #Exploration: A study of count-based exploration for deep reinforcement learning. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017). 2017. p. 1–18.
MLA
Tang, Haoran, Rein Houthooft, Davis Foote, et al. “#Exploration: A Study of Count-based Exploration for Deep Reinforcement Learning.” ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017). Vol. 30. 2017. 1–18. Print.
@inproceedings{8588323,
  abstract     = {Count-based exploration algorithms are known to perform near-optimally when used in conjunction with tabular reinforcement learning (RL) methods for solving small discrete Markov decision processes (MDPs). It is generally thought that count-based methods cannot be applied in high-dimensional state spaces, since most states will only occur once. Recent deep RL exploration strategies are able to deal with high-dimensional continuous state spaces through complex heuristics, often relying on optimism in the face of uncertainty or intrinsic motivation. In this work, we describe a surprising finding: a simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks. States are mapped to hash codes, which allows their occurrences to be counted with a hash table. These counts are then used to compute a reward bonus according to the classic count-based exploration theory. We find that simple hash functions can achieve surprisingly good results on many challenging tasks. Furthermore, we show that a domain-dependent learned hash code may further improve these results. Detailed analysis reveals important aspects of a good hash function: 1) having appropriate granularity and 2) encoding information relevant to solving the MDP. This exploration strategy achieves near state-of-the-art performance on both continuous control tasks and Atari 2600 games, hence providing a simple yet powerful baseline for solving MDPs that require considerable exploration.},
  author       = {Tang, Haoran and Houthooft, Rein and Foote, Davis and Stooke, Adam and Chen, Xi and Duan, Yan and Schulman, John and De Turck, Filip and Abbeel, Pieter},
  booktitle    = {ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017)},
  issn         = {1049-5258},
  language     = {eng},
  location     = {Long Beach, CA},
  pages        = {1--18},
  title        = {\#Exploration: a study of count-based exploration for deep reinforcement learning},
  volume       = {30},
  year         = {2017},
}
