Key Words: Reinforcement Learning

Taxonomy

Comments

  • The theoretical track contains work from Nature and NeurIPS.
  • Possible variations:
    • Multi-Agent RL:
      • Challenges: Generating diversified behaviors for each individual agent when receiving only a team reward.
    • Constrained RL: for problems with conflicting requirements. [NeurIPS-2019-Constrained, NeurIPS-2019-Convex-Constraints]
    • Hierarchical RL: for long-horizon problems with sparse and delayed rewards.
    • Multi-Objective RL: for problems with multiple competing objectives whose relative importance is unknown.
      • Strengths: Alleviates dependence on scalar reward design.
      • Weaknesses: The expected return of a policy can change significantly as preferences vary, making it challenging to learn a single model that produces optimal policies under different preference conditions.
    • Multi-View RL: for agents that share common dynamics but adhere to different observation models.
    • Network pruning (in case of long train time)
    • Model transfer
    • Adaptive time step length
  • Problem features and corresponding solutions:
    • Multiple objectives: addressed by Multi-Objective RL [NeurIPS-2019-Multi-Objective] and by regret minimization with vectorial feedback [NeurIPS-2019-Regret].

Individual Papers

  • Human-level control through deep reinforcement learning, Nature 2015 DQN
    • Taxonomy: Game play, Single-agent, DQN
    • Contributions: DQN, which combines Q-learning with a deep convolutional network, stabilized by experience replay and a periodically updated target network (a minimal update sketch follows this list).
  • Multi-Agent Common Knowledge Reinforcement Learning, NeurIPS 2019 Common Knowledge
    • Contributions: A hierarchical policy tree over groups of agents, with policies conditioned on common knowledge.
  • Constrained Reinforcement Learning Has Zero Duality Gap, NeurIPS 2019 Constrained
    • Contributions: Theoretical support (a zero duality gap) for solving constrained RL with primal-dual methods; the saddle-point form is sketched after this list.
  • A Generalized Algorithm for Multi-Objective RL and Policy Adaptation, NeurIPS 2019 Multi-Objective
    • Contributions: A generalized version of the Bellman equation that learns a single parametric representation of optimal policies over the space of all possible preferences (operator sketched after this list).
  • Hierarchical RL with Advantage-Based Auxiliary Rewards, NeurIPS 2019 Hierarchical
    • Contributions: Auxiliary rewards for low-level skill training, derived from the advantage function of the high-level policy in HRL (sketched after this list).
  • Interval Timing in Deep Reinforcement Learning Agents, NeurIPS 2019 Timing
    • Contributions: An investigation of how deep RL agents represent time.
  • Multi-View Reinforcement Learning, NeurIPS 2019 Multi-View
    • Contributions: A multi-view RL formulation for agents that share common dynamics but adhere to different observation models.
  • Regret Minimization for RL with Vectorial Feedback and Complex Objectives, NeurIPS 2019 Regret
    • Contributions: Regret-minimization algorithms for RL with vectorial feedback (reward) and complex objectives.
  • RL with Convex Constraints, NeurIPS 2019 Convex Constraints
    • Contributions: An algorithmic scheme that can handle a wide class of constraints in RL tasks.
  • Structured Prediction Approach for Generalization in Cooperative Multi-Agent RL, NeurIPS 2019 Structured
    • Opportunity: Intrinsic notion of locality (e.g., geographic proximity) such that interactions between agents and tasks are locally limited in multi-agent RL.
    • Contributions: A novel structured prediction approach to assign agents to tasks.
    • Applications: Game play.
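
For quick reference, a minimal sketch of the DQN update from [Nature-2015-DQN]: uniform experience replay plus a periodically synced target network. Here `target_net` is a placeholder callable standing in for the paper's convolutional Q-network, and the hyperparameters are illustrative, not the paper's.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Uniform experience replay, as used in [Nature-2015-DQN]."""
    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.buf.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Sample a minibatch uniformly and split it into per-field arrays.
        s, a, r, s_next, done = map(np.array, zip(*random.sample(self.buf, batch_size)))
        return s, a, r, s_next, done

def dqn_td_targets(rewards, next_states, dones, target_net, gamma=0.99):
    """Compute y_i = r_i + gamma * max_a' Q_target(s'_i, a'), with no bootstrap
    at terminal states.

    target_net: callable state -> action-value vector, synced from the online
    network every C steps (the paper's target-network trick).
    """
    next_q = np.stack([target_net(s) for s in next_states])  # (batch, |A|)
    return rewards + gamma * (1.0 - dones) * next_q.max(axis=1)
```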
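
As context for the zero-duality-gap result of [NeurIPS-2019-Constrained], a standard saddle-point formulation of constrained RL; the symbols V_r, V_c, c are assumed notation, not necessarily the paper's exact definitions.

```latex
% Primal problem and Lagrangian (assumed notation):
P^* = \max_{\pi} V_r(\pi) \quad \text{s.t.} \quad V_c(\pi) \ge c,
\qquad
\mathcal{L}(\pi, \lambda) = V_r(\pi) + \lambda \bigl( V_c(\pi) - c \bigr).
% Zero duality gap: D^* = \min_{\lambda \ge 0} \max_{\pi} \mathcal{L}(\pi, \lambda) = P^*,
% which is what licenses alternating primal (policy) ascent and dual (lambda) descent.
```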
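
The core of [NeurIPS-2019-Multi-Objective] is a Bellman operator taken over preferences as well as actions. A hedged sketch of that envelope-style operator, with vector-valued Q and preference vector omega (see the paper for the precise statement):

```latex
(\mathcal{T}\mathbf{Q})(s, a, \boldsymbol{\omega})
  = \mathbf{r}(s, a)
  + \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s, a)}
    \Bigl[ \arg_{\mathbf{Q}} \max_{a', \boldsymbol{\omega}'}
           \boldsymbol{\omega}^{\top} \mathbf{Q}(s', a', \boldsymbol{\omega}') \Bigr]
% arg_Q: take the Q-vector attaining the max of the omega-scalarized value,
% so a single parametric Q can generalize across all preferences omega.
```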
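
For the advantage-based auxiliary reward of [NeurIPS-2019-Hierarchical], a sketch under assumed notation: during a high-level step of length k that selects skill g at time t_0, each low-level transition receives an even share of the high-level advantage.

```latex
r^{\mathrm{aux}}_{t} \;=\; \frac{1}{k} \, A^{\mathrm{high}}(s_{t_0}, g),
\qquad t = t_0, \dots, t_0 + k - 1
% The low-level skill is thus trained toward transitions the high-level
% policy judges advantageous, densifying sparse, delayed task rewards.
```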

Ref.

[Nature-2015-DQN] Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. “Human-level control through deep reinforcement learning.” Nature 518, no. 7540 (2015): 529-533.

[NeurIPS-2019-Common-Knowledge] de Witt, Christian Schroeder, Jakob Foerster, Gregory Farquhar, Philip Torr, Wendelin Boehmer, and Shimon Whiteson. “Multi-agent common knowledge reinforcement learning.” In Advances in Neural Information Processing Systems, pp. 9927-9939. 2019.

[NeurIPS-2019-Constrained] Paternain, Santiago, Luiz Chamon, Miguel Calvo-Fullana, and Alejandro Ribeiro. “Constrained reinforcement learning has zero duality gap.” In Advances in Neural Information Processing Systems, pp. 7555-7565. 2019.

[NeurIPS-2019-Convex-Constraints] Miryoosefi, Sobhan, Kianté Brantley, Hal Daumé III, Miro Dudik, and Robert E. Schapire. “Reinforcement learning with convex constraints.” In Advances in Neural Information Processing Systems, pp. 14093-14102. 2019.

[NeurIPS-2019-Hierarchical] Li, Siyuan, Rui Wang, Minxue Tang, and Chongjie Zhang. “Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards.” In Advances in Neural Information Processing Systems, pp. 1409-1419. 2019.

[NeurIPS-2019-Multi-Objective] Yang, Runzhe, Xingyuan Sun, and Karthik Narasimhan. “A generalized algorithm for multi-objective reinforcement learning and policy adaptation.” In Advances in Neural Information Processing Systems, pp. 14636-14647. 2019.

[NeurIPS-2019-Multi-View] Li, Minne, Lisheng Wu, Jun Wang, and Haitham Bou Ammar. “Multi-view reinforcement learning.” In Advances in Neural Information Processing Systems, pp. 1420-1431. 2019.

[NeurIPS-2019-Regret] Cheung, Wang Chi. “Regret Minimization for Reinforcement Learning with Vectorial Feedback and Complex Objectives.” In Advances in Neural Information Processing Systems, pp. 726-736. 2019.

[NeurIPS-2019-Structured] Carion, Nicolas, Nicolas Usunier, Gabriel Synnaeve, and Alessandro Lazaric. “A structured prediction approach for generalization in cooperative multi-agent reinforcement learning.” In Advances in Neural Information Processing Systems, pp. 8130-8140. 2019.

[NeurIPS-2019-Timing] Deverett, Ben, Ryan Faulkner, Meire Fortunato, Gregory Wayne, and Joel Z. Leibo. “Interval timing in deep reinforcement learning agents.” In Advances in Neural Information Processing Systems, pp. 6689-6698. 2019.