ICLR2019-RL-Papers

A collection of reinforcement-learning-related papers from ICLR 2019.

The following papers were accepted:

[1] Temporal Difference Variational Auto-Encoder (Chinese paper note)

[2] Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow (Chinese paper note)

[3] Near-Optimal Representation Learning for Hierarchical Reinforcement Learning

[4] Composing Complex Skills by Learning Transition Policies with Proximity Reward Induction

[5] Exploration by random network distillation

[6] Probabilistic Recursive Reasoning for Multi-Agent Reinforcement Learning

[7] Learning to Navigate the Web

[8] Variance Reduction for Reinforcement Learning in Input-Driven Environments

[9] ProMP: Proximal Meta-Policy Search

[10] Learning Self-Imitating Diverse Policies

[11] Recurrent Experience Replay in Distributed Reinforcement Learning

[12] Large-Scale Study of Curiosity-Driven Learning

[13] Diversity is All You Need: Learning Skills without a Reward Function

[14] Learning to Schedule Communication in Multi-agent Reinforcement Learning

[15] Episodic Curiosity through Reachability

[16] Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning

[17] Knowledge Flow: Improve Upon Your Teachers

[18] Supervised Policy Update for Deep Reinforcement Learning

[19] DARTS: Differentiable Architecture Search

[20] Deep Online Learning Via Meta-Learning: Continual Adaptation for Model-Based RL

[21] Information-Directed Exploration for Deep Reinforcement Learning

[22] Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search

[23] Solving the Rubik's Cube with Approximate Policy Iteration

[24] Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference

[25] Hindsight policy gradients

[26] Optimal Control Via Neural Networks: A Convex Approach

[27] NADPEx: An on-policy temporally consistent exploration method for deep reinforcement learning

[28] CEM-RL: Combining evolutionary and gradient-based methods for policy search

[29] Marginal Policy Gradients: A Unified Family of Estimators for Bounded Action Spaces with Applications

[30] Policy Transfer with Strategy Optimization

[31] Unsupervised Control Through Non-Parametric Discriminative Rewards

[32] Directed-Info GAIL: Learning Hierarchical Policies from Unsegmented Demonstrations using Directed Information

[33] Emergent Coordination Through Competition

[34] Learning to Understand Goal Specifications by Modelling Reward

[35] Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy

[36] Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies

[37] Optimal Completion Distillation for Sequence Learning

[38] SNAS: stochastic neural architecture search

[39] GO Gradient for Expectation-Based Objectives

[40] Analyzing Inverse Problems with Invertible Neural Networks

[41] Deep reinforcement learning with relational inductive biases

[42] Attention, Learn to Solve Routing Problems!

[43] Recall Traces: Backtracking Models for Efficient Reinforcement Learning

[44] DOM-Q-NET: Grounded RL on Structured Language

[45] Graph HyperNetworks for Neural Architecture Search

[46] Value Propagation Networks

[47] Contingency-Aware Exploration in Reinforcement Learning

[48] Learning Finite State Representations of Recurrent Policy Networks

[49] Initialized Equilibrium Propagation for Backprop-Free Training

[50] Learning to Design RNA

[51] Stable Opponent Shaping in Differentiable Games

[52] Relational Forward Models for Multi-Agent Learning

[53] Preferences Implicit in the State of the World

[54] Remember and Forget for Experience Replay

[55] Reward Constrained Policy Optimization

[56] Visceral Machines: Risk-Aversion in Reinforcement Learning with Intrinsic Physiological Rewards

[57] Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees

[58] Learning To Simulate

[59] DHER: Hindsight Experience Replay for Dynamic Goals

[60] Neural Graph Evolution: Automatic Robot Design

[61] Hierarchical Visuomotor Control of Humanoids

[62] Information asymmetry in KL-regularized RL

[63] From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following

[64] Soft Q-Learning with Mutual-Information Regularization

[65] M^3RL: Mind-aware Multi-agent Management Reinforcement Learning

[66] Modeling the Long Term Future in Model-Based Reinforcement Learning

[67] Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering

[68] Probabilistic Planning with Sequential Monte Carlo methods

[69] Learning what you can do before doing anything

[70] Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic

[71] Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks

[72] A new dog learns old tricks: RL finds classic optimization algorithms

[73] Generating Multi-Agent Trajectories using Programmatic Weak Supervision

[74] Competitive experience replay

[75] Bayesian Policy Optimization for Model Uncertainty

[76] Environment Probing Interaction Policies

[77] Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures

[78] Learning Multi-Level Hierarchies with Hindsight

[79] Generative predecessor models for sample-efficient imitation learning

[80] How to train your MAML

[81] Adversarial Imitation via Variational Inverse Reinforcement Learning

[82] Variance Networks: When Expectation Does Not Meet Your Expectations

[83] Success at any cost: value constrained model-free continuous control

[84] Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization

[85] Stochastic Prediction of Multi-Agent Interactions from Partial Observations

The following papers were rejected:

[1] Policy Generalization In Capacity-Limited Reinforcement Learning

[2] EMI: Exploration with Mutual Information Maximizing State and Action Embeddings

[3] Lyapunov-based Safe Policy Optimization

[4] Towards Consistent Performance on Atari using Expert Demonstrations

[5] TarMAC: Targeted Multi-Agent Communication

[6] Neural MMO: A massively multiplayer game environment for intelligent agents

[7] Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation

[8] Uncovering Surprising Behaviors in Reinforcement Learning via Worst-case Analysis

[9] Reinforcement Learning with Perturbed Rewards

[10] On-Policy Trust Region Policy Optimisation with Replay Buffers

[11] Interactive Agent Modeling by Learning to Probe

[12] Learning Heuristics for Automated Reasoning through Reinforcement Learning

[13] Deep Imitative Models for Flexible Inference, Planning, and Control

[TBW]