Releases: cpnota/autonomous-learning-library
Evaluation mode
This release contains some minor changes to several key APIs.
Agent Evaluation Mode
We added a new method to the `Agent` interface called `eval`. `eval` is the same as `act`, except the agent does not perform any training updates. This is useful for measuring the performance of an agent at the end of a training run. Speaking of which...
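A rough sketch of how the two methods might be used together (the `env` object and its `reset`/`step`/`done` members below are stand-ins, not part of the library's API):

```python
# Sketch: train with act(), then measure performance with eval().
# `env`, `env.reset()`, `env.step()`, and `env.done` are placeholders.

def train_episode(agent, env):
    state, reward = env.reset(), 0
    while not env.done:
        action = agent.act(state, reward)    # performs training updates
        state, reward = env.step(action)

def evaluation_episode(agent, env):
    state, reward, returns = env.reset(), 0, 0
    while not env.done:
        action = agent.eval(state, reward)   # same as act(), but no updates
        state, reward = env.step(action)
        returns += reward
    return returns
```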
Experiment Refactoring: Train/Test
We completely refactored the `all.experiments` module. First of all, the primary public entry point is now a function called `run_experiment`. Under the hood, there is a new `Experiment` interface:
    from abc import ABC, abstractmethod
    import numpy as np

    class Experiment(ABC):
        '''An Experiment manages the basic train/test loop and logs results.'''

        @property
        @abstractmethod
        def frame(self):
            '''The index of the current training frame.'''

        @property
        @abstractmethod
        def episode(self):
            '''The index of the current training episode.'''

        @abstractmethod
        def train(self, frames=np.inf, episodes=np.inf):
            '''
            Train the agent for a certain number of frames or episodes.
            If both frames and episodes are specified, then the training loop will exit
            when either condition is satisfied.

            Args:
                frames (int): The maximum number of training frames.
                episodes (int): The maximum number of training episodes.
            '''

        @abstractmethod
        def test(self, episodes=100):
            '''
            Test the agent in eval mode for a certain number of episodes.

            Args:
                episodes (int): The number of test episodes.

            Returns:
                list(float): A list of all returns received during testing.
            '''
Notice the new method, `experiment.test()`. This method runs the agent in `eval` mode for a certain number of episodes and logs summary statistics (the mean and standard deviation of the returns).
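A minimal usage sketch of the interface above (`make_experiment()` is a placeholder for however the concrete `Experiment` is constructed, for example via `run_experiment` and a preset):

```python
import numpy as np

experiment = make_experiment()            # placeholder constructor

# Train until either limit is reached (both default to np.inf).
experiment.train(frames=1_000_000)

# Run 100 episodes in eval mode; test() returns the list of episode returns.
returns = experiment.test(episodes=100)
print(np.mean(returns), np.std(returns))  # the summary statistics that get logged
```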
Approximation: no_grad vs. eval
Finally, we clarified the usage of `Approximation.eval(*inputs)` by adding an additional method, `Approximation.no_grad(*inputs)`. `eval()` both puts the network in evaluation mode and runs the forward pass with `torch.no_grad()`. `no_grad()` simply runs the forward pass under `torch.no_grad()` without changing the network's current mode. The various `Policy` implementations were also adjusted to correctly execute the greedy behavior in `eval` mode.
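A conceptual sketch of the distinction, written against plain PyTorch (the wrapper below only illustrates the described semantics; it is not the library's `Approximation` implementation):

```python
import torch

class ApproximationSketch:
    '''Illustrates the eval()/no_grad() semantics described above.'''

    def __init__(self, model):
        self.model = model

    def eval(self, *inputs):
        # Put the network in evaluation mode (affects dropout, batch norm, ...)
        # and run the forward pass without building a computation graph.
        self.model.eval()
        with torch.no_grad():
            return self.model(*inputs)

    def no_grad(self, *inputs):
        # Keep the network in its current mode (train or eval),
        # but still skip building the computation graph.
        with torch.no_grad():
            return self.model(*inputs)
```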
0.4.0
Plots
Small but important update!
- Added the `all.experiments.plot` module, with a `plot_returns_100` function that accepts a `runs` directory and plots the contained results (a usage sketch follows this list).
- Tweaked the `a2c` Atari preset to better match the configuration of the other algorithms.
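A usage sketch, assuming `plot_returns_100` is imported from the new module (the exact import path and any arguments beyond the `runs` directory are assumptions):

```python
# Sketch: plot the returns stored in a runs directory
# (the name suggests a 100-episode running average).
from all.experiments.plot import plot_returns_100

plot_returns_100("runs")  # "runs" is the directory experiments write results to
```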
C51
Unification
This release contains several usability enhancements! The biggest change, however, is a refactor: the policy classes now inherit from `Approximation`. This means that things like target networks, learning rate schedulers, and model saving are all handled in one place!
The full list of changes is:
- Refactored experiment API (#88)
- Policies inherit from `Approximation` (#89)
- Models now save themselves automatically every 200 updates. Also, you can load models and watch them play in each environment! (#90)
- Automatically set the temperature in SAC (#91)
- Schedule learning rates and other parameters (#92)
- SAC bugfix
- Refactored the usage of target networks. There is now a difference between `eval()` and `target()`: the former runs a forward pass of the current network, the latter does so on the target network, each without creating a computation graph (see the sketch after this list). (#94)
- Tweaked the `AdvantageBuffer` API. Also fixed a minor bug in A2C (#95)
- Report the best returns so far in a separate metric (#96)
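A conceptual sketch of the `eval()`/`target()` split described in #94, again in plain PyTorch (this is an illustration, not the library's implementation; the hard-update synchronization at the end is an assumption):

```python
import copy
import torch

class TargetNetworkSketch:
    '''Illustrates the eval()/target() distinction.'''

    def __init__(self, model):
        self.model = model
        self.target_model = copy.deepcopy(model)  # lagged copy of the network

    def eval(self, *inputs):
        # Forward pass of the *current* network, without a computation graph.
        with torch.no_grad():
            return self.model(*inputs)

    def target(self, *inputs):
        # Forward pass of the *target* network, without a computation graph.
        with torch.no_grad():
            return self.target_model(*inputs)

    def update_target(self):
        # One possible synchronization scheme (a hard update every N steps).
        self.target_model.load_state_dict(self.model.state_dict())
```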
SAC Hotfix
A bug in SoftDeterministicPolicy was slowing learning and causing numerical instability in some cases. This release fixes that.
SAC
Added Soft Actor-Critic (SAC). SAC is a state-of-the-art algorithm for continuous control based on the max-entropy RL framework.
PPO + Vanilla
`PPO` and `Vanilla` release!
- Add PPO, one of the most popular modern RL algorithms.
- Add the `Vanilla` series of agents: "vanilla" implementations of actor-critic, sarsa, q-learning, and REINFORCE. These algorithms are all prefixed with the letter "v" in the `agents` folder.
DDPG
This release introduces `continuous` policies and agents, including `DDPG`. It also includes a number of quality-of-life improvements:
- Add `continuous` agent suite
- Add `Gaussian` policy
- Add `DeterministicPolicy`
- Introduce `Approximation` base class, from which `QNetwork`, `VNetwork`, etc. are derived
- Convert `layers` module to `all.nn`. It extends `torch.nn` with custom layers added, to make crafting unique networks easier.
- Introduce `DDPG` agent (a conceptual network sketch follows this list)
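A conceptual sketch of the actor/critic networks a DDPG-style continuous-control agent needs, written with plain `torch.nn` (the layer sizes and architecture are illustrative assumptions, not the library's presets; `all.nn` is described above as extending `torch.nn`, so the same style applies there):

```python
from torch import nn

def fc_actor(state_dim, action_dim, hidden=256):
    # Deterministic policy network: maps a state to a continuous action.
    return nn.Sequential(
        nn.Linear(state_dim, hidden),
        nn.ReLU(),
        nn.Linear(hidden, action_dim),
        nn.Tanh(),  # squash the action into a bounded range
    )

def fc_critic(state_dim, action_dim, hidden=256):
    # Q-function network: maps a concatenated (state, action) pair to a value.
    return nn.Sequential(
        nn.Linear(state_dim + action_dim, hidden),
        nn.ReLU(),
        nn.Linear(hidden, 1),
    )
```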
act
The release contains a bunch of changes under the hood. The agent API was simplified down to a single method, `action = agent.act(state, reward)`. To accompany this change, `State` was added as a first-class object. Terminal states now have `state.mask` set to 0, whereas before terminal states were represented by `None`.
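A rough sketch of the resulting interaction loop (`env` and `make_state` are placeholders for however observations are turned into `State` objects; the one firm detail from this release is that a terminal state carries `mask == 0`):

```python
# Sketch: the single-method agent API with State objects.

def run_episode(agent, env):
    state = make_state(env.reset())   # non-terminal states carry mask == 1
    reward = 0
    while state.mask != 0:
        action = agent.act(state, reward)
        observation, reward, done = env.step(action)
        state = make_state(observation, mask=0 if done else 1)
    agent.act(state, reward)          # let the agent observe the terminal transition
```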
Another major addition is `slurm` support. This is in particular to aid in running on `gypsum`. The `SlurmExperiment` API handles the creation of the appropriate `.sh` files, output, etc., so experiments can be run on `slurm` by writing a single Python script! No more writing `.sh` files by hand! Examples can be found in the `demos` folder.
There were a few other minor changes as well.
Change log:
- Simplified agent API to only include `act` #56
- Added State object #51
- Added SlurmExperiment for running on gypsum #53
- Updated the local and release scripts, and added slurm demos #54
- Tweaked parameter order in replay buffers #59
- Improved shared feature handling #63
- Made `write_loss` togglable #64
- Tweaked default hyperparameters