From 097c4439ae74f1dd20488cc569402aacfa59eea1 Mon Sep 17 00:00:00 2001
From: vmoens
Date: Tue, 3 Oct 2023 08:38:06 -0400
Subject: [PATCH] readme

---
 examples/rlhf/README.md | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/examples/rlhf/README.md b/examples/rlhf/README.md
index 7c8347a3c1d..c4b0a261101 100644
--- a/examples/rlhf/README.md
+++ b/examples/rlhf/README.md
@@ -1,12 +1,15 @@
 # RLHF example
 
-This example uses RLHF (Reinforcement Learning with Human Feedback) to train a language model to summarize Reddit posts.
+This example uses RLHF (Reinforcement Learning with Human Feedback) to train a
+language model to summarize Reddit posts.
 
 ## Getting started
 
-Make sure you have PyTorch 2.0 installed. You can find installation instructions [here](https://pytorch.org/get-started/locally/).
+Make sure you have PyTorch>=2.0 installed. You can find installation instructions
+[here](https://pytorch.org/get-started/locally/).
 
-From this directory, you can install extra requirements for running these examples with
+From this directory, you can install extra requirements for running these
+examples with
 
 ```sh
 pip install -r requirements.txt
@@ -21,16 +24,22 @@ Once the data has been prepared, you can train the GPT model.
 python train.py
 ```
 
-Default configuration can be found in `config/train.yaml`, and any option can be overridden with command-line arguments, for example to run the training script with a different batch size
+Default configuration can be found in `config/train.yaml`, and any option can
+be overridden with command-line arguments, for example, to run the training
+script with a different batch size:
 
 ```sh
 python train.py --batch_size=128
 ```
 
-> **_NOTE:_** Apple Silicon Macbooks users make sure to use `--device=mps` and prepend all commands with `PYTORCH_ENABLE_MPS_FALLBACK=1` to enable CPU fallback
+> **_NOTE:_** Apple Silicon MacBook users should make sure to use `--device=mps`
+> and prepend all commands with `PYTORCH_ENABLE_MPS_FALLBACK=1` to enable CPU fallback.
 ### Training the reward model
 
-Once you have completed supervised fine-tuning, copy the desired model checkpoint to `./out` or update the config to point `model.name_or_path` at the relevant checkpoint in the timestamped working directory created by Hydra. You can then train the reward model with
+Once you have completed supervised fine-tuning, copy the desired model
+checkpoint to `./out` or update the config to point `model.name_or_path` at
+the relevant checkpoint in the timestamped working directory created by Hydra.
+You can then train the reward model with:
 
 ```sh
 python train_reward.py
@@ -38,7 +47,10 @@ python train_reward.py
 
 ### Training the final model with RLHF
 
-Once again, make sure you have either updated the configuration to point `reward_model.name_or_path` at the relevant timestamped working directory, or copy the checkpoint to `./out_reward`. You can then train the final model by running
+Once again, make sure you have either updated the configuration to point
+`reward_model.name_or_path` at the relevant timestamped working directory, or
+copied the checkpoint to `./out_reward`.
+You can then train the final model by running:
 
 ```sh
 python train_rlhf.py
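
# A sketch of the full pipeline described by the README above, run end to end.
# Assumptions: the data-preparation step mentioned before `train.py` has
# already been run, the default configs are used (e.g. `config/train.yaml`),
# and each stage's checkpoint is either copied to ./out / ./out_reward or
# referenced via model.name_or_path / reward_model.name_or_path as the README
# describes.

# 1. Supervised fine-tuning; any config option can be overridden on the
#    command line, e.g. the batch size:
python train.py --batch_size=128

# On Apple Silicon, per the note above, select the mps device and enable the
# CPU fallback, e.g.:
#   PYTORCH_ENABLE_MPS_FALLBACK=1 python train.py --device=mps

# 2. Reward-model training (expects the supervised checkpoint in ./out or at
#    model.name_or_path):
python train_reward.py

# 3. Final RLHF training (expects the reward checkpoint in ./out_reward or at
#    reward_model.name_or_path):
python train_rlhf.py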