From 097c4439ae74f1dd20488cc569402aacfa59eea1 Mon Sep 17 00:00:00 2001
From: vmoens
Date: Tue, 3 Oct 2023 08:38:06 -0400
Subject: [PATCH] readme

---
 examples/rlhf/README.md | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/examples/rlhf/README.md b/examples/rlhf/README.md
index 7c8347a3c1d..c4b0a261101 100644
--- a/examples/rlhf/README.md
+++ b/examples/rlhf/README.md
@@ -1,12 +1,15 @@
 # RLHF example
 
-This example uses RLHF (Reinforcement Learning with Human Feedback) to train a language model to summarize Reddit posts.
+This example uses RLHF (Reinforcement Learning with Human Feedback) to train a
+language model to summarize Reddit posts.
 
 ## Getting started
 
-Make sure you have PyTorch 2.0 installed. You can find installation instructions [here](https://pytorch.org/get-started/locally/).
+Make sure you have PyTorch>=2.0 installed. You can find installation instructions
+[here](https://pytorch.org/get-started/locally/).
 
-From this directory, you can install extra requirements for running these examples with
+From this directory, you can install extra requirements for running these
+examples with
 
 ```sh
 pip install -r requirements.txt
@@ -21,16 +24,22 @@ Once the data has been prepared, you can train the GPT model.
 python train.py
 ```
 
-Default configuration can be found in `config/train.yaml`, and any option can be overridden with command-line arguments, for example to run the training script with a different batch size
+Default configuration can be found in `config/train.yaml`, and any option can
+be overridden with command-line arguments, for example, to run the training
+script with a different batch size:
 
 ```sh
 python train.py --batch_size=128
 ```
 
-> **_NOTE:_** Apple Silicon Macbooks users make sure to use `--device=mps` and prepend all commands with `PYTORCH_ENABLE_MPS_FALLBACK=1` to enable CPU fallback
+> **_NOTE:_** Apple Silicon MacBook users should make sure to use `--device=mps`
+> and prepend all commands with `PYTORCH_ENABLE_MPS_FALLBACK=1` to enable CPU fallback.
 ### Training the reward model
 
-Once you have completed supervised fine-tuning, copy the desired model checkpoint to `./out` or update the config to point `model.name_or_path` at the relevant checkpoint in the timestamped working directory created by Hydra. You can then train the reward model with
+Once you have completed supervised fine-tuning, copy the desired model
+checkpoint to `./out` or update the config to point `model.name_or_path` at
+the relevant checkpoint in the timestamped working directory created by Hydra.
+You can then train the reward model with:
 
 ```sh
 python train_reward.py
@@ -38,7 +47,10 @@ python train_reward.py
 
 ### Training the final model with RLHF
 
-Once again, make sure you have either updated the configuration to point `reward_model.name_or_path` at the relevant timestamped working directory, or copy the checkpoint to `./out_reward`. You can then train the final model by running
+Once again, make sure you have either updated the configuration to point
+`reward_model.name_or_path` at the relevant timestamped working directory, or
+copied the checkpoint to `./out_reward`.
+You can then train the final model by running:
 
 ```sh
 python train_rlhf.py
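
# A sketch of the full pipeline described by the README above, run end to end.
# Assumptions: the data-preparation step mentioned before `train.py` has
# already been run, the default configs are used (e.g. `config/train.yaml`),
# and each stage's checkpoint is either copied to ./out / ./out_reward or
# referenced via model.name_or_path / reward_model.name_or_path as the README
# describes.

# 1. Supervised fine-tuning; any config option can be overridden on the
#    command line, e.g. the batch size:
python train.py --batch_size=128

# On Apple Silicon, per the note above, select the mps device and enable the
# CPU fallback, e.g.:
#   PYTORCH_ENABLE_MPS_FALLBACK=1 python train.py --device=mps

# 2. Reward-model training (expects the supervised checkpoint in ./out or at
#    model.name_or_path):
python train_reward.py

# 3. Final RLHF training (expects the reward checkpoint in ./out_reward or at
#    reward_model.name_or_path):
python train_rlhf.py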