- Ensure Docker is installed and ready to run NVIDIA containers (requires `sudo`); you can skip this step if your system can already run NVIDIA containers. The example below is for Ubuntu; see NVIDIA Containers for more examples.

  ```bash
  distribution=$(. /etc/os-release; echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
          sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
          sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
  sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit-base
  sudo apt install nvidia-container-runtime
  sudo nvidia-ctk runtime configure --runtime=docker
  sudo systemctl restart docker
  ```
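After restarting Docker, a quick sanity check is to confirm that `nvidia-ctk` registered the `nvidia` runtime in Docker's daemon configuration. This is a sketch: the config path is the Docker default, and the messages are illustrative.

```shell
# Sketch: check whether the "nvidia" runtime appears in Docker's daemon
# config (default path used by `nvidia-ctk runtime configure`).
CONFIG=/etc/docker/daemon.json
if [ -f "$CONFIG" ] && grep -q '"nvidia"' "$CONFIG"; then
  STATUS="configured"
else
  STATUS="not configured"
fi
echo "nvidia runtime: $STATUS"
```

If the runtime is registered, `docker info` will also list `nvidia` under `Runtimes`.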
- Build the container image:

  ```bash
  docker build -t h2ogpt .
  ```
- Run the container (you can also use `finetune.py` and all of its parameters as shown above for training). For the fine-tuned h2oGPT with 20 billion parameters:

  ```bash
  docker run --runtime=nvidia --shm-size=64g -p 7860:7860 \
         -v ${HOME}/.cache:/root/.cache --rm h2ogpt -it generate.py \
         --base_model=h2oai/h2ogpt-oasst1-512-20b
  ```
  If you have a private Hugging Face token, you can instead run:

  ```bash
  docker run --runtime=nvidia --shm-size=64g --entrypoint=bash -p 7860:7860 \
         -e HUGGINGFACE_API_TOKEN=<HUGGINGFACE_API_TOKEN> \
         -v ${HOME}/.cache:/root/.cache --rm h2ogpt -it \
         -c 'huggingface-cli login --token $HUGGINGFACE_API_TOKEN && python3.10 generate.py --base_model=h2oai/h2ogpt-oasst1-512-20b --use_auth_token=True'
  ```
  For your own fine-tuned model, for example one starting from the gpt-neox-20b foundation model:

  ```bash
  docker run --runtime=nvidia --shm-size=64g -p 7860:7860 \
         -v ${HOME}/.cache:/root/.cache --rm h2ogpt -it generate.py \
         --base_model=EleutherAI/gpt-neox-20b \
         --lora_weights=h2ogpt_lora_weights --prompt_type=human_bot
  ```
- Open https://localhost:7860 in the browser.
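If the page does not load, you can probe the port from the host before opening the browser. This is a sketch; the `-k` flag skips certificate verification for the local self-signed case, and the timeout is illustrative.

```shell
# Sketch: check whether anything answers on the UI port yet.
if curl -sk --max-time 5 https://localhost:7860 >/dev/null 2>&1; then
  UI_STATUS="up"
else
  UI_STATUS="not reachable yet"
fi
echo "UI is $UI_STATUS"
```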
Optional: Running with a custom entrypoint
To run with a custom entrypoint, modify the local `run-gpt.sh` and mount it:

```bash
docker run \
    --runtime=nvidia --shm-size=64g \
    -e HF_MODEL=h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b \
    -p 8888:8888 -p 7860:7860 \
    --rm --init \
    -v `pwd`/h2ogpt_env:/h2ogpt_env \
    -v `pwd`/run-gpt.sh:/run-gpt.sh \
    gcr.io/vorvan/h2oai/h2ogpt-runtime:61d6aea6fff3b1190aa42eee7fa10d6c
```
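A minimal `run-gpt.sh` could look like the following sketch. It assumes the container passes the model name via the `HF_MODEL` environment variable (as in the `docker run` command above) and launches `generate.py` with `python3.10` (as in the Hugging Face token example earlier); the fallback model name is illustrative.

```shell
# Sketch: write a minimal custom run-gpt.sh to disk so it can be mounted
# with -v as shown above. HF_MODEL fallback is illustrative only.
cat > run-gpt.sh <<'EOF'
#!/bin/bash
set -euo pipefail
# Model comes from the container environment (-e HF_MODEL=...), with a fallback.
MODEL="${HF_MODEL:-h2oai/h2ogpt-oasst1-512-20b}"
echo "Starting h2oGPT with base model: $MODEL"
exec python3.10 generate.py --base_model="$MODEL"
EOF
chmod +x run-gpt.sh
bash -n run-gpt.sh && echo "run-gpt.sh syntax OK"
```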
- (Optional) Change the desired model and weights under `environment` in the `docker-compose.yml`.
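For example, the relevant part of `docker-compose.yml` might look like the fragment below. This is a sketch: the service name and the `HF_MODEL` variable are assumptions (the variable name is taken from the `docker run` example above), not necessarily what the actual file uses.

```yaml
# Hypothetical fragment; service and variable names are illustrative.
services:
  h2ogpt:
    environment:
      - HF_MODEL=h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b
```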
- Build and run the container:

  ```bash
  docker-compose up -d --build
  ```
- Open https://localhost:7860 in the browser.
- See logs:

  ```bash
  docker-compose logs -f
  ```
- Clean everything up:

  ```bash
  docker-compose down --volumes --rmi all
  ```