- Ensure Docker is installed and ready to run NVIDIA containers (requires `sudo`); you can skip this step if your system can already run NVIDIA containers. The example below is for Ubuntu; see NVIDIA Containers for more examples.

  ```bash
  distribution=$(. /etc/os-release; echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
          sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
          sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
  sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit-base
  sudo apt install nvidia-container-runtime
  sudo nvidia-ctk runtime configure --runtime=docker
  sudo systemctl restart docker
  ```
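After restarting Docker, a quick sanity check is to confirm that `nvidia-ctk` registered the `nvidia` runtime in Docker's daemon configuration. This is a sketch: the config path is the Docker default, and the messages are illustrative.

```shell
# Sketch: check whether the "nvidia" runtime appears in Docker's daemon
# config (default path used by `nvidia-ctk runtime configure`).
CONFIG=/etc/docker/daemon.json
if [ -f "$CONFIG" ] && grep -q '"nvidia"' "$CONFIG"; then
  STATUS="configured"
else
  STATUS="not configured"
fi
echo "nvidia runtime: $STATUS"
```

If the runtime is registered, `docker info` will also list `nvidia` under `Runtimes`.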
- Build the container image:

  ```bash
  docker build -t h2ogpt .
  ```
- Run the container (you can also use `finetune.py` and all of its parameters as shown above for training). For the fine-tuned h2oGPT with 20 billion parameters:

  ```bash
  docker run --runtime=nvidia --shm-size=64g -p 7860:7860 \
         -v ${HOME}/.cache:/root/.cache --rm h2ogpt -it generate.py \
         --base_model=h2oai/h2ogpt-oasst1-512-20b
  ```
  If you have a private Hugging Face token, you can instead run:

  ```bash
  docker run --runtime=nvidia --shm-size=64g --entrypoint=bash -p 7860:7860 \
         -e HUGGINGFACE_API_TOKEN=<HUGGINGFACE_API_TOKEN> \
         -v ${HOME}/.cache:/root/.cache --rm h2ogpt -it \
         -c 'huggingface-cli login --token $HUGGINGFACE_API_TOKEN && python3.10 generate.py --base_model=h2oai/h2ogpt-oasst1-512-20b --use_auth_token=True'
  ```
  For your own fine-tuned model, for example one starting from the gpt-neox-20b foundation model:

  ```bash
  docker run --runtime=nvidia --shm-size=64g -p 7860:7860 \
         -v ${HOME}/.cache:/root/.cache --rm h2ogpt -it generate.py \
         --base_model=EleutherAI/gpt-neox-20b \
         --lora_weights=h2ogpt_lora_weights --prompt_type=human_bot
  ```
- Open https://localhost:7860 in the browser.
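If the page does not load, you can probe the port from the host before opening the browser. This is a sketch; the `-k` flag skips certificate verification for the local self-signed case, and the timeout is illustrative.

```shell
# Sketch: check whether anything answers on the UI port yet.
if curl -sk --max-time 5 https://localhost:7860 >/dev/null 2>&1; then
  UI_STATUS="up"
else
  UI_STATUS="not reachable yet"
fi
echo "UI is $UI_STATUS"
```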
Optional: Running with a custom entrypoint
To run with a custom entrypoint, modify the local `run-gpt.sh` and mount it:

```bash
docker run \
    --runtime=nvidia --shm-size=64g \
    -e HF_MODEL=h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b \
    -p 8888:8888 -p 7860:7860 \
    --rm --init \
    -v `pwd`/h2ogpt_env:/h2ogpt_env \
    -v `pwd`/run-gpt.sh:/run-gpt.sh \
    gcr.io/vorvan/h2oai/h2ogpt-runtime:61d6aea6fff3b1190aa42eee7fa10d6c
```
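A minimal `run-gpt.sh` could look like the following sketch. It assumes the container passes the model name via the `HF_MODEL` environment variable (as in the `docker run` command above) and launches `generate.py` with `python3.10` (as in the Hugging Face token example earlier); the fallback model name is illustrative.

```shell
# Sketch: write a minimal custom run-gpt.sh to disk so it can be mounted
# with -v as shown above. HF_MODEL fallback is illustrative only.
cat > run-gpt.sh <<'EOF'
#!/bin/bash
set -euo pipefail
# Model comes from the container environment (-e HF_MODEL=...), with a fallback.
MODEL="${HF_MODEL:-h2oai/h2ogpt-oasst1-512-20b}"
echo "Starting h2oGPT with base model: $MODEL"
exec python3.10 generate.py --base_model="$MODEL"
EOF
chmod +x run-gpt.sh
bash -n run-gpt.sh && echo "run-gpt.sh syntax OK"
```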
- (Optional) Change the desired model and weights under `environment` in the `docker-compose.yml`.
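For example, the relevant part of `docker-compose.yml` might look like the fragment below. This is a sketch: the service name and the `HF_MODEL` variable are assumptions (the variable name is taken from the `docker run` example above), not necessarily what the actual file uses.

```yaml
# Hypothetical fragment; service and variable names are illustrative.
services:
  h2ogpt:
    environment:
      - HF_MODEL=h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b
```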
- Build and run the container:

  ```bash
  docker-compose up -d --build
  ```
- Open https://localhost:7860 in the browser.
- See logs:

  ```bash
  docker-compose logs -f
  ```
- Clean everything up:

  ```bash
  docker-compose down --volumes --rmi all
  ```