
nllb-api support GPU #49

Closed
online2311 opened this issue Oct 10, 2023 · 13 comments

@online2311

Is it possible to support GPU? I found out that CTranslate2 supports GPU. Would GPU support make responses faster?

@winstxnhdw (Owner) commented Oct 10, 2023

Yes, CTranslate2 supports GPU. It is indeed orders of magnitude faster, and it's really easy to implement yourself. I don't have the time to add it right now and will only introduce it in a month or two. For now, you can do it yourself by changing the following code and installing the NVIDIA Container Toolkit.

cls.translator = CTranslator(model_path, device='cuda', compute_type='auto', device_index=[0, 1])
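
If your machine might not always have a GPU, a sketch like the following falls back to CPU automatically (load_translator is a hypothetical helper; get_cuda_device_count and Translator are part of the CTranslate2 Python API):

import ctranslate2

def load_translator(model_path: str) -> ctranslate2.Translator:
    # Prefer CUDA when CTranslate2 can see at least one device;
    # otherwise fall back to CPU so the container still starts.
    device = 'cuda' if ctranslate2.get_cuda_device_count() > 0 else 'cpu'
    return ctranslate2.Translator(model_path, device=device, compute_type='auto')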

Afterwards, just run the container with the NVIDIA runtime.

docker build -f Dockerfile.build -t nllb-api .
docker run --rm  --runtime=nvidia --gpus all \
  -e SERVER_PORT=5000 \
  -e APP_PORT=7860 \
  -e OMP_NUM_THREADS=6 \
  -e WORKER_COUNT=1 \
  -p 7860:7860 \
  -v ./cache:/home/user/.cache \
  nllb-api
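
To confirm CTranslate2 can actually see the GPU inside the container, here is a quick check (assuming the ctranslate2 package is importable in the container's Python):

import ctranslate2

# 0 means CTranslate2 cannot see any CUDA device and translation will run on the CPU
print(ctranslate2.get_cuda_device_count())

Running nvidia-smi inside the container should also list your GPU; the NVIDIA runtime mounts it in automatically.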

@online2311 (Author)

Awesome, thanks. I will try it.

@online2311 (Author)

Is there anything wrong with this? I don't see any GPU usage.


==========
== CUDA ==
==========

CUDA Version 11.8.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

2023-10-12 04:27:23,976 INFO supervisord started with pid 1
2023-10-12 04:27:24,979 INFO spawned: 'server' with pid 27
2023-10-12 04:27:24,980 INFO spawned: 'caddy' with pid 28
{"level":"info","ts":1697084845.0245798,"msg":"using provided configuration","config_file":"Caddyfile","config_adapter":"caddyfile"}
{"level":"info","ts":1697084845.0266504,"logger":"admin","msg":"admin endpoint started","address":"localhost:2019","enforce_origin":false,"origins":["//localhost:2019","//[::1]:2019","//127.0.0.1:2019"]}
{"level":"info","ts":1697084845.0269077,"logger":"tls.cache.maintenance","msg":"started background certificate maintenance","cache":"0xc00036d680"}
{"level":"info","ts":1697084845.0271118,"logger":"http.log","msg":"server running","name":"srv0","protocols":["h1","h2","h3"]}
{"level":"info","ts":1697084845.0271096,"logger":"tls","msg":"cleaning storage unit","description":"FileStorage:/home/user/.local/share/caddy"}
{"level":"error","ts":1697084845.027182,"msg":"unable to create folder for config autosave","dir":"/home/user/.config/caddy","error":"mkdir /home/user/.config: permission denied"}
{"level":"info","ts":1697084845.027194,"msg":"serving initial configuration"}
{"level":"info","ts":1697084845.0271883,"logger":"tls","msg":"finished cleaning storage units"}
 * /v1/cpu route found!
 * /v1/index route found!
 * /v1/translate route found!
 * /v2/index route found!
 * /v2/translate route found!
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 40372.98it/s]
2023-10-12 04:27:35,816 INFO success: server entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)
2023-10-12 04:27:35,816 INFO success: caddy entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)
[2023-10-12 04:27:39 +0000] [52] [INFO] Running on http://0.0.0.0:5000 (CTRL + C to quit)
2023-10-12 04:27:55.933192: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-10-12 04:27:55.933287: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-10-12 04:27:55.933397: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-10-12 04:27:57.223742: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[2023-10-12 04:27:58 +0000] [52] [INFO] 200 "POST /v2/translate 2" 127.0.0.1:33352 "curl/8.1.2"

@online2311 (Author)

Whether "WORKER_COUNT=2" supports running on two GPUs separately.

@winstxnhdw (Owner) commented Oct 12, 2023

Whoops, this is what I was afraid of. You will also have to replace the final image in Dockerfile.build with nvidia/cuda.

# Stage 1: install Python dependencies into the system site-packages
FROM python:3.11.6-slim as python-builder

ENV POETRY_VIRTUALENVS_CREATE false
ENV POETRY_HOME /opt/poetry
ENV PATH $POETRY_HOME/bin:$PATH

WORKDIR /

COPY pyproject.toml .

RUN apt update
RUN apt install -y curl
RUN curl -sSL https://install.python-poetry.org | python -
RUN poetry install --no-dev


# Stage 2: build Caddy with the cache-handler plugin
FROM caddy:builder-alpine as caddy-builder

RUN xcaddy build --with github.com/caddyserver/cache-handler


# Final stage: CUDA runtime image
FROM nvidia/cuda:11.7.1-runtime-ubuntu22.04

ENV HOME /home/user
ENV PYTHONUNBUFFERED 1
ENV PYTHONDONTWRITEBYTECODE 1

RUN useradd -m -u 1000 user

USER user

WORKDIR $HOME/app

COPY --chown=user --from=caddy-builder  /usr/bin/caddy /usr/bin/caddy
COPY --chown=user --from=python-builder /usr/local/    /usr/local/
COPY --chown=user . $HOME/app

CMD ["supervisord"]

WORKER_COUNT is for CPU threads only. If you want to use multiple GPUs, you'll have to add the device_index argument.

cls.translator = CTranslator(model_path, device='cuda', compute_type='auto', device_index=[0, 1])
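
With device_index=[0, 1], CTranslate2 loads one model replica per listed device and dispatches batches across them. Here is a minimal end-to-end sketch (the model path and tokens are placeholders; translate_batch and target_prefix are standard CTranslate2 API, and NLLB expects the target language code as a target prefix):

import ctranslate2

translator = ctranslate2.Translator(
    'models/nllb-200-distilled-600M',  # placeholder path to the converted model
    device='cuda',
    compute_type='auto',
    device_index=[0, 1],               # one model replica per GPU
)

results = translator.translate_batch(
    [['▁Hello', '▁world']],            # tokens from the model's SentencePiece tokenizer
    target_prefix=[['fra_Latn']],      # NLLB target language code
)
print(results[0].hypotheses[0])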

You can read more in the CTranslate2 documentation.

@online2311 (Author)

Here is the repaired Dockerfile; it now runs normally.

FROM caddy:builder-alpine as caddy-builder

RUN xcaddy build --with github.com/caddyserver/cache-handler


FROM nvidia/cuda:11.8.0-devel-ubuntu22.04
ENV POETRY_VIRTUALENVS_CREATE false
ENV POETRY_HOME /opt/poetry
ENV PATH $POETRY_HOME/bin:$PATH

WORKDIR /
RUN apt-get update && \
    apt-get install --no-install-recommends -y git curl vim python3-dev python3-pip && \
    rm -rf /var/lib/apt/lists/*

ENV HOME /home/user
ENV PYTHONUNBUFFERED 1
ENV PYTHONDONTWRITEBYTECODE 1

RUN pip3 install torch torchvision torchaudio tensorflow ctranslate2 tensorrt
RUN ln -s /usr/bin/python3 /usr/bin/python
RUN curl -sSL https://install.python-poetry.org | python -
COPY pyproject.toml .
RUN poetry install --no-dev
RUN useradd -m -u 1000 user
RUN chown -R user:user /home/user

USER user

WORKDIR $HOME/app

ENV LD_LIBRARY_PATH $LD_LIBRARY_PATH:/usr/local/lib/python3.10/dist-packages/tensorrt_libs
COPY --chown=user --from=caddy-builder  /usr/bin/caddy /usr/bin/caddy
COPY --chown=user . $HOME/app

ENV OMP_NUM_THREADS 4
ENV WORKER_COUNT 1
CMD ["supervisord"]

And the corresponding translator line:

cls.translator = CTranslator(model_path, device='cuda', compute_type='auto', device_index=[0, 1], inter_threads=Config.worker_count)

@winstxnhdw (Owner) commented Oct 13, 2023

There are a lot of redundancies, and your final image size is probably larger than it ever needs to be. I think the one I suggested should work fine.

@online2311 (Author)

That one was missing some things: the libraries and the Python modules. I patched it bit by bit following the error messages, and it became what it is now.

@winstxnhdw (Owner)

If you cloned the repository and built the Docker image from it, the correct libraries should be downloaded. Python will also be copied over from /usr/local/.

@online2311 (Author)

So what you mean is: put "pip3 install torch torchvision torchaudio tensorflow ctranslate2 tensorrt" into the python-builder image, and then copy the packages over to the nvidia/cuda:11.7.1-runtime-ubuntu22.04 image?

@winstxnhdw (Owner)

No. My Python builder stage should already install the required dependencies. These dependencies are found in /usr/local/lib/python3.11/*, which is copied over in the final build step. Therefore, you don't have to do anything.

@winstxnhdw (Owner)

I have officially tested and introduced CUDA support in commit e7214f3.

The image size will be a lot smaller than your approach. Cheers!

@online2311 (Author)

Awesome ~

@winstxnhdw winstxnhdw pinned this issue Dec 3, 2023