Commit
Adds the container package for SSD-ResNet34 inference and training (PyTorch SPR) (#96)
* Add specs, docs, and dockerfiles for PyTorch SPR SSD-ResNet34 inference and training
* Update spec for train scripts
* Fix PRETRAINED_MODEL volume in run.sh
* Update training partial
* Copy in requirements
* Remove numpy since we already have it, and add shm-size
* Adding missing new line
* Updates based on review feedback
Showing 32 changed files with 1,394 additions and 0 deletions.
79 changes: 79 additions & 0 deletions
dockerfiles/pytorch/pytorch-spr-ssd-resnet34-inference.Dockerfile
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
# Copyright (c) 2020-2021 Intel Corporation | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
# ============================================================================ | ||
# | ||
# THIS IS A GENERATED DOCKERFILE. | ||
# | ||
# This file was assembled from multiple pieces, whose use is documented | ||
# throughout. Please refer to the TensorFlow dockerfiles documentation | ||
# for more information. | ||
|
||
ARG PYTORCH_IMAGE="model-zoo" | ||
ARG PYTORCH_TAG="pytorch-ipex-spr" | ||
|
||
FROM ${PYTORCH_IMAGE}:${PYTORCH_TAG} AS intel-optimized-pytorch | ||
|
||
# Build Torch Vision | ||
ARG TORCHVISION_VERSION=v0.8.0 | ||
|
||
RUN source ~/anaconda3/bin/activate pytorch && \ | ||
git clone https://github.com/pytorch/vision && \ | ||
cd vision && \ | ||
git checkout ${TORCHVISION_VERSION} && \ | ||
python setup.py install | ||
|
||
RUN source ~/anaconda3/bin/activate pytorch && \ | ||
pip install matplotlib Pillow pycocotools && \ | ||
pip install yacs opencv-python cityscapesscripts transformers && \ | ||
conda install -y libopenblas psutil && \ | ||
cd /workspace/installs && \ | ||
wget https://github.com/gperftools/gperftools/releases/download/gperftools-2.7.90/gperftools-2.7.90.tar.gz && \ | ||
tar -xzf gperftools-2.7.90.tar.gz && \ | ||
cd gperftools-2.7.90 && \ | ||
./configure --prefix=$HOME/.local && \ | ||
make && \ | ||
make install && \ | ||
rm -rf /workspace/installs/ | ||
|
||
ARG PACKAGE_DIR=model_packages | ||
|
||
ARG PACKAGE_NAME="pytorch-spr-ssd-resnet34-inference" | ||
|
||
ARG MODEL_WORKSPACE | ||
|
||
# ${MODEL_WORKSPACE} and below needs to be owned by root:root rather than the current UID:GID | ||
# this allows the default user (root) to work in k8s single-node, multi-node | ||
RUN umask 002 && mkdir -p ${MODEL_WORKSPACE} && chgrp root ${MODEL_WORKSPACE} && chmod g+s+w,o+s+r ${MODEL_WORKSPACE} | ||
|
||
ADD --chown=0:0 ${PACKAGE_DIR}/${PACKAGE_NAME}.tar.gz ${MODEL_WORKSPACE} | ||
|
||
RUN chown -R root ${MODEL_WORKSPACE}/${PACKAGE_NAME} && chgrp -R root ${MODEL_WORKSPACE}/${PACKAGE_NAME} && chmod -R g+s+w ${MODEL_WORKSPACE}/${PACKAGE_NAME} && find ${MODEL_WORKSPACE}/${PACKAGE_NAME} -type d | xargs chmod o+r+x | ||
|
||
WORKDIR ${MODEL_WORKSPACE}/${PACKAGE_NAME} | ||
|
||
FROM intel-optimized-pytorch AS release | ||
COPY --from=intel-optimized-pytorch /root/anaconda3 /root/anaconda3 | ||
COPY --from=intel-optimized-pytorch /workspace/lib/ /workspace/lib/ | ||
COPY --from=intel-optimized-pytorch /root/.local/ /root/.local/ | ||
|
||
ENV DNNL_MAX_CPU_ISA="AVX512_CORE_AMX" | ||
|
||
ENV PATH="~/anaconda3/bin:${PATH}" | ||
ENV LD_PRELOAD="/workspace/lib/jemalloc/lib/libjemalloc.so:$LD_PRELOAD" | ||
ENV MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:9000000000,muzzy_decay_ms:9000000000" | ||
ENV BASH_ENV=/root/.bash_profile | ||
WORKDIR /workspace/ | ||
RUN yum install -y numactl mesa-libGL && \ | ||
yum clean all && \ | ||
echo "source activate pytorch" >> /root/.bash_profile |
88 changes: 88 additions & 0 deletions
dockerfiles/pytorch/pytorch-spr-ssd-resnet34-training.Dockerfile
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,88 @@ | ||
# Copyright (c) 2020-2021 Intel Corporation | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
# ============================================================================ | ||
# | ||
# THIS IS A GENERATED DOCKERFILE. | ||
# | ||
# This file was assembled from multiple pieces, whose use is documented | ||
# throughout. Please refer to the TensorFlow dockerfiles documentation | ||
# for more information. | ||
|
||
ARG PYTORCH_IMAGE="model-zoo" | ||
ARG PYTORCH_TAG="pytorch-ipex-spr" | ||
|
||
FROM ${PYTORCH_IMAGE}:${PYTORCH_TAG} AS intel-optimized-pytorch | ||
|
||
RUN source ~/anaconda3/bin/activate pytorch && \ | ||
pip install matplotlib Pillow pycocotools && \ | ||
pip install yacs opencv-python cityscapesscripts transformers && \ | ||
conda install -y libopenblas psutil && \ | ||
cd /workspace/installs && \ | ||
wget https://github.com/gperftools/gperftools/releases/download/gperftools-2.7.90/gperftools-2.7.90.tar.gz && \ | ||
tar -xzf gperftools-2.7.90.tar.gz && \ | ||
cd gperftools-2.7.90 && \ | ||
./configure --prefix=$HOME/.local && \ | ||
make && \ | ||
make install && \ | ||
rm -rf /workspace/installs/ | ||
|
||
ARG PACKAGE_DIR=model_packages | ||
|
||
ARG PACKAGE_NAME="pytorch-spr-ssd-resnet34-training" | ||
|
||
ARG MODEL_WORKSPACE | ||
|
||
# ${MODEL_WORKSPACE} and below needs to be owned by root:root rather than the current UID:GID | ||
# this allows the default user (root) to work in k8s single-node, multi-node | ||
RUN umask 002 && mkdir -p ${MODEL_WORKSPACE} && chgrp root ${MODEL_WORKSPACE} && chmod g+s+w,o+s+r ${MODEL_WORKSPACE} | ||
|
||
ADD --chown=0:0 ${PACKAGE_DIR}/${PACKAGE_NAME}.tar.gz ${MODEL_WORKSPACE} | ||
|
||
RUN chown -R root ${MODEL_WORKSPACE}/${PACKAGE_NAME} && chgrp -R root ${MODEL_WORKSPACE}/${PACKAGE_NAME} && chmod -R g+s+w ${MODEL_WORKSPACE}/${PACKAGE_NAME} && find ${MODEL_WORKSPACE}/${PACKAGE_NAME} -type d | xargs chmod o+r+x | ||
|
||
WORKDIR ${MODEL_WORKSPACE}/${PACKAGE_NAME} | ||
|
||
RUN source ~/anaconda3/bin/activate pytorch && \ | ||
pip install --upgrade pip && \ | ||
pip install --no-cache-dir https://github.com/mlperf/logging/archive/9ea0afa.zip && \ | ||
pip install --no-cache-dir \ | ||
Cython==0.28.4 \ | ||
git+http://github.com/NVIDIA/apex.git@9041a868a1a253172d94b113a963375b9badd030#egg=apex \ | ||
mlperf-compliance==0.0.10 \ | ||
cycler==0.10.0 \ | ||
kiwisolver==1.0.1 \ | ||
matplotlib==2.2.2 \ | ||
Pillow==5.2.0 \ | ||
pyparsing==2.2.0 \ | ||
python-dateutil==2.7.3 \ | ||
pytz==2018.5 \ | ||
six==1.11.0 \ | ||
torchvision==0.2.1 \ | ||
pycocotools==2.0.2 | ||
|
||
FROM intel-optimized-pytorch AS release | ||
COPY --from=intel-optimized-pytorch /root/anaconda3 /root/anaconda3 | ||
COPY --from=intel-optimized-pytorch /workspace/lib/ /workspace/lib/ | ||
COPY --from=intel-optimized-pytorch /root/.local/ /root/.local/ | ||
|
||
ENV DNNL_MAX_CPU_ISA="AVX512_CORE_AMX" | ||
|
||
ENV PATH="~/anaconda3/bin:${PATH}" | ||
ENV LD_PRELOAD="/workspace/lib/jemalloc/lib/libjemalloc.so:$LD_PRELOAD" | ||
ENV MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:9000000000,muzzy_decay_ms:9000000000" | ||
ENV BASH_ENV=/root/.bash_profile | ||
WORKDIR /workspace/ | ||
RUN yum install -y numactl mesa-libGL && \ | ||
yum clean all && \ | ||
echo "source activate pytorch" >> /root/.bash_profile |
26 changes: 26 additions & 0 deletions
...rt/object_detection/pytorch/ssd-resnet34/inference/cpu/.docs/container_build.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
## Build the container | ||
|
||
The <model name> <mode> package has scripts and a Dockerfile that are | ||
used to build a workload container that runs the model. This container | ||
uses the PyTorch/IPEX container as its base, so ensure that you have built | ||
the `pytorch-ipex-spr.tar.gz` container prior to building this model container. | ||
|
||
Use `docker images` to verify that you have the base container built. For example: | ||
``` | ||
$ docker images | grep pytorch-ipex-spr | ||
model-zoo pytorch-ipex-spr fecc7096a11e 40 minutes ago 8.31GB | ||
``` | ||
|
||
To build the <model name> <mode> container, extract the package and | ||
run the `build.sh` script. | ||
``` | ||
# Extract the package | ||
tar -xzf <package name> | ||
cd <package dir> | ||
# Build the container | ||
./build.sh | ||
``` | ||
|
||
After the build completes, you should have a container called | ||
`<docker image>` that will be used to run the model. |
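The base-image check described above can be scripted. The following is a minimal sketch, not part of the model package: `has_base_image` is a hypothetical helper name, and it assumes the base image keeps the `model-zoo` repository name and `pytorch-ipex-spr` tag shown in the example output. It reads `docker images` output on stdin so that it can be exercised without a Docker daemon.

```shell
# Hypothetical helper (not shipped in the package): succeeds only when the
# `docker images` listing on stdin contains the model-zoo:pytorch-ipex-spr image.
has_base_image() {
  awk '$1 == "model-zoo" && $2 == "pytorch-ipex-spr" { found = 1 }
       END { exit !found }'
}
```

One possible use is `docker images | has_base_image && ./build.sh`, which skips the build when the base container has not been built yet.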
33 changes: 33 additions & 0 deletions
quickstart/object_detection/pytorch/ssd-resnet34/inference/cpu/.docs/datasets.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
## Datasets | ||
|
||
### COCO | ||
|
||
The [COCO dataset](https://cocodataset.org) is used to run <model name>. | ||
|
||
Download the 2017 training/validation images and annotations from the | ||
[COCO dataset website](https://cocodataset.org/#download) to a `coco` folder | ||
and unzip the files. After extracting the zip files, your dataset directory | ||
structure should look something like this: | ||
``` | ||
coco | ||
├── annotations | ||
│ ├── captions_train2017.json | ||
│ ├── captions_val2017.json | ||
│ ├── instances_train2017.json | ||
│ ├── instances_val2017.json | ||
│ ├── person_keypoints_train2017.json | ||
│ └── person_keypoints_val2017.json | ||
├── train2017 | ||
│ ├── 000000454854.jpg | ||
│ ├── 000000137045.jpg | ||
│ ├── 000000129582.jpg | ||
│ └── ... | ||
└── val2017 | ||
├── 000000000139.jpg | ||
├── 000000000285.jpg | ||
├── 000000000632.jpg | ||
└── ... | ||
``` | ||
The parent of the `annotations`, `train2017`, and `val2017` directories (in this example `coco`) | ||
is the directory that should be used when setting the `DATASET_DIR` environment | ||
variable for <model name> (for example: `export DATASET_DIR=/home/<user>/coco`). |
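Before pointing `DATASET_DIR` at a folder, it may help to confirm that the folder matches the tree shown above. This is a minimal sketch under that assumption: `check_coco_layout` is a hypothetical helper, not something provided by the package, and it checks only the three top-level directories plus the `instances_val2017.json` annotation file that appears in the tree.

```shell
# Hypothetical helper (not shipped in the package): returns non-zero when the
# given directory lacks the COCO 2017 layout described above.
check_coco_layout() {
  coco_dir="$1"
  coco_missing=0
  for coco_sub in annotations train2017 val2017; do
    if [ ! -d "${coco_dir}/${coco_sub}" ]; then
      echo "missing directory: ${coco_dir}/${coco_sub}" >&2
      coco_missing=1
    fi
  done
  # Only one annotation file from the tree is checked in this sketch.
  if [ ! -f "${coco_dir}/annotations/instances_val2017.json" ]; then
    echo "missing file: ${coco_dir}/annotations/instances_val2017.json" >&2
    coco_missing=1
  fi
  return "${coco_missing}"
}
```

For example, `check_coco_layout /home/<user>/coco && export DATASET_DIR=/home/<user>/coco` sets the variable only when the layout looks right.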
5 changes: 5 additions & 0 deletions
...kstart/object_detection/pytorch/ssd-resnet34/inference/cpu/.docs/description.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
<!-- 10. Description --> | ||
## Description | ||
|
||
This document has instructions for running <model name> <mode> using | ||
Intel-optimized PyTorch. |
33 changes: 33 additions & 0 deletions
quickstart/object_detection/pytorch/ssd-resnet34/inference/cpu/.docs/docker_spr.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
## Run the model | ||
|
||
Download the pretrained model weights using the script from the MLPerf repo | ||
and set the `PRETRAINED_MODEL` environment variable to point to the downloaded file: | ||
``` | ||
wget https://raw.githubusercontent.com/mlcommons/inference/v0.7/others/cloud/single_stage_detector/download_model.sh | ||
sh download_model.sh | ||
export PRETRAINED_MODEL=$(pwd)/pretrained/resnet34-ssd1200.pth | ||
``` | ||
|
||
After downloading the pretrained model and following the instructions to | ||
[build the container](#build-the-container) and [prepare the dataset](#datasets), | ||
use the `run.sh` script from the container package to run <model name> <mode> | ||
using docker. Set environment variables to specify the dataset directory, | ||
precision to run, and an output directory for logs. By default, the `run.sh` | ||
script will run the `inference_realtime.sh` quickstart script. To run a different | ||
script, specify the name of the script using the `SCRIPT` environment variable. | ||
``` | ||
# Navigate to the container package directory | ||
cd <package dir> | ||
# Set the required environment vars | ||
export DATASET_DIR=<path to the coco dataset> | ||
export PRETRAINED_MODEL=<path to the resnet34-ssd1200.pth file> | ||
export PRECISION=<specify the precision to run> | ||
export OUTPUT_DIR=<directory where log files will be written> | ||
# Run the container with inference_realtime.sh quickstart script | ||
./run.sh | ||
# To run a different quickstart script, use the SCRIPT env var | ||
SCRIPT=accuracy.sh ./run.sh | ||
``` |
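Because `run.sh` depends on several environment variables, a pre-flight check can catch a missing one before the container starts. The sketch below is an assumption about how such a check could look; `check_required_vars` is a hypothetical helper, and `run.sh` itself may validate its inputs differently. The variable names are the four listed in the instructions above.

```shell
# Hypothetical pre-flight check (not shipped in the package): reports every
# required variable that is unset or empty and returns non-zero if any is.
check_required_vars() {
  vars_status=0
  for var_name in DATASET_DIR PRETRAINED_MODEL PRECISION OUTPUT_DIR; do
    # Indirect lookup through eval keeps the sketch portable to plain POSIX sh.
    eval "var_value=\${${var_name}:-}"
    if [ -z "${var_value}" ]; then
      echo "${var_name} is not set" >&2
      vars_status=1
    fi
  done
  return "${vars_status}"
}
```

A typical invocation would be `check_required_vars && ./run.sh`, run from the container package directory.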
4 changes: 4 additions & 0 deletions
quickstart/object_detection/pytorch/ssd-resnet34/inference/cpu/.docs/license.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
<!--- 80. License --> | ||
## License | ||
|
||
Licenses can be found in the model package, in the `licenses` directory. |
8 changes: 8 additions & 0 deletions
quickstart/object_detection/pytorch/ssd-resnet34/inference/cpu/.docs/quickstart.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
<!--- 40. Quick Start Scripts --> | ||
## Quick Start Scripts | ||
|
||
| Script name | Description | | ||
|-------------|-------------| | ||
| `inference_realtime.sh` | Runs multi-instance realtime inference using 4 cores per instance for the specified precision (fp32, int8 or bf16). | | ||
| `inference_throughput.sh` | Runs multi-instance batch inference using 1 instance per socket for the specified precision (fp32, int8 or bf16). | | ||
| `accuracy.sh` | Measures the inference accuracy for the specified precision (fp32, int8 or bf16). The `DATASET_DIR` environment variable is required. | | ||
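The script names and precisions in the table above can be turned into simple validators for the `SCRIPT` and `PRECISION` values. This is only a sketch based on the table: `is_known_script` and `is_supported_precision` are hypothetical helper names, and the quickstart scripts may accept or reject values differently.

```shell
# Hypothetical validators (not shipped in the package), derived only from the
# script names and precisions listed in the table above.
is_supported_precision() {
  case "$1" in
    fp32|int8|bf16) return 0 ;;
    *) return 1 ;;
  esac
}

is_known_script() {
  case "$1" in
    inference_realtime.sh|inference_throughput.sh|accuracy.sh) return 0 ;;
    *) return 1 ;;
  esac
}
```

For instance, `is_supported_precision "${PRECISION}" || echo "unsupported precision" >&2` flags a mistyped value before a long run is launched.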
2 changes: 2 additions & 0 deletions
quickstart/object_detection/pytorch/ssd-resnet34/inference/cpu/.docs/title.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
<!--- 0. Title --> | ||
# PyTorch <model name> <mode> |
16 changes: 16 additions & 0 deletions
...rt/object_detection/pytorch/ssd-resnet34/inference/cpu/.docs/wrapper_package.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
## Model Package | ||
|
||
The model package includes the Dockerfile and scripts needed to build and | ||
run <model name> <mode> in a container. | ||
``` | ||
<package dir> | ||
├── README.md | ||
├── build.sh | ||
├── licenses | ||
│ ├── LICENSE | ||
│ └── third_party | ||
├── model_packages | ||
│ └── <package name> | ||
├── <package dir>.Dockerfile | ||
└── run.sh | ||
``` |