
Merlin: HugeCTR


HugeCTR is a GPU-accelerated recommender framework designed to distribute training across multiple GPUs and nodes and estimate Click-Through Rates (CTRs). HugeCTR supports model-parallel embedding tables and data-parallel neural networks and their variants such as Wide and Deep Learning (WDL), Deep Cross Network (DCN), DeepFM, and Deep Learning Recommendation Model (DLRM). HugeCTR is a component of NVIDIA Merlin Open Beta, which is used to build large-scale deep learning recommender systems. For additional information, see HugeCTR User Guide.
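
For intuition, the hybrid parallelism described above can be sketched in a few lines of plain Python and NumPy. This is only a conceptual illustration under simplified assumptions (hash-partitioned embedding rows, a single dense layer), not HugeCTR's actual implementation:

    # Conceptual sketch of hybrid parallelism -- NOT HugeCTR's implementation.
    # The embedding table is sharded across devices (model parallel), while the
    # dense layer is replicated on every device (data parallel).
    import numpy as np

    num_gpus, vocab_size, emb_dim = 4, 1000, 16
    rng = np.random.default_rng(0)

    # Model parallel: each device owns the embedding rows with id % num_gpus == device id.
    emb_shards = [rng.normal(size=(vocab_size // num_gpus, emb_dim)) for _ in range(num_gpus)]

    def lookup(category_id):
        # Fetch the embedding row from the shard that owns this id.
        dev = category_id % num_gpus
        return emb_shards[dev][category_id // num_gpus]

    # Data parallel: every device holds an identical copy of the dense weights and
    # processes its own slice of the batch; in real training, gradients are all-reduced.
    dense_w = rng.normal(size=(emb_dim, 1))
    batch = rng.integers(0, vocab_size, size=8)
    per_device_batches = np.array_split(batch, num_gpus)

    logits = [np.stack([lookup(c) for c in b]) @ dense_w for b in per_device_batches]
    print([l.shape for l in logits])  # one (local_batch, 1) output per device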

Design Goals:

  • Fast: HugeCTR is a speed-of-light CTR model framework that can outperform popular general-purpose deep learning frameworks such as TensorFlow (TF).
  • Efficient: HugeCTR provides the essentials so that you can efficiently train your CTR model.
  • Easy: Whether you are a data scientist or a machine learning practitioner, we've made it easy for anyone to use HugeCTR.


Core Features

HugeCTR supports a variety of core features, including model-parallel embedding tables, data-parallel dense networks, and multi-GPU and multi-node training.

To learn about our latest enhancements, see our release notes.

Getting Started

If you'd like to quickly train a model using the Python interface, follow these steps:

  1. Start an NGC container with your local host directory (/your/host/dir) mounted by running the following command:

    docker run --runtime=nvidia --rm -v /your/host/dir:/your/container/dir -w /your/container/dir -it -u $(id -u):$(id -g) nvcr.io/nvidia/merlin/merlin-training:0.5
    

    NOTE: Everything in /your/host/dir on the host is visible inside the container under /your/container/dir, which is also your starting directory (set by the -w flag).

  2. Activate the merlin conda environment by running the following command:

    source activate merlin
    
  3. Inside the container, copy the DCN configuration file to your mounted directory (/your/container/dir).

    This configuration file specifies the DCN model architecture and its optimizer. When you train through the Python interface, the solver clause within the configuration file is ignored; the solver settings are supplied in Python instead (see step 5).
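
    If you want to peek inside the file, a quick optional check with plain Python (this assumes dcn.json is ordinary JSON and simply lists its top-level clauses) looks like this:

    # inspect_config.py -- optional: list the top-level clauses of dcn.json
    import json

    with open("dcn.json") as f:
        config = json.load(f)

    # Typically includes clauses such as "solver" (ignored by the Python interface)
    # and "optimizer", plus the DCN model's layer definitions.
    for clause in config:
        print(clause)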

  4. Generate a synthetic dataset based on the configuration file by running the following command:

    ./data_generator --config-file dcn.json --voc-size-array 39884,39043,17289,7420,20263,3,7120,1543,39884,39043,17289,7420,20263,3,7120,1543,63,63,39884,39043,17289,7420,20263,3,7120,1543 --distribution powerlaw --alpha -1.2
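
    For intuition about --distribution powerlaw: the generated category IDs are heavily skewed, so a small number of IDs account for most of the samples, which mimics real click logs. A rough, hypothetical illustration of such a distribution (not the exact parameterization that data_generator uses for --alpha) is shown below:

    # Rough illustration of a power-law (Zipf-like) ID distribution -- hypothetical,
    # not the exact sampling scheme used by HugeCTR's data_generator.
    import numpy as np

    voc_size, exponent = 39884, 1.2   # 39884 is the first entry of --voc-size-array
    rng = np.random.default_rng(0)

    ranks = np.arange(1, voc_size + 1)
    probs = ranks ** -exponent        # p(k) ~ k^(-1.2)
    probs /= probs.sum()

    ids = rng.choice(voc_size, size=100_000, p=probs)
    top10_share = np.isin(ids, np.arange(10)).mean()
    print("share of samples covered by the 10 most frequent ids:", top10_share)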
    
  5. Write a simple Python script using the hugectr module as shown here:

    # train.py
    import sys
    import hugectr
    from mpi4py import MPI  # importing mpi4py initializes MPI, which HugeCTR uses for multi-GPU/multi-node runs
    
    def train(json_config_file):
      # Solver settings (batch sizes, GPU layout) are supplied here in Python;
      # the solver clause in the JSON file is ignored.
      solver_config = hugectr.solver_parser_helper(batchsize = 16384,
                                                   batchsize_eval = 16384,
                                                   vvgpu = [[0,1,2,3,4,5,6,7]],
                                                   repeat_dataset = True)
      # Create a session from the solver settings and the model/optimizer JSON,
      # then start the asynchronous data reader.
      sess = hugectr.Session(solver_config, json_config_file)
      sess.start_data_reading()
      # Run 10,000 training iterations and report the loss every 100 iterations.
      for i in range(10000):
        sess.train()
        if (i % 100 == 0):
          loss = sess.get_current_loss()
          print("[HUGECTR][INFO] iter: {}; loss: {}".format(i, loss))
    
    if __name__ == "__main__":
      json_config_file = sys.argv[1]
      train(json_config_file)
    
    

    NOTE: Update the vvgpu (the active GPUs), batchsize, and batchsize_eval parameters according to your GPU system.
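
    For example, on a machine with a single GPU, the solver configuration in train.py might shrink to something like the following (the batch sizes are only illustrative; choose values that fit your GPU memory):

    # Illustrative single-GPU variant of train.py's solver configuration;
    # replace the corresponding lines in train.py with something like this.
    solver_config = hugectr.solver_parser_helper(batchsize = 2048,   # fit to your GPU memory
                                                 batchsize_eval = 2048,
                                                 vvgpu = [[0]],      # one node, GPU 0 only
                                                 repeat_dataset = True)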

  6. Train the model by running the following command:

    python train.py dcn.json
    

For additional information, see the HugeCTR User Guide.

Support and Feedback

If you encounter any issues or have questions, please file a GitHub issue so that we can provide you with the necessary resolutions and answers. To help advance the Merlin/HugeCTR roadmap, we encourage you to share the details of your recommender system pipeline through this survey.
