This project was part of Google Summer of Code 2021 under the organization CERN-HSF.
Link to Project Page
Student's Name | Sanjiban Sengupta |
Mentors | Lorenzo Moneta, Sitong An, Anirudh Dagar |
Organization | Root-Project (CERN-HSF) |
Organization Code Repository | https://github.com/root-project/root |
Final Report | https://github.com/sanjibansg/GSoC21-RootStorage/wiki |
Code Implementations | https://github.com/root-project/root/pulls?q=author:sanjibansg |
Project Proposal | https://docs.google.com/document/d/1MVKpGP9lr0tUhrxB59nrNlZfAtnO_Dgkx8ddw1k26Yk/edit?usp=sharing |
Documentation Blog | https://blog.sanjiban.ml/series/gsoc |
The Toolkit for Multivariate Data Analysis (TMVA) is a sub-module of ROOT that provides a machine learning environment for training, testing, and evaluating multivariate methods, particularly those used in High Energy Physics. Recently, the TMVA team introduced SOFIE (System for Optimized Fast Inference code Emit), which defines its own intermediate representation of deep learning models following the ONNX standard. To make these models easier to use, store, and exchange, this project aimed to develop functionality for storing deep learning models in the `.root` format, which is popular in the High Energy Physics community.
- Functionality for serializing an RModel so that a trained deep learning model can be stored in the `.root` format.
- Functionality for parsing a Keras `.h5` file into an RModel object for generating inference code.
- Functionality for parsing a PyTorch `.pt` file into an RModel object for generating inference code.
- Tests, tutorials, and documentation for the various parsers of TMVA SOFIE's RModel object.
- Functionality for an intermediate representation of BDT models and for parsing TMVA-trained BDT models.
- Languages: C/C++, Python
- Deep Learning Libraries: Keras, PyTorch
- API: C-Python API
- Build: CMake
- Tests: GTest Framework
- Documentation: Doxygen
Installation steps for building ROOT from source can be found here:
https://root.cern/install/build_from_source/
The provided `install.sh` can also be used; it builds the repository directly and merges the implemented code files:
git clone https://github.com/sanjibansg/GSoC21-RootStorage.git
cd GSoC21-RootStorage
./install.sh
- Serialization of RModel
// Writing the ROOT file
TFile file("model.root","CREATE");
using namespace TMVA::Experimental;
SOFIE::RModel model = SOFIE::PyKeras::Parse("trained_model_dense.h5");
model.Write("model");
file.Close();

// Reading the ROOT file
TFile file("model.root","READ");
using namespace TMVA::Experimental;
SOFIE::RModel *model;
file.GetObject("model",model);
file.Close();
- Keras Converter for RModel
// Parser returns an RModel object
using namespace TMVA::Experimental::SOFIE;
RModel model = PyKeras::Parse("trained_model_dense.h5");

// Converter writes a ROOT file directly
PyKeras::ConvertToRoot("trained_model_dense.h5");
- PyTorch Converter for RModel
// Parser returns an RModel object
using namespace TMVA::Experimental::SOFIE;

// Building the vector of input shapes (one shape vector per model input)
std::vector<size_t> s1{120,1};
std::vector<std::vector<size_t>> inputShape{s1};
RModel model = PyTorch::Parse("trained_model_dense.pt",inputShape);

// Converter writes a ROOT file directly, using the same input shapes
PyTorch::ConvertToRoot("trained_model_dense.pt",inputShape);
- Root Storage of BDT
// Parser loads the BDT model from .xml into a RootStorage::BDT object
TMVA::Experimental::RootStorage::BDT model;
bool usePurity = true;
model.Parse("TMVA_CNN_Classification_BDT.weights.xml",usePurity);
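
To give a feel for what an intermediate representation of a BDT has to capture, the following is a minimal, self-contained sketch of evaluating a small forest of binary decision trees. It does not use `RootStorage::BDT`: the names `Node`, `Tree`, `evalTree`, and `evalForest` are hypothetical, and the data layout is an illustration only. The reading of `usePurity` used here (a leaf returns its stored purity instead of a hard +1/-1 label) is an assumption and is not confirmed by the snippet above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat node layout; NOT the actual RootStorage::BDT members.
struct Node {
    int feature;    // index of the input variable tested here; -1 marks a leaf
    double cut;     // go left if x[feature] < cut, otherwise go right
    int left;       // index of the left child in Tree::nodes
    int right;      // index of the right child in Tree::nodes
    double purity;  // leaf value in [0, 1] (assumed meaning: signal fraction)
};

struct Tree {
    std::vector<Node> nodes;  // nodes[0] is the root
    double boostWeight;       // weight of this tree within the forest
};

// Walk one tree from the root to a leaf and return the leaf response.
// Assumption: with usePurity the raw purity is returned, otherwise a +1/-1 label.
double evalTree(const Tree &tree, const std::vector<double> &x, bool usePurity) {
    assert(!tree.nodes.empty());
    std::size_t i = 0;
    while (tree.nodes[i].feature >= 0) {
        const Node &n = tree.nodes[i];
        const double value = x[static_cast<std::size_t>(n.feature)];
        i = static_cast<std::size_t>(value < n.cut ? n.left : n.right);
    }
    const double p = tree.nodes[i].purity;
    if (usePurity) return p;
    return p > 0.5 ? 1.0 : -1.0;
}

// Boost-weighted average of the individual tree responses.
double evalForest(const std::vector<Tree> &forest, const std::vector<double> &x,
                  bool usePurity) {
    double sum = 0.0;
    double norm = 0.0;
    for (const Tree &t : forest) {
        sum += t.boostWeight * evalTree(t, x, usePurity);
        norm += t.boostWeight;
    }
    return norm > 0.0 ? sum / norm : 0.0;
}
```

Storing the nodes in a flat vector with integer child indices, rather than as linked pointers, keeps the structure easy to serialize and to translate into generated inference code, which is the kind of mapping the project lists as planned work.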
- Development of Root Storage of BDT
  - Developing the mapping interface for inference code generation from the class RootStorage::BDT
  - Researching the conversion of scikit-learn-based BDT models to the class RootStorage::BDT for subsequent inference
  - Adding tests and tutorials for BDT
- Adding support for converting convolution layers from Keras and PyTorch models.
To report bugs or request new features, open an issue here.