This project implements an algorithm for reducing the number of artifacts among anomaly candidates during active learning. A classifier is trained on a labeled dataset from the akb database. For unlabeled data, the decision function of the trained classifier is evaluated and appended to the original features as a new feature. Active learning on the unlabeled data can then use the feature set with this additional column.
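The idea can be illustrated with a minimal sketch. It is not the repository's code: the data are synthetic, the label convention (1 for real objects, 0 for artifacts) is an assumption, and the probability of class 1 from scikit-learn's RandomForestClassifier stands in for the decision function, since that class provides predict_proba rather than decision_function.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for the labeled training features and the unlabeled features.
X_labeled = rng.normal(size=(200, 5))
# Assumed label convention: 1 = real object, 0 = artifact.
y_labeled = (X_labeled[:, 0] + X_labeled[:, 1] > 0).astype(int)
X_unlabeled = rng.normal(size=(50, 5))

# Train the classifier on the labeled set.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_labeled, y_labeled)

# Score the unlabeled objects; column 1 of predict_proba is the probability
# of class 1 (assumption: this plays the role of the decision function).
score = clf.predict_proba(X_unlabeled)[:, 1]

# Append the score to the original features as one extra column.
X_augmented = np.column_stack([X_unlabeled, score])
print(X_augmented.shape)
```

The array X_augmented has one more column than X_unlabeled and is what an active learning step would receive in place of the original features.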
train_rb_model.py
-- The code for training the model. The function make_argument_parser() describes the arguments that can be passed when running the script. First, the training set is defined and the light curve of each object is downloaded. Next, a feature extractor is chosen (from those available at http://features.lc.snad.space/api/latest) and used to extract and save features for all objects in the training set. Finally, a random forest is trained on the extracted features and labels. After training, quality metrics computed by cross-validation are printed. The trained model is saved in the ONNX format.
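The last steps (training, cross-validated metrics, ONNX export) might look roughly like the sketch below. It makes several assumptions: the features are synthetic, the choice of metrics (accuracy, ROC AUC, F1) and of 5 folds is illustrative, and the export relies on the skl2onnx package, which the script may or may not use. The helper export_to_onnx and the input name "features" are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(42)

# Synthetic stand-ins for the extracted features and the labels.
X = rng.normal(size=(300, 8)).astype(np.float32)
y = (X[:, 0] - 0.5 * X[:, 3] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=42)

# Cross-validated quality metrics (the metric choice is an assumption).
metrics = ("accuracy", "roc_auc", "f1")
cv_results = cross_validate(clf, X, y, cv=5, scoring=list(metrics))
for metric in metrics:
    values = cv_results[f"test_{metric}"]
    print(f"{metric}: {values.mean():.3f} +/- {values.std():.3f}")

# cross_validate fits clones of clf, so fit the final model on all data.
clf.fit(X, y)


def export_to_onnx(model, n_features, path):
    """Hypothetical helper: convert a fitted scikit-learn model to ONNX."""
    # Imported here so the rest of the sketch runs without skl2onnx installed.
    from skl2onnx import convert_sklearn
    from skl2onnx.common.data_types import FloatTensorType

    initial_types = [("features", FloatTensorType([None, n_features]))]
    onnx_model = convert_sklearn(model, initial_types=initial_types)
    with open(path, "wb") as fh:
        fh.write(onnx_model.SerializeToString())
```

With skl2onnx installed, a call such as export_to_onnx(clf, X.shape[1], "model.onnx") would write the model to disk; the file name here is only an example.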
run_rb_model.py
-- The code for applying the trained model. As in the script described above, the function make_argument_parser() describes the arguments. First, the trained model and the unlabeled data are loaded. Then the data is passed to the model, and the model's output is saved, either as an additional column appended to the original feature set or as a standalone column of decision function values.
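The two output modes can be sketched as follows. This is an illustration only: the function save_model_output, the .npy file format, and the file names are hypothetical, and the scores are random numbers standing in for the values the model would produce.

```python
import tempfile
from pathlib import Path

import numpy as np


def save_model_output(path, scores, features=None):
    """Save the scores alone, or appended to `features` as the last column."""
    scores = np.asarray(scores, dtype=np.float32)
    if features is None:
        # Mode 1: a single column of score values.
        output = scores.reshape(-1, 1)
    else:
        # Mode 2: the original features plus one extra score column.
        features = np.asarray(features, dtype=np.float32)
        if features.shape[0] != scores.shape[0]:
            raise ValueError("features and scores must have the same number of rows")
        output = np.column_stack([features, scores])
    np.save(path, output)
    return output


# Synthetic stand-ins for the extracted features and the model scores.
rng = np.random.default_rng(0)
features = rng.normal(size=(10, 4))
scores = rng.uniform(size=10)

with tempfile.TemporaryDirectory() as tmp:
    scores_only = save_model_output(Path(tmp) / "scores.npy", scores)
    augmented = save_model_output(Path(tmp) / "features_with_score.npy", scores, features)
    reloaded = np.load(Path(tmp) / "features_with_score.npy")

print(scores_only.shape, augmented.shape)
```

Keeping both modes behind one function means the row order of the scores always matches the row order of the features they were computed from.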