We prepare train data with variety of feature engineering to catogorical features and numerical features, which includes GlobalCountEncoding, NewValueOneHotEncoding and Indexing.
Feature engineer pipeline overview. Role1 and Role2 refer to users and ads in this dataset. The same holds for rest 'Role' terms in this solution.
After done feature engineeirng, processed train data will be used in two manners.
- Training with LGBM to do inference on test data
- Combining with GNN embeddings and training with LGBM/XGBoost to do inference on test data. (How to generate GNN embeddings is decribed in 0_train/2_supGNN, 0_train/3_BiGNN and 0_train/4_simGNN)
# 0. We have assumption that train data has been located under data/sharechat_recsys2023_data/train
step1:
open 0_data_prepare.ipynb and run through whole script
step2:
open 1_train_LGBM.ipynb and run through whole script
please switch to 1_inference/1_LearningFE