
PROJECT NOT UNDER ACTIVE MANAGEMENT

This project will no longer be maintained by Intel.

Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.

Intel no longer accepts patches to this project.

If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.

Contact: [email protected]

Intel solution for RecSys challenge 2023

This repository provides the official implementation of Intel's solution for the RecSys Challenge 2023, from the paper: Graph Enhanced Feature Engineering for Privacy Preserving Recommendation Systems.

Our solution for the RecSys Challenge 2023 leverages a novel feature classification method to categorize anonymous features into groups, then applies enhanced feature engineering and graph neural networks to reveal underlying information and improve prediction accuracy. The solution can also be generalized to other privacy preserving recommendation systems. Our team name is LearningFE; our final submission scored 5.892977 and ranked 2nd on the leaderboard.

Introduction

RecSys 2023 Challenge

The RecSys 2023 challenge focuses on online advertising, improving deep funnel optimization, and user privacy. The dataset consists of impressions of users and ads from the ShareChat and Moj apps, where each impression (a set of 80 features) is an advertisement (ad) that was shown to a user, together with whether it resulted in a click or an install of the application corresponding to the advertisement. The task is to predict the probability that an impression results in an app installation.

Intel's solution

The core idea of our solution is to augment and enrich the original dataset using (i) privacy preserving feature engineering, (ii) bipartite graph neural networks, and (iii) similarity-based graph neural networks. The augmented datasets from these approaches are used with gradient-boosted decision trees (XGBoost and LightGBM) to predict the probability of installation. Finally, we ensemble the three solutions to obtain our final result.

Architecture

Our solution is an ensemble of three models that use different training methods and feature sets, as shown in the figure below. We generate new features with several methods: (i) privacy preserving feature engineering, (ii) supervised bipartite GNN (BiGNN) embeddings, (iii) self-supervised BiGNN embeddings, and (iv) similarity graph GNN (simGNN) embeddings. The first model is a LightGBM model trained with the enhanced feature sets. The second model is an XGBoost model trained with the enhanced feature sets and supervised BiGNN embeddings. The third model is a LightGBM model trained with the enhanced feature sets, simGNN embeddings, and self-supervised BiGNN embeddings.


Model architecture overview.
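As a concrete illustration of the ensemble step, the sketch below blends the predicted install probabilities of the three models with a simple weighted average. The file names, column name, and equal weights are assumptions for illustration only; the actual ensembling is implemented in 5_ensemble.ipynb.

```python
# Minimal sketch (not the repository's actual ensemble code): blend the
# predicted install probabilities of the three models.
# File names, column name, and weights are hypothetical placeholders.
import pandas as pd

# Each CSV is assumed to hold one probability column per test impression.
pred_lgbm   = pd.read_csv("preds_learningfe_lgbm.csv")["is_installed"]
pred_xgb    = pd.read_csv("preds_supgnn_xgb.csv")["is_installed"]
pred_simgnn = pd.read_csv("preds_simgnn_bignn_lgbm.csv")["is_installed"]

# Simple weighted average; in practice the weights would be tuned on
# validation data rather than fixed to 1/3 each.
weights = [1 / 3, 1 / 3, 1 / 3]
ensemble = (
    weights[0] * pred_lgbm
    + weights[1] * pred_xgb
    + weights[2] * pred_simgnn
)

pd.DataFrame({"is_installed": ensemble}).to_csv("final_submission.csv", index=False)
```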

Key Components

Feature engineering pipeline

We propose a novel feature engineering pipeline for privacy preserving datasets, which enriches feature expressiveness based on feature distribution characteristics. The method comprises three major steps: a) analysis and classification, b) feature engineering, and c) feature selection.


Feature engineering pipeline overview.
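To make the three steps concrete, the following Python sketch groups anonymous columns by a simple distribution statistic, derives count-encoded features, and selects the top features by GBDT importance. The cardinality threshold, the count encoding, and the importance-based selection are illustrative assumptions, not the exact rules used in our pipeline.

```python
# Illustrative sketch of the three pipeline steps on an anonymous tabular
# dataset; thresholds and derived features are assumptions, not the rules
# used in the actual solution.
import pandas as pd
from lightgbm import LGBMClassifier

def classify_features(df: pd.DataFrame) -> dict:
    """Step a) group anonymous columns by simple distribution statistics."""
    groups = {"categorical": [], "numerical": []}
    for col in df.columns:
        # Low-cardinality columns are treated as categorical (assumed rule).
        if df[col].nunique() < 100:
            groups["categorical"].append(col)
        else:
            groups["numerical"].append(col)
    return groups

def engineer_features(df: pd.DataFrame, groups: dict) -> pd.DataFrame:
    """Step b) derive new features per group (count encoding as an example)."""
    out = df.copy()
    for col in groups["categorical"]:
        out[f"{col}_count"] = df[col].map(df[col].value_counts())
    return out

def select_features(df: pd.DataFrame, target: pd.Series, top_k: int = 64) -> list:
    """Step c) keep the top-k features ranked by GBDT importance."""
    model = LGBMClassifier(n_estimators=200).fit(df, target)
    ranked = sorted(zip(df.columns, model.feature_importances_),
                    key=lambda x: x[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]
```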

GNN embedding features

We employ GNNs to generate embedding features that serve as input to the GBDT models for installation prediction. Depending on the graph representation, we produce three types of GNN embedding features: a) self-supervised GNN, b) supervised GNN, and c) similarity GNN.
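For concreteness, here is a minimal sketch of how precomputed GNN embeddings could be concatenated with the engineered tabular features and fed to an XGBoost classifier. The file names, column names, and hyperparameters are hypothetical placeholders rather than the repository's actual configuration.

```python
# Minimal sketch, assuming precomputed GNN embeddings are available as a
# NumPy array aligned row-by-row with the tabular features; all names are
# illustrative, not the repository's actual interfaces.
import numpy as np
import pandas as pd
import xgboost as xgb

features = pd.read_parquet("enhanced_features.parquet")      # engineered features
embeddings = np.load("bignn_embeddings.npy")                  # one vector per impression
labels = pd.read_parquet("labels.parquet")["is_installed"]

# Concatenate tabular features with the GNN embedding columns.
emb_cols = pd.DataFrame(
    embeddings,
    columns=[f"gnn_emb_{i}" for i in range(embeddings.shape[1])],
)
train_matrix = pd.concat([features.reset_index(drop=True), emb_cols], axis=1)

# Gradient-boosted trees on the augmented feature matrix (hyperparameters assumed).
model = xgb.XGBClassifier(
    n_estimators=500, learning_rate=0.05, max_depth=8,
    objective="binary:logistic", eval_metric="logloss",
)
model.fit(train_matrix, labels)
install_prob = model.predict_proba(train_matrix)[:, 1]  # probability of install
```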

How to run

  • Step 1: Prepare your data

    • Put the ShareChat raw data under data/sharechat_recsys2023_data
  • Step 2: Follow 0_train to complete model training and saving

    • Use 1_LearningFE to create the processed data, encoder, and LightGBM model; see here for details
    • Use 2_supGNN to create the GNN model and XGBoost model; see here for details
    • Use 3_BiGNN to create the GNN model; see here for details
    • Use 4_simGNN to create the GNN model and LightGBM model; see here for details
  • Step 3: Follow 1_inference to run inference on the test data

    • Use 1_LearningFE to run inference on the test data; see here for details
    • Use 2_supGNN to run inference on the test data; see here for details
    • Use 3_BiGNN to create inference embeddings; see here for details
    • Use 4_simGNN+BiGNN to run inference on the test data; see here for details
    • Use 5_ensemble.ipynb to get the final result; see here for details

Citation

If you use this codebase or otherwise find our work valuable, please cite:

@inproceedings{intelsolutionforrecsys2023,
  title={Graph Enhanced Feature Engineering for Privacy Preserving Recommendation Systems},
  author={Chendi Xue and Xinyao Wang and Yu Zhou and Poovaiah Palangappa and Ravi Motwani and Rita Brugarolas Brufau and Aasavari Dhananjay Kakne and Ke Ding and Jian Zhang},
  booktitle={RecSys 2023},
  year={2023}
}
