
PROJECT NOT UNDER ACTIVE MANAGEMENT

This project will no longer be maintained by Intel.

Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.

Intel no longer accepts patches to this project.

If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.

Contact: [email protected]

Intel solution for RecSys challenge 2023

This repository provides the official implementation of Intel's solution for the RecSys Challenge 2023, from the paper: Graph Enhanced Feature Engineering for Privacy Preserving Recommendation Systems.

Our solution for the RecSys Challenge 2023 leverages a novel feature classification method to categorize anonymous features into groups, then applies enhanced feature engineering and graph neural networks to reveal underlying information and improve prediction accuracy. The solution can also be generalized to other privacy preserving recommendation systems. Our team name is LearningFE; our final submission scored 5.892977 and ranked 2nd on the leaderboard.

Introduction

RecSys 2023 Challenge

The RecSys 2023 challenge focuses on online advertising, improving deep funnel optimization, and user privacy. The dataset consists of impressions of users and ads from the ShareChat and Moj apps, where each impression (a set of 80 features) is an advertisement (ad) that was shown to a user, together with whether it resulted in a click or an install of the application corresponding to the advertisement. The task is to predict the probability that an impression results in an app installation.

Intel's solution

The core idea of our solution is to augment and enrich the original dataset using (i) privacy preserving feature engineering, (ii) bipartite graph neural networks, and (iii) similarity-based graph neural networks. The augmented datasets from these approaches are used with gradient-boosted decision trees (XGBoost and LightGBM) to predict the probability of installation. Finally, we ensemble the three solutions to obtain our final result.

Architecture

Our solution is an ensemble of three models that use different training methods and feature sets, as shown in the figure below. We generate new features with several methods: (i) privacy preserving feature engineering, (ii) supervised bipartite GNN (BiGNN) embeddings, (iii) self-supervised BiGNN embeddings, and (iv) similarity graph GNN (simGNN) embeddings. The first model is a LightGBM model trained with the enhanced feature sets. The second model is an XGBoost model trained with the enhanced feature sets and supervised BiGNN embeddings. The third model is a LightGBM model trained with the enhanced feature sets, simGNN embeddings, and self-supervised BiGNN embeddings.


Model architecture overview.
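As a concrete illustration of the ensemble step, the sketch below blends the predicted install probabilities of the three models with a simple weighted average. The file names, column name, and equal weights are assumptions for illustration only; the actual ensembling is implemented in 5_ensemble.ipynb.

```python
# Minimal sketch (not the repository's actual ensemble code): blend the
# predicted install probabilities of the three models.
# File names, column name, and weights are hypothetical placeholders.
import pandas as pd

# Each CSV is assumed to hold one probability column per test impression.
pred_lgbm   = pd.read_csv("preds_learningfe_lgbm.csv")["is_installed"]
pred_xgb    = pd.read_csv("preds_supgnn_xgb.csv")["is_installed"]
pred_simgnn = pd.read_csv("preds_simgnn_bignn_lgbm.csv")["is_installed"]

# Simple weighted average; in practice the weights would be tuned on
# validation data rather than fixed to 1/3 each.
weights = [1 / 3, 1 / 3, 1 / 3]
ensemble = (
    weights[0] * pred_lgbm
    + weights[1] * pred_xgb
    + weights[2] * pred_simgnn
)

pd.DataFrame({"is_installed": ensemble}).to_csv("final_submission.csv", index=False)
```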

Key Components

Feature engineering pipeline

We propose a novel feature engineering pipeline for privacy preserving datasets, which enriches feature expressiveness based on feature distribution characteristics. The method comprises three major steps: a) analysis and classification, b) feature engineering, and c) feature selection.


Feature engineering pipeline overview.
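To make the three steps concrete, the following Python sketch groups anonymous columns by a simple distribution statistic, derives count-encoded features, and selects the top features by GBDT importance. The cardinality threshold, the count encoding, and the importance-based selection are illustrative assumptions, not the exact rules used in our pipeline.

```python
# Illustrative sketch of the three pipeline steps on an anonymous tabular
# dataset; thresholds and derived features are assumptions, not the rules
# used in the actual solution.
import pandas as pd
from lightgbm import LGBMClassifier

def classify_features(df: pd.DataFrame) -> dict:
    """Step a) group anonymous columns by simple distribution statistics."""
    groups = {"categorical": [], "numerical": []}
    for col in df.columns:
        # Low-cardinality columns are treated as categorical (assumed rule).
        if df[col].nunique() < 100:
            groups["categorical"].append(col)
        else:
            groups["numerical"].append(col)
    return groups

def engineer_features(df: pd.DataFrame, groups: dict) -> pd.DataFrame:
    """Step b) derive new features per group (count encoding as an example)."""
    out = df.copy()
    for col in groups["categorical"]:
        out[f"{col}_count"] = df[col].map(df[col].value_counts())
    return out

def select_features(df: pd.DataFrame, target: pd.Series, top_k: int = 64) -> list:
    """Step c) keep the top-k features ranked by GBDT importance."""
    model = LGBMClassifier(n_estimators=200).fit(df, target)
    ranked = sorted(zip(df.columns, model.feature_importances_),
                    key=lambda x: x[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]
```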

GNN embedding features

We employ GNNs to generate embedding features that serve as input to the GBDT models for installation prediction. Depending on the graph representation, we produce three types of GNN embedding features: a) self-supervised GNN, b) supervised GNN, and c) similarity GNN.
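For concreteness, here is a minimal sketch of how precomputed GNN embeddings could be concatenated with the engineered tabular features and fed to an XGBoost classifier. The file names, column names, and hyperparameters are hypothetical placeholders rather than the repository's actual configuration.

```python
# Minimal sketch, assuming precomputed GNN embeddings are available as a
# NumPy array aligned row-by-row with the tabular features; all names are
# illustrative, not the repository's actual interfaces.
import numpy as np
import pandas as pd
import xgboost as xgb

features = pd.read_parquet("enhanced_features.parquet")      # engineered features
embeddings = np.load("bignn_embeddings.npy")                  # one vector per impression
labels = pd.read_parquet("labels.parquet")["is_installed"]

# Concatenate tabular features with the GNN embedding columns.
emb_cols = pd.DataFrame(
    embeddings,
    columns=[f"gnn_emb_{i}" for i in range(embeddings.shape[1])],
)
train_matrix = pd.concat([features.reset_index(drop=True), emb_cols], axis=1)

# Gradient-boosted trees on the augmented feature matrix (hyperparameters assumed).
model = xgb.XGBClassifier(
    n_estimators=500, learning_rate=0.05, max_depth=8,
    objective="binary:logistic", eval_metric="logloss",
)
model.fit(train_matrix, labels)
install_prob = model.predict_proba(train_matrix)[:, 1]  # probability of install
```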

How to run

  • Step 1: Prepare your data

    • Put the ShareChat raw data under data/sharechat_recsys2023_data
  • Step 2: Follow 0_train to complete model training and saving

    • Use 1_LearningFE to create the processed data, encoder, and LightGBM model; see here for details
    • Use 2_supGNN to create the GNN model and XGBoost model; see here for details
    • Use 3_BiGNN to create the GNN model; see here for details
    • Use 4_simGNN to create the GNN model and LightGBM model; see here for details
  • Step 3: Follow 1_inference to run inference on the test data

    • Use 1_LearningFE to run inference on the test data; see here for details
    • Use 2_supGNN to run inference on the test data; see here for details
    • Use 3_BiGNN to create inference embeddings; see here for details
    • Use 4_simGNN+BiGNN to run inference on the test data; see here for details
    • Use 5_ensemble.ipynb to get the final result; see here for details

Citation

If you use this codebase or otherwise find our work valuable, please cite:

@inproceedings{intelsolutionforrecsys2023,
  title={Graph Enhanced Feature Engineering for Privacy Preserving Recommendation Systems},
  author={Chendi Xue and Xinyao Wang and Yu Zhou and Poovaiah Palangappa and Ravi Motwani and Rita Brugarolas Brufau and Aasavari Dhananjay Kakne and Ke Ding and Jian Zhang},
  booktitle={RecSys 2023},
  year={2023}
}
