This is the PyTorch implementation by @Re-bin for RLMRec model proposed in this paper:
Representation Learning with Large Language Models for Recommendation
Xubin Ren, Wei Wei, Lianghao Xia, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin, Chao Huang*
WWW2024
* denotes corresponding author
In this paper, we propose a model-agnostic framework RLMRec that enhances existing recommenders with LLM-empowered representation learning. It proposes a paradigm that integrates representation learning with LLMs to capture intricate semantic aspects of user behaviors and preferences. RLMRec incorporates auxiliary textual signals, develops a user/item profiling paradigm empowered by LLMs, and aligns the semantic space of LLMs with the representation space of collaborative relational signals through a cross-view alignment framework.
You can run the following command to download the codes faster:
git clone --depth 1 https://github.com/HKUDS/RLMRec.git
Then run the following commands to create a conda environment:
conda create -y -n rlmrec python=3.9
conda activate rlmrec
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.13.1+cu117.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-1.13.1+cu117.html
pip install pyyaml tqdm
😉 The codes are developed based on the SSLRec framework.
We utilized three public datasets to evaluate RLMRec: Amazon-book, Yelp, and Steam.
Each user and item has a generated text description.
First of all, please download the data by running following commands.
cd data/
wget https://archive.org/download/rlmrec_data/data.zip
unzip data.zip
You can also download our data from the [Google Drive].
Each dataset consists of a training set, a validation set, and a test set. During the training process, we utilize the validation set to determine when to stop the training in order to prevent overfitting.
- amazon(yelp/steam)
|--- trn_mat.pkl # training set (sparse matrix)
|--- val_mat.pkl # validation set (sparse matrix)
|--- tst_mat.pkl # test set (sparse matrix)
|--- usr_prf.pkl # text description of users
|--- itm_prf.pkl # text description of items
|--- usr_emb_np.pkl # user text embeddings
|--- itm_emb_np.pkl # item text embeddings
- Each profile is a high quality text description of a user/item.
- Both user and item profiles are generated from Large Language Models from raw text data.
- The
user profile
(inusr_prf.pkl
) shows the particular types of items that the user tends to prefer. - The
item profile
(initm_prf.pkl
) articulates the specific types of users that the item is apt to attract.
😊 You can run the code python data/read_profile.py
as an example to read the profiles as follows.
$ python data/read_profile.py
User 123's Profile:
PROFILE: Based on the kinds of books the user has purchased and reviewed, they are likely to enjoy historical
fiction with strong character development, exploration of family dynamics, and thought-provoking themes. The user
also seems to enjoy slower-paced plots that delve deep into various perspectives. Books with unexpected twists,
connections between unrelated characters, and beautifully descriptive language could also be a good fit for
this reader.
REASONING: The user has purchased several historical fiction novels such as 'Prayers for Sale' and 'Fall of
Giants' which indicate an interest in exploring the past. Furthermore, the books they have reviewed, like 'Help
for the Haunted' and 'The Leftovers,' involve complex family relationships. Additionally, the user appreciates
thought-provoking themes and character-driven narratives as shown in their review of 'The Signature of All
Things' and 'The Leftovers.' The user also enjoys descriptive language, as demonstrated in their review of
'Prayers for Sale.'
- Each user and item has a semantic embedding encoded from its own profile using Text Embedding Models.
- The encoded semantic embeddings are stored in
usr_emb_np.pkl
anditm_emb_np.pkl
.
The original data of our dataset can be found from following links (thanks to their work):
- Yelp: https://www.yelp.com/dataset
- Amazon-book: https://cseweb.ucsd.edu/~jmcauley/datasets/amazon/links.html
- Steam: https://github.com/kang205/SASRec
We provide the mapping dictionary in JSON format in the data/mapper
folder to map the user/item ID
in our processed data to the original identification
in original data (e.g., asin for items in Amazon-book).
🤗 Welcome to use our processed data to improve your research!
The command to evaluate the backbone models and RLMRec is as follows.
-
Backbone
python encoder/train_encoder.py --model {model_name} --dataset {dataset} --cuda 0
-
RLMRec-Con (Constrastive Alignment):
python encoder/train_encoder.py --model {model_name}_plus --dataset {dataset} --cuda 0
-
RLMRec-Gen (Generative Alignment):
python encoder/train_encoder.py --model {model_name}_gene --dataset {dataset} --cuda 0
Supported models/datasets:
- model_name:
gccf
,lightgcn
,sgl
,simgcl
,dccf
,autocf
- dataset:
amazon
,yelp
,steam
Hypeparameters:
- The hyperparameters of each model are stored in
encoder/config/modelconf
(obtained by grid-search).
For advanced usage of arguments, run the code with --help argument.
Here we provide some examples with Yelp Data to generate user/item profiles and semantic representations.
Firstly, we need to complete the following three steps.
- Install the openai library
pip install openai
- Prepare your OpenAI API Key
- Enter your key on
Line 5
of these files:generation\{item/user/emb}\generate_{profile/emb}.py
.
Then, here are the commands to generate the desired output with examples:
-
Item Profile Generation:
python generation/item/generate_profile.py
-
User Profile Generation:
python generation/user/generate_profile.py
-
Semantic Representation:
python generation/emb/generate_emb.py
For semantic representation encoding, you can also try other text embedding models like Instructor or Contriever.
😀 The instructions we designed are saved in the {user/item}_system_prompt.txt
files and also the generation/instruction
folder. You can modify them according to your requirements and generate the desired output!
If you find this work is helpful to your research, please consider citing our paper:
@inproceedings{ren2024representation,
title={Representation learning with large language models for recommendation},
author={Ren, Xubin and Wei, Wei and Xia, Lianghao and Su, Lixin and Cheng, Suqi and Wang, Junfeng and Yin, Dawei and Huang, Chao},
booktitle={Proceedings of the ACM on Web Conference 2024},
pages={3464--3475},
year={2024}
}
Thanks for your interest in our work!