How to process session-based recommender data? #1219
-
When I look at the default benchmark data 'diginetica-session', it has the following structure:
It adds items one at a time, producing many rows by appending one more item to the sequence in each new row. But the atomic file diginetica.inter has the following features:
If I set item_id as in the sequential model config below,
item_id_list will be created, but it won't look anything like the diginetica-session data shown at the top. It will just be one sequence per user, as below.
I guess this is how I should build sequences for a session-based model? And I guess the diginetica-session data form should be obtained from somewhere? So my questions are as below.
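To make the shape concrete, here is a toy sketch of the "one row per prefix" pattern I mean (made-up item ids, not the real diginetica contents):

```python
# Toy sketch of the prefix-style rows in the session benchmark
# (made-up item ids, not the real diginetica data).
session_items = [21, 58, 64, 37]

for i in range(1, len(session_items)):
    item_id_list, target = session_items[:i], session_items[i]
    print(f"session_id=1  item_id_list={item_id_list}  item_id={target}")

# prints:
# session_id=1  item_id_list=[21]  item_id=58
# session_id=1  item_id_list=[21, 58]  item_id=64
# session_id=1  item_id_list=[21, 58, 64]  item_id=37
```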
-
Hi, for the first question, please take a look at session_based_rec_example.py, which shows how to load the session benchmarks. By the way, the example follows a procedure very similar to that in GCE-GNN's released code. For the second question, it actually depends:
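For reference, a minimal sketch of loading a pre-split session benchmark could look like this (it assumes files such as diginetica.train.inter, diginetica.valid.inter, and diginetica.test.inter already exist in the dataset folder, and uses GRU4Rec only as a placeholder model):

```python
# Minimal sketch, assuming pre-split benchmark files
# diginetica.train.inter / diginetica.valid.inter / diginetica.test.inter
# are present; GRU4Rec is only a placeholder model here.
from recbole.config import Config
from recbole.data import create_dataset, data_preparation

config_dict = {
    # with benchmark_filename set, RecBole loads the three files as-is
    # and skips augmentation, grouping, and splitting
    'benchmark_filename': ['train', 'valid', 'test'],
}
config = Config(model='GRU4Rec', dataset='diginetica', config_dict=config_dict)
dataset = create_dataset(config)
train_data, valid_data, test_data = data_preparation(config, dataset)
```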
-
Hi, sorry for misunderstanding the questions.
(2) Then, I would like to share my understanding of the difference between session-based recommendation and sequential recommendation. Note that this may not be correct; it is just my current understanding. In my opinion, in session-based recommendation all sessions are assumed to happen within a short period, such as a few minutes or hours. Thus, when we evaluate session-based methods, the original item sequences of the same session_id should not be divided across the train/valid/test sets. In sequential recommendation, we assume that we can observe a user's long-term interaction history, so it is natural to place part of a user's history in the training set and then predict that user's next item in the valid/test sets. I hope this is a bit clearer than before. :)
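To make the splitting difference concrete, here is a rough sketch (a toy example of my own, not RecBole code) contrasting session-level splitting with within-history splitting:

```python
# Toy sketch: session-level splitting vs. within-history (sequential) splitting.
import numpy as np
import pandas as pd

inter = pd.DataFrame({
    'session_id': [1, 1, 2, 2, 2, 3, 3],
    'user_id':    [7, 7, 7, 7, 7, 9, 9],
    'item_id':    [21, 58, 64, 37, 12, 90, 43],
    'timestamp':  [100, 101, 200, 201, 202, 300, 301],
})

# Session-based: whole sessions go to train or test, so a single session
# is never divided across splits.
rng = np.random.default_rng(0)
sessions = inter['session_id'].unique()
test_sessions = set(rng.choice(sessions, size=1, replace=False))
sess_train = inter[~inter['session_id'].isin(test_sessions)]
sess_test = inter[inter['session_id'].isin(test_sessions)]

# Sequential: order each user's long-term history by time and hold out the
# last interaction per user (leave-one-out), so part of a user's history is
# in the training set and the rest is used for valid/test.
ordered = inter.sort_values(['user_id', 'timestamp'])
held_out = ordered.groupby('user_id').tail(1)
seq_train = ordered.drop(held_out.index)
```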
Yes, that's right.
If your YAML file contains the `benchmark_filename` arg, then nothing will happen: no data augmentation, no grouping, no splitting, because we assume that you have already done all of these when generating the benchmark dataset. Otherwise (no `benchmark_filename` in your YAML), the input is as follows, and we will then perform data augmentation for sequential recommendation: the interactions are grouped by user and sorted by timestamp into item sequences.
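As a rough sketch of that augmentation (an illustration only, not RecBole's internal implementation), grouping by user, sorting by timestamp, and emitting one row per prefix could look like this:

```python
# Toy sketch of the prefix-style data augmentation described above;
# an illustration only, not RecBole's internal code.
import pandas as pd

inter = pd.DataFrame({
    'user_id':   [7, 7, 7, 9, 9],
    'item_id':   [21, 58, 64, 90, 43],
    'timestamp': [102, 100, 101, 300, 301],
})

rows = []
# group each user's interactions and sort them by timestamp
for user_id, group in inter.groupby('user_id'):
    items = group.sort_values('timestamp')['item_id'].tolist()
    # one augmented row per prefix: predict items[i] from items[:i]
    for i in range(1, len(items)):
        rows.append({'user_id': user_id,
                     'item_id_list': items[:i],
                     'item_id': items[i]})

augmented = pd.DataFrame(rows)
print(augmented)
```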