How to save train, test data and model predictions? #762

johnny12150 · 2021-03-09T02:55:58Z


Is there a quick way to save the train and test data after splitting them?
How can I preserve the model predictions and the ground truth during the testing phase?

Answered by chenyushuo


chenyushuo · 2021-03-09T08:09:46Z

Maintainer

For the first question, we suggest using pickle to dump the split data, like this:

import pickle
with open('split_data.pth', 'wb') as f:
    pickle.dump((train_data, test_data), f)
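
To read the split back later, the same file can be opened in binary read mode and passed to pickle.load. The following is a minimal sketch of the full round trip. The two lists are placeholders standing in for the real train_data and test_data objects (an assumption for illustration), and the temporary directory is only there to keep the example self-contained. Only unpickle files from a source you trust, since pickle can execute arbitrary code on load.

```python
import os
import pickle
import tempfile

# Placeholders standing in for the real train_data / test_data objects.
train_data = [("user_1", "item_3"), ("user_2", "item_7")]
test_data = [("user_1", "item_9")]

# Write into a temporary directory so the example leaves no stray files behind.
path = os.path.join(tempfile.mkdtemp(), "split_data.pth")

# Save both splits as a single tuple.
with open(path, "wb") as f:
    pickle.dump((train_data, test_data), f)

# Load them back in the same order they were saved.
with open(path, "rb") as f:
    loaded_train, loaded_test = pickle.load(f)
```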

We also recently added a save function for split data in #760. In later versions, you can use save_split_dataloaders and load_split_dataloaders to save and load the split data.

For the second question, you can refer to this code (mainly based on #506):

import numpy as np
import torch

from recbole.data.dataloader.general_dataloader import GeneralFullDataLoader
from recbole.data.dataloader.sequential_dataloader import SequentialFullDataLoader


uid_series = np.array([1, 2])

# We assume you have already loaded test_data and model
uid_field = test_data.dataset.uid_field
dataset = test_data.dataset
model.eval()

if isinstance(test_data, GeneralFullDataLoader):
    index = np.isin(test_data.user_df[uid_field].numpy(), uid_series)
    input_interaction = test_data.user_df[index]
elif isinstance(test_data, SequentialFullDataLoader):
    index = np.isin(test_data.uid_list, uid_series)
    input_interaction = test_data.augmentation(
        test_data.item_list_index[index], test_data.target_index[index], test_data.item_list_length[index]
    )
else:
    raise NotImplementedError

# Get scores of all items
try:
    scores = model.full_sort_predict(input_interaction)
except NotImplementedError:
    input_interaction = input_interaction.repeat(dataset.item_num)
    input_interaction.update(test_data.get_item_feature().repeat(len(uid_series)))
    scores = model.predict(input_interaction)

scores = scores.view(-1, dataset.item_num)  # scores of all items, one row of dataset.item_num scores per user

# Get ground truth interaction
index = np.isin(test_data.dataset.inter_feat[uid_field].numpy(), uid_series)
real_inter = test_data.dataset.inter_feat[index]  # the ground truth interaction
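
Once the scores are available, one way to preserve the predictions is to keep only the top-k items per user and write them to disk. The sketch below uses NumPy for this. The small array is a stand-in for the real scores (an assumption for illustration), the helper top_k_from_scores and the file name predictions.npz are hypothetical, and with a real torch tensor you would first call scores.detach().cpu().numpy().

```python
import os
import tempfile

import numpy as np


def top_k_from_scores(scores, k):
    """Return (item_ids, item_scores) of the k highest-scoring items per row."""
    # argsort is ascending, so reverse each row and keep the first k columns.
    order = np.argsort(scores, axis=1)[:, ::-1][:, :k]
    top_scores = np.take_along_axis(scores, order, axis=1)
    return order, top_scores


# Stand-in for the real scores: 2 users x 4 items.
scores = np.array([[0.1, 0.9, 0.3, 0.5],
                   [0.8, 0.2, 0.7, 0.1]])
uid_series = np.array([1, 2])

topk_items, topk_scores = top_k_from_scores(scores, k=2)

# Store user IDs, recommended item IDs and their scores in one file.
path = os.path.join(tempfile.mkdtemp(), "predictions.npz")
np.savez(path, uid=uid_series, topk_items=topk_items, topk_scores=topk_scores)

# Read the file back to check the round trip.
loaded = np.load(path)
```

The ground truth item IDs, once converted to NumPy arrays, could be stored in the same file under another key.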

6 replies

chenyushuo Mar 9, 2021
Maintainer

Yes, run_recbole() is just a simple function for running models. You can modify it to suit your needs.

johnny12150 Mar 14, 2021
Author

@chenyushuo
I am facing an error that test_data.augmentation in the script is missing one argument.
Should I just pass uid_series in the first position?

johnny12150 Mar 14, 2021
Author

Also, is it possible to skip data augmentation in the testing phase when generating input_interaction?

johnny12150 Mar 14, 2021
Author

Besides, shouldn't the interaction be a dictionary of tensors rather than NumPy arrays?

Because of this, I am getting the following error:

Traceback (most recent call last):
  File "D:/codes/paper codes/Recbole/run_custom.py", line 39, in get_pred
    scores = model.full_sort_predict(input_interaction)
  File "C:\Users\wade\anaconda3\envs\pytorch\lib\site-packages\recbole\model\sequential_recommender\gru4rec.py", line 117, in full_sort_predict
    seq_output = self.forward(item_seq, item_seq_len)
  File "C:\Users\wade\anaconda3\envs\pytorch\lib\site-packages\recbole\model\sequential_recommender\gru4rec.py", line 78, in forward
    item_seq_emb = self.item_embedding(item_seq)
  File "C:\Users\wade\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\wade\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\sparse.py", line 124, in forward
    return F.embedding(
  File "C:\Users\wade\anaconda3\envs\pytorch\lib\site-packages\torch\nn\functional.py", line 1852, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not numpy.ndarray

chenyushuo Mar 16, 2021
Maintainer

The arguments of test_data.augmentation were updated in #559. We suggest using the latest version of RecBole.
Besides, in RecBole, data augmentation is necessary in the testing phase to generate input_interaction for sequential models.
As for the interaction, it should be a dictionary of torch.Tensor objects. You can see this page for more information about Interaction.
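
The TypeError in the traceback above comes from passing a numpy.ndarray to an embedding layer, which only accepts tensors. A minimal sketch of the conversion, using plain PyTorch rather than RecBole's own Interaction class: the helper to_tensor_dict and the field names item_id_list and item_length are hypothetical, chosen only for illustration.

```python
import numpy as np
import torch


def to_tensor_dict(arrays):
    """Convert a dict of NumPy arrays into a dict of torch tensors."""
    return {name: torch.as_tensor(values) for name, values in arrays.items()}


# Hypothetical field names; real ones depend on the dataset configuration.
numpy_inputs = {
    "item_id_list": np.array([[3, 5, 0], [7, 2, 9]], dtype=np.int64),
    "item_length": np.array([2, 3], dtype=np.int64),
}
tensor_inputs = to_tensor_dict(numpy_inputs)

# An embedding lookup now works, because the indices are a LongTensor.
embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=4, padding_idx=0)
item_seq_emb = embedding(tensor_inputs["item_id_list"])
```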
