Yelp dataset error #1264
Unanswered
kuzma-long
asked this question in
Q&A
-
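@kuzma-long Hello, please check whether the dataset format is wrong, i.e. whether the files conform to the atomic file format. We recommend using the preprocessed yelp dataset that we provide. With that data format, an example configuration file for a sequential model is: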
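Independently of any configuration file, the format check suggested in this reply can be sketched in Python. This is a minimal sketch, not RecBole's own validation: it assumes the atomic file is tab-separated, that every header column has the form `name:type`, and that the type is one of `token`, `token_seq`, `float`, or `float_seq`. The function name `check_atomic_file` and the sample rows are hypothetical.

```python
# Field types that may follow the colon in an atomic-file header column.
VALID_TYPES = {"token", "token_seq", "float", "float_seq"}


def check_atomic_file(lines, separator="\t"):
    """Return a list of human-readable problems found in an atomic file's lines."""
    if not lines:
        return ["file is empty"]
    problems = []
    header = lines[0].rstrip("\n").split(separator)
    for column in header:
        # Each header column is expected to look like "name:type".
        name, colon, field_type = column.partition(":")
        if not colon or not name or field_type not in VALID_TYPES:
            problems.append(f"bad header column: {column!r}")
    for line_number, line in enumerate(lines[1:], start=2):
        # Every data row should have as many fields as the header.
        n_fields = len(line.rstrip("\n").split(separator))
        if n_fields != len(header):
            problems.append(
                f"line {line_number}: expected {len(header)} fields, found {n_fields}"
            )
    return problems


# Hypothetical sample data, for illustration only.
good = [
    "user_id:token\tbusiness_id:token\tstars:float\ttimestamp:float\n",
    "u1\tb1\t4.0\t1577836800\n",
]
bad = [
    "user_id\tbusiness_id:token\n",  # first column has no ":type" suffix
    "u1\n",                          # row has too few fields
]
print(check_atomic_file(good))
print(check_atomic_file(bad))
```

To check a real file, read it with `open(path, encoding="utf-8").readlines()` and pass the result to the function. The check only counts separators, so it does not detect quoting problems inside a field.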
-
Configuration:
General Hyper Parameters:
gpu_id = 3
use_gpu = True
seed = 2020
state = INFO
reproducibility = True
data_path = dataset/yelp
checkpoint_dir = saved
show_progress = True
save_dataset = False
dataset_save_path = None
save_dataloaders = False
dataloaders_save_path = None
log_wandb = False
Training Hyper Parameters:
epochs = 300
train_batch_size = 128
learner = adam
learning_rate = 0.001
neg_sampling = None
eval_step = 1
stopping_step = 10
clip_grad_norm = None
weight_decay = 0.0
loss_decimal_place = 4
Evaluation Hyper Parameters:
eval_args = {'split': {'LS': 'valid_and_test'}, 'order': 'TO', 'group_by': 'user', 'mode': 'full'}
repeatable = True
metrics = ['Recall', 'NDCG', 'MRR']
topk = [5, 10, 20, 50]
valid_metric = NDCG@10
valid_metric_bigger = True
eval_batch_size = 256
metric_decimal_place = 4
Dataset Hyper Parameters:
field_separator =
seq_separator =
USER_ID_FIELD = user_id
ITEM_ID_FIELD = business_id
RATING_FIELD = stars
TIME_FIELD = timestamp
seq_len = None
LABEL_FIELD = label
threshold = None
NEG_PREFIX = neg_
load_col = None
unload_col = None
unused_col = None
additional_feat_suffix = None
rm_dup_inter = None
val_interval = {'stars': '[3,inf)'}
filter_inter_by_user_or_item = True
user_inter_num_interval = [15,inf)
item_inter_num_interval = [15,inf)
alias_of_user_id = None
alias_of_item_id = None
alias_of_entity_id = None
alias_of_relation_id = None
preload_weight = None
normalize_field = None
normalize_all = None
ITEM_LIST_LENGTH_FIELD = item_length
LIST_SUFFIX = _list
MAX_ITEM_LIST_LENGTH = 50
POSITION_FIELD = position_id
HEAD_ENTITY_ID_FIELD = head_id
TAIL_ENTITY_ID_FIELD = tail_id
RELATION_ID_FIELD = relation_id
ENTITY_ID_FIELD = entity_id
benchmark_filename = None
Other Hyper Parameters:
wandb_project = recbole
require_pow = False
MODEL_TYPE = ModelType.SEQUENTIAL
n_layers = [2, 4, 6, 8]
hidden_size = [32, 64]
hidden_dropout_prob = [0.2, 0.4, 0.6, 0.8]
hidden_act = gelu
layer_norm_eps = 1e-12
initializer_range = 0.02
loss_type = CE
pooling_mode = mean
hyper_parameters = ['n_layers', 'hidden_size', 'hidden_dropout_prob']
reg_weight = 0.0001
ssl_temp = 0.05
ssl_reg = 1e-06
alpha = 1.5
proto_reg = 1e-07
num_clusters = 2000
MODEL_INPUT_TYPE = InputType.POINTWISE
eval_type = EvaluatorType.RANKING
device = cuda
train_neg_sample_args = {'strategy': 'none'}
eval_neg_sample_args = {'strategy': 'full', 'distribution': 'uniform'}
Error:
Traceback (most recent call last):
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/pandas/io/parsers/python_parser.py", line 760, in _next_iter_line
line = next(self.data)
_csv.Error: ' ' expected after '"'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 125, in <module>
run_single_model(args)
File "main.py", line 49, in run_single_model
dataset = create_dataset(config)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/recbole/data/utils.py", line 68, in create_dataset
dataset = dataset_class(config)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/recbole/data/dataset/sequential_dataset.py", line 36, in __init__
super().__init__(config)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/recbole/data/dataset/dataset.py", line 96, in __init__
self._from_scratch()
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/recbole/data/dataset/dataset.py", line 106, in _from_scratch
self._load_data(self.dataset_name, self.dataset_path)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/recbole/data/dataset/dataset.py", line 249, in _load_data
self.item_feat = self._load_user_or_item_feat(token, dataset_path, FeatureSource.ITEM, 'iid_field')
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/recbole/data/dataset/dataset.py", line 311, in _load_user_or_item_feat
feat = self._load_feat(feat_path, source)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/recbole/data/dataset/dataset.py", line 438, in _load_feat
df = pd.read_csv(
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 581, in _read
return parser.read(nrows)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1254, in read
index, columns, col_dict = self._engine.read(nrows)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/pandas/io/parsers/python_parser.py", line 238, in read
content = self._get_lines(rows)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/pandas/io/parsers/python_parser.py", line 1091, in _get_lines
new_row = self._next_iter_line(row_num=self.pos + rows + 1)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/pandas/io/parsers/python_parser.py", line 789, in _next_iter_line
self._alert_malformed(msg, row_num)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/pandas/io/parsers/python_parser.py", line 739, in _alert_malformed
raise ParserError(msg)
pandas.errors.ParserError: ' ' expected after '"'
When using the yelp dataset, this error is raised during the data preprocessing stage. What went wrong?
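For anyone trying to narrow this down: the traceback shows the failure while loading the item feature file, and the message itself comes from Python's `csv` module. In strict mode it is raised when a quoted field is closed by `"` and the next character is neither the delimiter nor a line ending. A minimal sketch for locating such lines follows. It assumes a tab-separated file and checks each line on its own, so quoted fields that span several lines would be reported as errors too. The function name `find_malformed_rows` and the sample rows are hypothetical.

```python
import csv


def find_malformed_rows(lines, delimiter="\t"):
    """Return (line_number, error_message) for each line the strict csv parser rejects."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        try:
            # strict=True makes the csv module raise on bad quoting instead of guessing.
            list(csv.reader([line], delimiter=delimiter, strict=True))
        except csv.Error as exc:
            problems.append((line_number, str(exc)))
    return problems


# Hypothetical sample rows, for illustration only.
sample = [
    "business_id:token\tname:token_seq\n",
    "b1\tGood Cafe\n",
    'b2\t"Joe"s Diner\n',  # closing quote is followed by "s", not by a tab
]
for line_number, message in find_malformed_rows(sample):
    print(line_number, repr(message))
```

If the reported lines contain literal double quotes in free-text fields such as names or addresses, cleaning or escaping those quotes in the file before loading is one option to try.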