data leakage problem in your model #9

hurleyLi · 2019-08-17T19:03:44Z

The design of your adjacency matrix adj_mats_orig and the way you split the train/test sets cause a serious data leakage problem in training. The validation and test sets are created independently for gene_adj and gene_adj.transpose(copy=True), so the edges held out for validation / test in gene_adj are actually included in the training set of gene_adj.transpose(copy=True).

The same problem applies to the train / validation sets of gene_drug_adj and drug_gene_adj. The validation edges from gene_drug_adj are actually used for training in drug_gene_adj, and vice versa.
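To make the concern concrete, here is a minimal sketch in plain Python. It is not the repository's code: the toy graph and the helpers split_edges, transpose, and leaked are hypothetical names made up for illustration. It compares splitting a matrix and its transpose independently against splitting once and deriving the transposed split from it.

```python
import random

def split_edges(edges, test_frac, rng):
    """Randomly split a collection of directed edges into (train, test) sets."""
    edges = sorted(edges)          # fixed order so the shuffle is reproducible
    rng.shuffle(edges)
    n_test = int(len(edges) * test_frac)
    return set(edges[n_test:]), set(edges[:n_test])

def transpose(edges):
    """Reverse every edge: the edge set of the transposed adjacency matrix."""
    return {(j, i) for i, j in edges}

def leaked(train_of_transpose, test):
    """Held-out edges (i, j) whose reverse (j, i) is trained on in the transpose."""
    return {(i, j) for i, j in test if (j, i) in train_of_transpose}

rng = random.Random(0)

# Hypothetical toy bipartite graph: 20 genes x 10 drugs with random edges.
gene_drug = {(g, d) for g in range(20) for d in range(10) if rng.random() < 0.3}
drug_gene = transpose(gene_drug)

# Independent splits, as described in this issue: each matrix draws its own
# held-out edges, so a held-out gene-drug edge can reappear reversed in the
# drug-gene training set.
train_gd, test_gd = split_edges(gene_drug, 0.2, rng)
train_dg, test_dg = split_edges(drug_gene, 0.2, rng)
n_leaked_independent = len(leaked(train_dg, test_gd))

# Shared split (one possible fix, an assumption rather than the authors'
# method): split once and transpose the result, so both directions of an
# edge always land on the same side.
train_dg_shared = transpose(train_gd)
test_dg_shared = transpose(test_gd)
n_leaked_shared = len(leaked(train_dg_shared, test_gd))

print("leaked with independent splits:", n_leaked_independent)
print("leaked with shared split:", n_leaked_shared)
```

With independent splits, most held-out edges are expected to show up reversed in the other matrix's training set, while the shared split leaks none by construction. The same idea would apply to gene_adj and its transpose.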

Could you please clarify?
Thanks!

Originally posted by @hurleyLi in #7 (comment)

Fakak · 2021-01-12T07:35:45Z

Hello @hurleyLi, I had the same concern as you at first, but now I think this is not a big problem, because what we want to predict are the edges between drug nodes, which means the p-p and p-d edges don't matter.
