The design of your adjacency matrix adj_mats_orig and the way you split the train / test set will cause a huge data leakage problem in your training. The validation and test sets are created independently for gene_adj and gene_adj.transpose(copy=True), so the edges in the validation / test set of gene_adj are actually included in the training set of gene_adj.transpose(copy=True).
The same problem applies to the train / validation split between gene_drug_adj and drug_gene_adj. The validation edges from gene_drug_adj are actually used for training in drug_gene_adj, and vice versa.
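To make the concern concrete, here is a minimal sketch in plain Python, not the repository's actual splitting code. The toy graph, the 20% held-out fraction, and the helper names `split_edges` and `leaked` are all assumptions made for illustration. It compares splitting a matrix and its transpose independently with splitting once and mirroring the result.

```python
import random


def split_edges(edges, test_frac, rng):
    """Randomly split a collection of edges into (train, test) sets."""
    edges = list(edges)
    rng.shuffle(edges)
    n_test = int(len(edges) * test_frac)
    return set(edges[n_test:]), set(edges[:n_test])


def leaked(test_edges, transpose_train_edges):
    """Held-out edges (i, j) whose reverse (j, i) is a training edge of the transpose."""
    return {(i, j) for (i, j) in test_edges if (j, i) in transpose_train_edges}


rng = random.Random(0)
n = 30

# Toy stand-in for the edges of gene_adj (hypothetical data, upper triangle only).
forward = [(i, j) for i in range(n) for j in range(i + 1, n) if (i + j) % 3 == 0]
# Edges of the transposed matrix: the same relations with the direction flipped.
backward = [(j, i) for (i, j) in forward]

# Independent splits, as described in the issue: each matrix gets its own
# random held-out set, so a held-out edge can reappear reversed in training.
fwd_train, fwd_test = split_edges(forward, 0.2, rng)
bwd_train, bwd_test = split_edges(backward, 0.2, rng)
independent_leak = leaked(fwd_test, bwd_train)

# Paired split: split once, then mirror the result for the transpose, so an
# edge and its reverse always land in the same partition.
pair_train, pair_test = split_edges(forward, 0.2, rng)
mirrored_train = {(j, i) for (i, j) in pair_train}
paired_leak = leaked(pair_test, mirrored_train)

print("held-out edges:", len(fwd_test))
print("leaked with independent splits:", len(independent_leak))
print("leaked with paired split:", len(paired_leak))
```

With independent splits, most held-out edges are expected to show up reversed in the other matrix's training set, while the paired split leaks none by construction. The same pairing idea would apply to gene_drug_adj and drug_gene_adj.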
Hello @hurleyLi, I had the same concern at first, but now I think it is not a big problem: what we want to predict are the edges between drug nodes, so the p-p and p-d edges don't matter.
Could you please clarify?
Thanks!
Originally posted by @hurleyLi in #7 (comment)