metric for enformer #9
Eli:
Let's assume the data are as follows:
batch1: input1, target1
batch2: input2, target2
batch3: input3, target3
The idea behind calculating the Pearson correlation coefficient in the original TensorFlow version of Enformer is as follows:
cor(c(input1,input2,input3),c(target1,target2,target3))
The idea behind calculating it in the PyTorch version of Enformer is as follows:
mean(cor(input1,target1),cor(input2,target2),cor(input3,target3))
I think the second option is reasonable.
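For concreteness, here is a minimal numpy sketch of the two options on synthetic data; the names (`pearson`, `preds`, `targets`) and shapes are illustrative, not taken from either codebase:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two 1-D arrays."""
    x, y = x - x.mean(), y - y.mean()
    return (x * y).sum() / np.sqrt((x * x).sum() * (y * y).sum())

rng = np.random.default_rng(0)
preds = [rng.normal(size=128) for _ in range(3)]                # input1..input3
targets = [p + rng.normal(scale=0.5, size=128) for p in preds]  # target1..target3

# Option 1 (TensorFlow Enformer): one global correlation over concatenated batches
global_r = pearson(np.concatenate(preds), np.concatenate(targets))

# Option 2 (PyTorch port): correlate each batch separately, then average
per_batch_r = np.mean([pearson(p, t) for p, t in zip(preds, targets)])

print(global_r, per_batch_r)  # generally not equal
```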
Reply:
Why is the second better? Here each input and target is a different batch along the sequence axis. Taking the global correlation of all points is a correlation metric, while the mean of subsequence correlations (with arbitrary cut points, even) is something else that needs more justification, in my opinion.
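The cut-point dependence is easy to see numerically. In this sketch (synthetic data; `mean_chunk_pearson` is a hypothetical helper, not from the repo), re-slicing the same sequence changes option 2 but not option 1:

```python
import numpy as np

def pearson(x, y):
    x, y = x - x.mean(), y - y.mean()
    return (x * y).sum() / np.sqrt((x * x).sum() * (y * y).sum())

def mean_chunk_pearson(pred, target, cuts):
    """Mean of per-chunk correlations for one choice of cut points."""
    return np.mean([pearson(p, t) for p, t in
                    zip(np.split(pred, cuts), np.split(target, cuts))])

rng = np.random.default_rng(1)
pred = rng.normal(size=300)
target = pred + rng.normal(size=300)

print(pearson(pred, target))                         # one number, however we slice
print(mean_chunk_pearson(pred, target, [100, 200]))  # equal thirds
print(mean_chunk_pearson(pred, target, [30, 60]))    # different cuts, different value
```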
Eli:
Sorry, I think you're right. Calculating the correlation per batch and then taking the average is not a good idea, as it ignores the global distribution of the data.

Reply:
Interesting. I mean, the thing that seemed off was that your proposal averaged over arbitrary cut points. Taking the mean after splitting on a nuisance variable, on the other hand, could make a ton of sense: it could help control for confounding, for example. You could cut the data up by something you don't want included in your correlation measurement, such as a chromosome boundary, a GC-percent bucket, or some other feature you think is a nuisance variable that is not biologically meaningful. Then you could calculate the correlation within each group and average. Smaller groups, though, might be noisier, which would make real signals harder to detect. Just some other thoughts!
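As a sketch of that idea (the names and synthetic data are assumptions for illustration, not anything from the repo): a per-group offset shared by predictions and targets, say a chromosome effect, inflates the global correlation, while grouping on the nuisance variable before correlating removes it:

```python
import numpy as np

def pearson(x, y):
    x, y = x - x.mean(), y - y.mean()
    return (x * y).sum() / np.sqrt((x * x).sum() * (y * y).sum())

def grouped_pearson(pred, target, groups):
    """Correlate within each level of a nuisance variable, then average."""
    return np.mean([pearson(pred[groups == g], target[groups == g])
                    for g in np.unique(groups)])

rng = np.random.default_rng(0)
groups = np.repeat(np.arange(4), 256)  # e.g. 4 chromosomes
offset = 2.0 * groups                  # nuisance effect shared by both tracks
pred = offset + rng.normal(size=groups.size)
target = offset + rng.normal(size=groups.size)

print(pearson(pred, target))                  # high, driven by the shared offset
print(grouped_pearson(pred, target, groups))  # near zero: no within-group signal
```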
Eli:
Thanks! That helps a lot!
Hello, can I ask how you found that the human Pearson R is 0.625 for validation and 0.65 for test? I couldn't find this information in the paper. Is it recorded anywhere else?