
metric for enformer #9

Open
Rachel66666 opened this issue Jul 27, 2022 · 6 comments

Comments

@Rachel66666

Hello, can I ask how you found that the human Pearson R is 0.625 for validation and 0.65 for test? I couldn't find this information in the paper. Is it recorded anywhere else?

@rnsherpa

rnsherpa commented Apr 21, 2023

Sorry for reviving an old thread, but I'd also like to know where these correlation numbers come from with respect to the original paper. It looks like @jstjohn did the correlation analysis here. Would you be able to shed some light on the question?

@biginfor

Let's assume the data are as follows:
batch1 : input1, target1
batch2 : input2, target2
batch3 : input3, target3
The original TensorFlow version of Enformer computes the Pearson correlation coefficient over the concatenated batches:
cor(c(input1, input2, input3), c(target1, target2, target3))
The PyTorch version of Enformer instead computes the Pearson correlation per batch and then averages:
mean(cor(input1, target1), cor(input2, target2), cor(input3, target3))
I think the second option is reasonable.
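The two options above can be sketched in Python with numpy. The data here is randomly generated for illustration only; the variable names (`preds`, `targets`) are assumptions, not Enformer's actual tensors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three hypothetical batches of predictions and matching noisy targets.
preds = [rng.normal(size=100) for _ in range(3)]
targets = [p + rng.normal(scale=0.5, size=100) for p in preds]

# Option 1 (TensorFlow-style): one global Pearson r over the concatenation.
global_r = np.corrcoef(np.concatenate(preds), np.concatenate(targets))[0, 1]

# Option 2 (PyTorch-style): Pearson r within each batch, then the mean.
per_batch_r = np.mean(
    [np.corrcoef(p, t)[0, 1] for p, t in zip(preds, targets)]
)
```

When the batches are identically distributed, as in this toy data, the two numbers come out close; the disagreement discussed below in the thread arises when the distribution differs across batches.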

@jstjohn
Contributor

jstjohn commented Sep 29, 2024 via email

Why is the second better? Here each input and target is a different batch along the sequence axis. Taking the global correlation of points is a correlation metric, while the mean of subsequence correlations (with arbitrary cut points, even) is something else that needs more justification, in my opinion.

@biginfor

Sorry, I think you're right. Calculating the correlation per batch and then taking the average is not a good idea, as it ignores the global distribution of the data.
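A toy numeric sketch of this point (the data and setup are invented for illustration, not taken from Enformer): when the batches sit at different overall levels, the per-batch average can look near zero while the global correlation is high, because only the global statistic sees the variation across batches.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy batches whose mean level differs, with independent noise inside each.
preds, targets = [], []
for level in (0.0, 5.0, 10.0):
    preds.append(level + rng.normal(size=50))
    targets.append(level + rng.normal(size=50))  # tracks the level, not the noise

# Within each batch only independent noise remains, so per-batch r is near 0...
per_batch_r = np.mean(
    [np.corrcoef(p, t)[0, 1] for p, t in zip(preds, targets)]
)

# ...but the shared level across batches makes the global r high.
global_r = np.corrcoef(np.concatenate(preds), np.concatenate(targets))[0, 1]
```

This is the sense in which averaging per-batch correlations "ignores the global distribution of the data": whether that cross-batch signal should count toward the metric is exactly the judgment call being debated above.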

@jstjohn
Contributor

jstjohn commented Sep 30, 2024 via email

@biginfor

Thanks! That helps a lot!
