Customized metrics #196

Open
zhu0619 opened this issue Sep 10, 2024 · 7 comments
Labels
feature Annotates any PR that adds new features; Used in the release process

Comments

@zhu0619
Contributor

zhu0619 commented Sep 10, 2024

Is your feature request related to a problem? Please describe.

New metrics need to be implemented and hard-coded in the Polaris client whenever new modalities and benchmarks introduce them. This has become a bottleneck in the benchmark creation process.

Describe the solution you'd like

An approach is needed that allows flexible, customized metrics while maintaining the robustness of the Polaris client codebase. Balancing flexibility with stability is key to ensuring that users can easily introduce new metrics without compromising the integrity of the system.

@zhu0619 added the feature label on Sep 10, 2024
@wconnell

wconnell commented Oct 16, 2024

Hello, I'm working on the EvoScale BioML hackathon, so I'm under time constraints...

I have a custom metric that doesn't fit into the two-column specification of your design. It would be great to connect with you to work out a solution for our submission. Thanks.

@cwognum
Collaborator

cwognum commented Oct 16, 2024

Hi @wconnell, thanks for reaching out. Could you provide some more details on what you're trying to do? Custom metrics are a complex feature we won't be able to implement soon, but maybe I can help rethink the structure of your dataset / benchmark into a format we can already support.

@wconnell

wconnell commented Oct 16, 2024

Hey, thanks for getting back to me @cwognum.

We are uploading a new dataset called OpenPlasmid. Our evaluation looks at how well different plasmid sequence embedding methods reflect the similarity of plasmid feature annotations. So we basically take the plasmid embeddings of a new method, cluster them, and then compute NMI and ARI relative to the annotation labels to quantify the expected similarity.

Any ideas how this could fit into your framework?
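
(For reference, a minimal sketch of that kind of evaluation, assuming the plasmid embeddings and their feature-annotation labels are already available as arrays; KMeans and the cluster count are placeholder choices, not part of the actual OpenPlasmid pipeline:)

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score as ari
from sklearn.metrics.cluster import normalized_mutual_info_score as nmi

# Placeholder inputs: one embedding per plasmid plus its annotation label.
embeddings = np.random.rand(200, 64)             # embeddings from the method under evaluation
annotations = np.random.randint(0, 8, size=200)  # ground-truth feature-annotation labels

# Cluster the embeddings (KMeans with a fixed k is just an illustrative choice).
clusters = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(embeddings)

# Quantify how well the clustering reflects the annotations.
print("NMI:", nmi(annotations, clusters))
print("ARI:", ari(annotations, clusters))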

@cwognum
Collaborator

cwognum commented Oct 16, 2024

Oeh, that's interesting, but I'm afraid it's not a great fit for the current state of the platform... We've been focused on predictive modeling.

However, it sounds like you could ask people to submit the cluster annotations and then compare that against the ground truth clustering.

So, for example, your dataset may look like:

Plasmid ID    Cluster
0             0
1             0
2             1
3             1
The predictions would then e.g. look like: [1, 1, 0, 2].

So:

from sklearn.metrics.cluster import normalized_mutual_info_score as nmi
from sklearn.metrics import adjusted_rand_score as ari

nmi([0, 0, 1, 1], [1, 1, 0, 2])
# Gives: ~0.800

ari([0, 0, 1, 1], [1, 1, 0, 2])
# Gives: ~0.571

That does mean you don't have any control over the clustering algorithm. I understand that may not be ideal, but you could make clear in your README how people are supposed to do the clustering and ask them to attach a link to their code when they submit results (Polaris has a dedicated field for this, see here).

@wconnell

Ok, yeah, that's probably a sufficient workaround for now. Thanks for the suggestion!

@wconnell

Hey, figured I'd be back... realizing that there is no way to add new metrics for use with SingleTaskBenchmarkSpecification? Doesn't seem like I can extend the Metric class.

@cwognum
Copy link
Collaborator

cwognum commented Oct 16, 2024

Hey! You can, but it's a bit of a manual process: you'll have to create a PR in this repo. See e.g. #199 and #48.

This is a bit of a frustrating process, which is why we created this issue to begin with.
