
Support for custom/nonstandard metrics #152

Open
fteufel opened this issue Jul 23, 2024 · 1 comment
Labels
feature: Annotates any PR that adds new features; Used in the release process

Comments

@fteufel
Contributor

fteufel commented Jul 23, 2024

Is your feature request related to a problem? Please describe.

The metrics currently provided by polaris are decent for standard classification/regression tasks, but there are many problems that might require more sophisticated methods for quantifying performance.

As an example, I have a task where each sample is an array, and the label is an array of the same length. So for each sample, each array position carries a label.

For quantifying performance, we want to measure how many positions we got right over the whole dataset, so I would do something like

from sklearn.metrics import matthews_corrcoef

# flatten (n, seq_len) arrays to (n * seq_len,) so every position counts once
y_true = y_true.ravel()
y_pred = y_pred.ravel()
metric = matthews_corrcoef(y_true, y_pred)

Right now, as far as I can tell there is no way to specify such a thing. And I can imagine there are numerous cases that are much more involved than this simple flattening step.

Describe the solution you'd like

There's probably no way to make all of this happen in the polaris source. Submitting a PR each time and having ad hoc things like flattened_mcc in the library doesn't sound like a good idea. Could a way forward be to allow "metrics as code", implemented following a specified API, to be provided with a benchmark optionally in a .py file? Executing third-party code is of course dangerous, but it's doable: for example, Hugging Face handles this by forcing the user to manually set trust_remote_code=True to use custom models from their Hub.
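
To make the idea concrete, here is a minimal sketch of what such a contract could look like. Everything below (the FlattenedMCC class, its score method, and the trust_remote_code-style opt-in) is hypothetical and not part of the current polaris API; it only illustrates the kind of interface a benchmark-provided .py file might implement.

import numpy as np
from sklearn.metrics import matthews_corrcoef

class FlattenedMCC:
    """Hypothetical custom metric shipped alongside a benchmark as a .py file."""

    name = "flattened_mcc"

    def score(self, y_true: np.ndarray, y_pred: np.ndarray) -> float:
        # Flatten (n, seq_len) label arrays so every position counts once.
        return matthews_corrcoef(y_true.ravel(), y_pred.ravel())

# Loading such a metric would have to be an explicit opt-in, mirroring
# Hugging Face's trust_remote_code=True escape hatch, e.g. (hypothetical call):
# benchmark.evaluate(y_pred, trust_remote_code=True)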

Describe alternatives you've considered

Alternatively, simply do not allow such things in polaris, and communicate somewhere that only tasks using the available metrics are supported.

@fteufel added the feature label Jul 23, 2024
@cwognum
Collaborator

cwognum commented Jul 23, 2024

This is a great feature request! It's something we've been thinking about for a while now, and I think you did a great job of summarizing the possibilities we have here.

There's probably no way to make all of this happen in the polaris source.

You're right!

One note: I think we could still improve the metric system, for example by extending it with serializable, modular preprocessing steps. That way you wouldn't create flattened_mcc; instead you would create a flatten action and an mcc action, and then we would come up with a system to save a pipeline of such actions as a metric in a benchmark.
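
As an illustration of that pipeline idea (not an existing Polaris feature; all class names below are made up), composing a flatten step with an MCC step might look roughly like this:

from sklearn.metrics import matthews_corrcoef

class FlattenAction:
    """Serializable preprocessing step: flatten (n, seq_len) arrays to 1D."""
    def __call__(self, y_true, y_pred):
        return y_true.ravel(), y_pred.ravel()

class MCCAction:
    """Scoring step: Matthews correlation coefficient on the preprocessed arrays."""
    def __call__(self, y_true, y_pred):
        return matthews_corrcoef(y_true, y_pred)

class MetricPipeline:
    """Chain zero or more preprocessing actions and end with a scoring action."""
    def __init__(self, *actions):
        self.actions = actions

    def __call__(self, y_true, y_pred):
        *preprocessing, score = self.actions
        for action in preprocessing:
            y_true, y_pred = action(y_true, y_pred)
        return score(y_true, y_pred)

# A benchmark could then store the pipeline instead of a bespoke metric function:
flattened_mcc = MetricPipeline(FlattenAction(), MCCAction())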

Ultimately, however, as the scope of Polaris grows there will always be niche, domain-specific or task-specific metrics that we likely cannot all include in the Polaris source code.

Could a way forward be to allow "metrics as code", implemented following a specified API, to be provided with a benchmark optionally in a .py file?

Yes, this is definitely an interesting possibility, but it's a challenging feature. For such challenging features, we would like to collect some user feedback before we start implementing them, so we better understand the requirements. I like that you mention 🤗! Such an established product is a good source of inspiration. I'll look into that!
