Is your feature request related to a problem? Please describe.
The metrics currently provided by polaris are decent for standard classification/regression tasks, but there are many problems that might require more sophisticated methods for quantifying performance.
As an example, I have a task where each sample is an array, and the labels are an array of the same length. So for each sample, each array position carries a label.
For quantifying performance, we want to measure how many positions we got right over the whole dataset, so I would do something like:

```python
from sklearn.metrics import matthews_corrcoef

# flatten (n, seq_len) to (n*seq_len,)
y_true = y_true.ravel()
y_pred = y_pred.ravel()
metric = matthews_corrcoef(y_true, y_pred)
```
Right now, as far as I can tell, there is no way to specify such a thing. And I can imagine there are numerous cases that are much more involved than this simple flattening step.
Describe the solution you'd like
There's probably no way to make all of this happen in the polaris source. Submitting a PR each time and having ad hoc things like flattened_mcc in the library doesn't sound like a good idea. Could a way forward be to allow "metrics as code", implemented following a specified API, to be provided with a benchmark optionally in a .py file? Executing third-party code is of course dangerous, but it's doable; for example, Hugging Face handles this by forcing the user to manually set trust_remote_code=True to use custom models from their Hub.
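Purely as an illustration, a "metrics as code" file might look something like the sketch below. The class shape, the name attribute, and the idea of referencing the file from a benchmark definition are all assumptions, not an existing polaris API:

```python
# custom_metric.py -- hypothetical file shipped alongside a benchmark
import numpy as np
from sklearn.metrics import matthews_corrcoef


class FlattenedMCC:
    """A user-defined metric following a hypothetical polaris metric API."""

    name = "flattened_mcc"

    def __call__(self, y_true: np.ndarray, y_pred: np.ndarray) -> float:
        # Flatten (n, seq_len) arrays so every position counts as one prediction.
        return matthews_corrcoef(y_true.ravel(), y_pred.ravel())


# A benchmark definition could then reference this by file and class name,
# e.g. {"custom_metrics": [{"path": "custom_metric.py", "class": "FlattenedMCC"}]}.
```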
Describe alternatives you've considered
Alternatively, just don't allow such things in polaris, and communicate somewhere that only tasks using the available metrics are supported.
fteufel added the feature label (Annotates any PR that adds new features; used in the release process) on Jul 23, 2024
This is a great feature request! It's something we've been thinking about for a while now, and I think you did a great job of summarizing the possibilities we have here.
> There's probably no way to make all of this happen in the polaris source.
You're right!
One note: I think we could still improve the metric system, for example by extending it with serializable, modular preprocessing steps. That way you wouldn't create flattened_mcc; instead, you would create a flatten action and an mcc action, and then come up with a system to save a pipeline of such actions as a metric in a benchmark (see the sketch below).
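A rough sketch of how such composable actions could work, assuming hypothetical Flatten, MCC, and MetricPipeline classes that do not exist in polaris today:

```python
from sklearn.metrics import matthews_corrcoef


class Flatten:
    """Preprocessing action: flatten (n, seq_len) arrays to 1D."""

    def __call__(self, y_true, y_pred):
        return y_true.ravel(), y_pred.ravel()


class MCC:
    """Scoring action: Matthews correlation coefficient."""

    def __call__(self, y_true, y_pred):
        return matthews_corrcoef(y_true, y_pred)


class MetricPipeline:
    """Chains preprocessing actions and finishes with a scoring action."""

    def __init__(self, *steps):
        *self.preprocessing, self.scorer = steps

    def __call__(self, y_true, y_pred):
        for step in self.preprocessing:
            y_true, y_pred = step(y_true, y_pred)
        return self.scorer(y_true, y_pred)


# A benchmark could store this pipeline as a serializable list of action names,
# e.g. ["flatten", "mcc"], instead of a hard-coded flattened_mcc metric.
flattened_mcc = MetricPipeline(Flatten(), MCC())
```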
Ultimately, however, as the scope of Polaris grows there will always be niche, domain-specific or task-specific metrics that we likely cannot all include in the Polaris source code.
> Could a way forward be to allow "metrics as code", implemented following a specified API, to be provided with a benchmark optionally in a .py file?
Yes, this is definitely an interesting possibility, but it's a challenging feature. For such challenging features, we would like to collect some user feedback before we start implementing them, to better understand the requirements. I like that you mention 🤗 ! Such an established product is a good source of inspiration! I'll look into that!
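For reference, the Hugging Face pattern mentioned above looks roughly like this; trust_remote_code is a real from_pretrained argument, but the model id below is just a placeholder:

```python
from transformers import AutoModel

# The model's architecture is defined by Python code stored in the Hub repo.
# Without trust_remote_code=True, transformers refuses to execute that code.
model = AutoModel.from_pretrained(
    "some-org/custom-model",  # placeholder repo id
    trust_remote_code=True,
)
```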