Partially skip data validation for artifacts downloaded from the Hub #175

Open
cwognum opened this issue Aug 13, 2024 · 0 comments
cwognum commented Aug 13, 2024

Context

Polaris uses Pydantic as its data validation library. In addition to enforcing types, we use Pydantic validators to standardize data to a single format, to enforce constraints, and to dynamically infer good defaults. For example:

  • Standardize data: If metrics are specified as str, we convert them to Metric objects.
  • Enforce constraints: We validate that the train and test partitions of a benchmark have no overlap.
  • Infer defaults: If the target types of a benchmark are not specified, we automatically infer them.
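
To make the three categories concrete, here is a minimal sketch assuming Pydantic v2. `ToyBenchmark`, its fields, the two-member `Metric` enum, and the inference rule are all hypothetical stand-ins for illustration, not the actual Polaris data model.

```python
from enum import Enum
from typing import Optional

from pydantic import BaseModel, ValidationError, field_validator, model_validator


class Metric(Enum):
    # Hypothetical stand-in for the real Metric objects.
    MAE = "mae"
    ACCURACY = "accuracy"


class ToyBenchmark(BaseModel):
    # Hypothetical model; field names are assumptions for this sketch.
    metrics: list[Metric]
    train: list[int]
    test: list[int]
    targets: list[float]
    target_type: Optional[str] = None

    @field_validator("metrics", mode="before")
    @classmethod
    def _standardize_metrics(cls, value):
        # Standardize data: accept strings (any casing) and convert them to Metric.
        return [Metric(m.lower()) if isinstance(m, str) else m for m in value]

    @model_validator(mode="after")
    def _check_and_infer(self):
        # Enforce constraints: train and test indices must not overlap.
        overlap = set(self.train) & set(self.test)
        if overlap:
            raise ValueError(f"train and test overlap on indices {sorted(overlap)}")
        # Infer defaults: guess the target type only if it was not specified.
        if self.target_type is None:
            integral = all(float(t).is_integer() for t in self.targets)
            self.target_type = "classification" if integral else "regression"
        return self


benchmark = ToyBenchmark(metrics=["MAE"], train=[0, 1], test=[2], targets=[0.5, 1.2, 3.0])
```

On large datasets, the overlap check and the default inference are the kind of work that scales with the data, while the string-to-`Metric` conversion stays cheap.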

Description

These validations can get slow; see, for example, #148 and #154. Currently, we validate the data model not only when an artifact is initially created, but also whenever we load an artifact from the Hub. Since we can assume that any data coming from the Hub is valid, we can skip some of these checks when downloading artifacts from the Hub to speed up the process.

However, we cannot simply disable data validation altogether. Of the three categories listed above, any validators that standardize data are still needed to deserialize the data that is sent by the Hub.
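
One possible way to get this selective behavior, sketched here under the assumption of Pydantic v2, is to pass a validation context and let only the expensive validators consult it. The `trusted_source` key, the `ToyArtifact` model, and the comma-separated metrics format are hypothetical; this is an illustration of the idea, not a proposal for the final Polaris API.

```python
from pydantic import BaseModel, ValidationError, ValidationInfo, field_validator, model_validator

# Hypothetical context key signalling that the payload comes from a trusted source.
TRUSTED_KEY = "trusted_source"


class ToyArtifact(BaseModel):
    # Hypothetical model; field names are assumptions for this sketch.
    metrics: list[str]
    train: list[int]
    test: list[int]

    @field_validator("metrics", mode="before")
    @classmethod
    def _standardize_metrics(cls, value):
        # Standardization always runs: it is needed to deserialize the payload.
        if isinstance(value, str):
            return [m.strip() for m in value.split(",")]
        return value

    @model_validator(mode="after")
    def _check_no_overlap(self, info: ValidationInfo):
        # Constraint check: skipped when the caller marks the source as trusted.
        context = info.context or {}
        if context.get(TRUSTED_KEY, False):
            return self
        if set(self.train) & set(self.test):
            raise ValueError("train and test partitions overlap")
        return self


payload = {"metrics": "mae, rmse", "train": [0, 1], "test": [1, 2]}

# Trusted path: the overlap check is skipped, standardization still happens.
trusted = ToyArtifact.model_validate(payload, context={TRUSTED_KEY: True})

# Default path: every validator runs, so the overlapping payload is rejected.
try:
    ToyArtifact.model_validate(payload)
    rejected = False
except ValidationError:
    rejected = True
```

Note that `model_construct` is not a substitute here: it bypasses validation entirely, including the standardizing validators that are still required.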

Acceptance Criteria

  • We can selectively skip time-consuming validation checks when downloading data from the Hub.

Links

@cwognum cwognum added this to the XL Benchmarks milestone Aug 13, 2024