Motivation
For scientific Python (and perhaps computer vision as well), a very common application is linear algebra. The simplest objects of interest there are vectors, which are easy to generate following the hypothesis documentation.
The second most common objects of interest are perhaps rotation matrices and covariance matrices. Covariance matrices are positive semi-definite, so simply generating an arbitrary matrix and then checking whether it is valid is not an efficient strategy. Searching "hypothesis covariance matrix" brings up literature on a quite different topic.
I had this problem recently, so I thought I would share a strategy to generate covariance matrices.
Strategy
The strategy is not surprising to those familiar with linear algebra:
generate eigenvalues and eigenvectors. The eigenvectors need to be orthogonal, which can be achieved with the Gram-Schmidt process (QR factorization).
build the covariance matrix (eigvec @ eigval @ eigvec.T). An affine transformation matrix could instead be built with T = eigvec * eigval**-0.5. The q from the QR factorization is a rotation matrix.
Nevertheless, for numerical reasons, this can still occasionally produce matrices that cannot be inverted. So finally, in the test using it, I verify that the matrix is valid.
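For concreteness, here is a minimal sketch of such a strategy (not the exact code from this issue; the element ranges and the fixed ndim default are illustrative assumptions):

```python
import numpy as np
from hypothesis import strategies as st
from hypothesis.extra.numpy import arrays


@st.composite
def mean_and_cov(draw, ndim=3):
    # mean vector with bounded, finite entries
    mean = draw(arrays(np.float64, (ndim,), elements=st.floats(-10, 10)))
    # strictly positive eigenvalues keep the covariance positive definite
    eigval = draw(arrays(np.float64, (ndim,), elements=st.floats(0.1, 10)))
    # orthonormalize a random matrix via QR (Gram-Schmidt); q is orthogonal
    a = draw(arrays(np.float64, (ndim, ndim), elements=st.floats(-1, 1)))
    q, _ = np.linalg.qr(a)
    # assemble the covariance as eigvec @ diag(eigval) @ eigvec.T
    cov = q @ np.diag(eigval) @ q.T
    return mean, cov
```

In the test itself one can still check the draw, e.g. verify that np.linalg.cholesky(cov) succeeds, to guard against the rare numerically degenerate case.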
Limitations
I am a beginner in hypothesis, so this can probably be written much better. For example, I don't understand how to chain strategies, generating ndim first and then passing it into mean_and_cov.
In mean_and_cov, some range constraints are hard-coded and may need to be adjusted depending on the application. These could perhaps become parameters of the strategy.
Proposal
Perhaps this can be incorporated into hypothesis.extra.numpy.
Alternatives
A noteworthy alternative is to generate covariance matrices from a Wishart distribution. However, the drawback here is that hypothesis will have a hard time shrinking to similar, simpler examples.
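For illustration, a rough sketch of this alternative (the degrees-of-freedom range, the identity scale matrix, and the explicit seed below are assumptions, not part of the proposal):

```python
import numpy as np
from scipy.stats import wishart
from hypothesis import strategies as st


@st.composite
def wishart_cov(draw, ndim=3):
    # the Wishart distribution requires df > ndim - 1
    df = draw(st.integers(min_value=ndim, max_value=10 * ndim))
    # let hypothesis control the randomness through an explicit seed
    seed = draw(st.integers(min_value=0, max_value=2**32 - 1))
    return wishart(df=df, scale=np.eye(ndim)).rvs(random_state=seed)
```

Since the whole matrix is determined by a single opaque seed, there is little structure for hypothesis to shrink.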
This is clearly very useful for people working in the relevant domains; I'm just not sure whether Hypothesis itself is the best place to put it, or whether a third-party extension might be better for maintenance (hypothesis-linalg? or as part of another scientific computing package, a la xarray.testing?). That's mostly because it seems unlikely that covariance matrices are the only such special arrays that we'd want to generate, but adding all of them to the general hypothesis.extra.numpy (or ...arrays) namespace would get quite crowded.
Random technical notes:
Check out the implementation of the arrays() strategy to see how the shapes and elements arguments are implemented. Note, however, that accepting "value or strategy" is against our API style guide, and is only allowed for arrays() for backwards-compatibility reasons.
I'd recommend implementing this against the array-api strategies rather than the numpy strategies, for flexibility.
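As a hedged sketch of what that could look like (array_api_strict is just one Array API implementation and is an assumption here, not part of the suggestion):

```python
import array_api_strict as xp
from hypothesis.extra.array_api import make_strategies_namespace

# build the strategies namespace for any Array API-compatible library
xps = make_strategies_namespace(xp)

# the covariance-matrix logic can then be written against xps.arrays()
square_matrices = xps.arrays(dtype=xps.floating_dtypes(), shape=(3, 3))
```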
You can apply valid_covariance_matrix() as a filter, or better yet as an assume() call inside the mean_and_cov() strategy. The @st.composite decorator, or the .flatmap() method, makes it easy to 'chain' strategies together.
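A small sketch of both suggestions (valid_covariance_matrix and mean_and_cov are the names used in this thread; their bodies below are simplified assumptions):

```python
import numpy as np
from hypothesis import assume, strategies as st
from hypothesis.extra.numpy import arrays


def valid_covariance_matrix(cov):
    # symmetric positive-definite matrices admit a Cholesky factorization
    try:
        np.linalg.cholesky(cov)
        return True
    except np.linalg.LinAlgError:
        return False


@st.composite
def mean_and_cov(draw, ndim):
    mean = draw(arrays(np.float64, (ndim,), elements=st.floats(-10, 10)))
    a = draw(arrays(np.float64, (ndim, ndim), elements=st.floats(-1, 1)))
    cov = a @ a.T + 0.1 * np.eye(ndim)    # simplified stand-in construction
    assume(valid_covariance_matrix(cov))  # discard rare degenerate draws
    return mean, cov


# chaining: draw ndim first, then pass it into mean_and_cov()
mean_and_cov_any_dim = st.integers(min_value=1, max_value=5).flatmap(mean_and_cov)
```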
Shrinking is a pretty important feature for most users, so it's usually worth going to a fair bit more trouble to make it work well. It's not only useful when shrinking a failing test, either; the same principles that keep a clean correspondence between the underlying choices and the high-level data and behavior also make trying out variations more effective in e.g. coverage-guided fuzzing.
I am afraid I am already overloaded maintaining a dozen or so projects on PyPI, so I don't think I can start a third-party extension project at this point. I am not sure I have understood how to implement the random technical notes yet; it will take me some time. But I wanted to say thank you for taking the time and care to respond.