Speed up dot product of symmetric matrices #24
While we can safely skip the diagonal for the self dot-product (it contains only ones due to the L2-normalization of the rows, and the absolute similarity to self is not a real neighbour), reusing values from the upper-triangular part when calculating the lower-triangular part is, in my opinion, not that easy because of the top-N pruning of the neighbours. When calculating the similarity for (x, y), we can linearly scan the already-calculated neighbours to see whether they contain the similarity for (y, x) when x < y, but even if they do not, we cannot assume it is zero or below the threshold, and we need to calculate the similarity for (x, y) anyway. The reason is that the similarity may not have fit into the top N for row y but may still fit into the top N for row x. And the likelihood that we will have to calculate the similarity anyway is quite high, so in most cases we would just make things worse by scanning the symmetric row's neighbours.
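As a hedged illustration of why the lookup does not remove the need to recompute, here is a minimal sketch; the names (`Neighbour`, `find_cached`, `similarity_xy`) are hypothetical and not from sparse_dot_topn. A hit in row y's pruned top-N list can be reused, but a miss tells us nothing, so the fall-through still has to compute the value.

```cpp
// Hypothetical sketch: try to reuse the already-pruned neighbours of row y
// when computing row x. Names are illustrative, not from the library.
#include <optional>
#include <vector>

struct Neighbour { int index; double similarity; };

// Linear scan of the top-N neighbours already stored for row y.
std::optional<double> find_cached(const std::vector<Neighbour>& row_y, int x) {
    for (const auto& n : row_y) {
        if (n.index == x) return n.similarity;
    }
    return std::nullopt;  // pruned away or below threshold -- value unknown
}

double similarity_xy(int x, int y,
                     const std::vector<std::vector<Neighbour>>& topn,
                     double (*compute)(int, int)) {
    if (auto cached = find_cached(topn[y], x)) {
        return *cached;  // lucky: (y, x) survived top-N pruning of row y
    }
    // A miss does NOT mean zero: the value may simply not have fit into
    // row y's top N, yet still belong in row x's top N, so recompute.
    return compute(x, y);
}
```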
Update: We can do it differently - iterate over all rows in the outermost loop, iterate over a portion of the row in the inner loop (the slice gets shorter by one every row, i.e. an upper-triangular matrix), and store each calculated similarity to both symmetric positions (x, y) and (y, x). This way we don't have to deal with the top-N trimming of the neighbours and similarities. But it makes things more complicated when using multiple threads: while we could divide the work among N threads by splitting the outermost loop (each thread processes its portion of rows), the shortening of the "columns" in the inner loop means the work would not be distributed evenly among the threads. What's worse: originally we had an embarrassingly parallel algorithm with no data races among threads (all threads read the same matrix, but each thread writes to its own portion of the resulting neighbours and similarities); writing the symmetric similarity breaks this property, and the threads now need to lock the resulting neighbours/similarities to maintain thread safety. The additional complexity may still pay off, but maybe not that much.
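A minimal dense sketch of that upper-triangular scheme, assuming a plain row-major matrix rather than the library's CSR sparse representation (all names here are illustrative only). The comment on the symmetric write marks exactly where the thread-safety problem described above appears.

```cpp
// Dense sketch of the upper-triangular self dot-product with symmetric writes.
#include <cstddef>
#include <vector>

void self_dot_upper(const std::vector<std::vector<double>>& A,
                    std::vector<std::vector<double>>& S) {
    const int n = static_cast<int>(A.size());
    for (int x = 0; x < n; ++x) {              // outermost loop over rows
        for (int y = x + 1; y < n; ++y) {      // inner slice shrinks by one each row
            double s = 0.0;
            for (std::size_t k = 0; k < A[x].size(); ++k) s += A[x][k] * A[y][k];
            S[x][y] = s;   // upper-triangular position, owned by the current row
            S[y][x] = s;   // symmetric write: with threads, row y may belong to
                           // another thread, so this write needs synchronisation
        }
    }
}
```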
How about adding a set? Update: Hmm, I realize this is a bad idea for sparse computation... Maybe you can iterate over (x, y) diagonally. While iterating, save results to both (x, y) and (y, x).
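One possible reading of "iterate diagonally", sketched under the assumption that it means walking the upper triangle by offset d = y - x; the concrete example this comment referred to is not visible in the thread, so this is only a guess at the traversal order, not the library's code.

```cpp
// Sketch of a diagonal traversal of the upper triangle: for each offset d,
// every pair (x, x + d) is visited once and stored to both symmetric positions.
#include <cstdio>

void diagonal_order(int n) {
    for (int d = 1; d < n; ++d) {            // d-th diagonal above the main one
        for (int x = 0; x + d < n; ++x) {
            int y = x + d;
            // compute similarity(x, y) here, then store to (x, y) and (y, x)
            std::printf("visit (%d, %d), store to (%d, %d) and (%d, %d)\n",
                        x, y, x, y, y, x);
        }
    }
}
```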
I am no expert in matrix computation.
For a symmetric matrix output, skipping computation of entries in the lower-triangular area will speed up the computation by roughly a factor of two. This happens a lot when you are trying to find the best matches within the same set, i.e. a self dot-product.
For example, at line 100 of https://github.com/ing-bank/sparse_dot_topn/blob/master/sparse_dot_topn/sparse_dot_topn_source.cpp, can we add a condition to check if `head < i`, with a flag check on an additional argument `symmetric`?
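A hedged sketch of what such a guard could look like, reusing `head`, `i`, and `symmetric` from the wording above; the surrounding loop and the `candidate_cols`/`values` layout are simplified stand-ins, not the actual code in sparse_dot_topn_source.cpp.

```cpp
// Sketch: while collecting results for row i, skip lower-triangular entries
// when the caller has declared the product symmetric.
#include <cstddef>
#include <utility>
#include <vector>

void collect_row(int i,
                 const std::vector<int>& candidate_cols,   // columns with nonzero products
                 const std::vector<double>& values,        // corresponding values
                 bool symmetric,
                 std::vector<std::pair<int, double>>& out) {
    for (std::size_t k = 0; k < candidate_cols.size(); ++k) {
        int head = candidate_cols[k];
        if (symmetric && head < i) continue;  // skip the lower-triangular half
        out.emplace_back(head, values[k]);
    }
}
```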