You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In ap.c the normal rand function of the c standard lib is called. In general this is a weak Linear congruential generator that is not thread save. With weak here I mean that it has a short period and the resulting numbers are correlated. Another issue is thread safety. This is a problem when the CAffinityPropagation function is called from separate python processes because we can't independently seed the RNG for each call of the function. I'm not sure how often the addition of noise is used in practice but I would recommend to remove the addition of noise in ap.c and rather do that in affinityprop.pyx with it's own numpy.random.RandomState. To allow using unique random states I would suggest the following code from scikit-learn.
# copy-pasted from scikit-learn utils/validation.pydefcheck_random_state(seed):
"""Turn seed into a np.random.RandomState instance If seed is None (or np.random), return the RandomState singleton used by np.random. If seed is an int, return a new RandomState instance seeded with seed. If seed is already a RandomState instance, return it. Otherwise raise ValueError. """ifseedisNoneorseedisnp.random:
returnnp.random.mtrand._randifisinstance(seed, (numbers.Integral, np.integer)):
returnnp.random.RandomState(seed)
ifisinstance(seed, np.random.RandomState):
returnseedraiseValueError('%r cannot be used to seed a numpy.random.RandomState'' instance'%seed)
The noise addtion in AffinityPropagation.run should then be
This will allow to pass a separate seed for each python process ensuring that the random numbers are not correlated. Additionally the numpy uses the Mersenne Twister, a widely accepted RNG for scientific applications.
@mtiberti if it doesn't matter that the added noise might be correlated it would be nice if you can still add a comment in the C code.
The text was updated successfully, but these errors were encountered:
Thanks for this. Since noise is added to remove degeneracies from the similarity matrix it shouldn't be a problem if it's slightly correlated - a potentially problematic case would be when the same noise is added to elements of the similarity matrix which are identical, which I think is unlikely. However since the change you're proposing is not very time-consuming and still adds to the robustness of the code, I'm going to implement it. The only problem is that I'm about to leave for holidays and I'll be back on the 26th of July and I'll unlikely be able to work on it before then
a temporary solution would be the ability to pass a different random seed to each independent process to ensure that they are uncorrelated, as suggested in #40. The quality of RNG is not very critical for this application
In ap.c the normal
rand
function of the c standard lib is called. In general this is a weak Linear congruential generator that is not thread save. With weak here I mean that it has a short period and the resulting numbers are correlated. Another issue is thread safety. This is a problem when theCAffinityPropagation
function is called from separate python processes because we can't independently seed the RNG for each call of the function. I'm not sure how often the addition of noise is used in practice but I would recommend to remove the addition of noise inap.c
and rather do that inaffinityprop.pyx
with it's ownnumpy.random.RandomState
. To allow using unique random states I would suggest the following code from scikit-learn.The noise addtion in
AffinityPropagation.run
should then beThis will allow to pass a separate seed for each python process ensuring that the random numbers are not correlated. Additionally the numpy uses the Mersenne Twister, a widely accepted RNG for scientific applications.
@mtiberti if it doesn't matter that the added noise might be correlated it would be nice if you can still add a comment in the C code.
The text was updated successfully, but these errors were encountered: