(maybe this can be documentation rather than an issue)
How gradients of batches are calculated
In the general optimizer there is a function `optimizer_calc_batch_gradient()` that calculates the gradient of a batch. The trick used here is an integer index array, called `pivot`, owned by the abstract optimizer, which keeps track of the shuffled order of the samples. When calculating the batch gradient, the gradient for each sample is calculated separately (one by one), and the mean over all samples is returned as the batch gradient. To speed things up, the batch gradient calculation is threaded using OpenMP. This works really well, since the batch samples are selected through the shuffled index array, and the sample data (input vector and target vector) are never moved around in memory. In short: the gradient of a batch is calculated sample by sample in a threaded manner.
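For concreteness, here is a minimal sketch of that scheme. The names (`sample_t`, `backpropagation()`, the parameter list) are placeholders standing in for the library's real API, not the actual implementation:

```c
/* Sketch: sample-by-sample batch gradient with OpenMP.
 * Compile with: cc -std=c99 -fopenmp batch_gradient_sketch.c */
#include <stddef.h>
#include <string.h>

typedef struct {
    const float *input;    /* input vector of one sample  */
    const float *target;   /* target vector of one sample */
} sample_t;

/* Assumed single-sample backprop: fills grad with the gradient for s. */
void backpropagation(const sample_t *s, float *grad, size_t n_params);

void calc_batch_gradient(const sample_t *samples,
                         const size_t *pivot,      /* shuffled index array */
                         size_t batch_start, size_t batch_size,
                         size_t n_params,
                         float *batch_grad)        /* out: mean gradient */
{
    memset(batch_grad, 0, n_params * sizeof(float));

    /* Threads pick whole samples through the pivot array, so the sample
     * data itself never moves in memory. Per-sample gradients are summed
     * with an OpenMP array reduction (requires OpenMP 4.5+). */
    #pragma omp parallel for reduction(+ : batch_grad[:n_params])
    for (size_t i = 0; i < batch_size; i++) {
        float grad[n_params];  /* per-sample scratch (C99 VLA, fine for a sketch) */
        backpropagation(&samples[pivot[batch_start + i]], grad, n_params);
        for (size_t p = 0; p < n_params; p++)
            batch_grad[p] += grad[p];
    }

    for (size_t p = 0; p < n_params; p++)
        batch_grad[p] /= (float)batch_size;   /* mean over the batch */
}
```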
Alternative solution
An alternative would be to use the BLAS libraries more extensively and calculate the batch gradient with matrix operations. This would of course require a new implementation of backpropagation that takes `batch_size` as a parameter. (The current backpropagation function can only take one sample/target pair and calculate its gradient.) The drawback of this approach is that we would then need to reconstruct the batch samples from the index array, and this involves a lot of memory copying, which can be a performance killer. Yet another alternative would be to actually shuffle the training dataset in memory in the `shuffle()` function. I am not sure which is most efficient.
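As a rough illustration of the BLAS route, here is a sketch for a single dense layer, assuming row-major storage and that the layer's output deltas for the batch are already available. All names are hypothetical, not an existing API; the gather step is exactly the memory copying mentioned above:

```c
/* Sketch: batched weight gradient for one dense layer via BLAS.
 * Link with -lcblas (or -lopenblas). */
#include <string.h>
#include <cblas.h>

/* Gather the shuffled samples into a contiguous batch matrix — this is
 * the per-batch memory copying the pivot scheme avoids. */
static void gather_batch(const float *data, const size_t *pivot,
                         size_t batch_start, size_t batch_size,
                         size_t n_inputs, float *X_batch)
{
    for (size_t i = 0; i < batch_size; i++)
        memcpy(X_batch + i * n_inputs,
               data + pivot[batch_start + i] * n_inputs,
               n_inputs * sizeof(float));
}

/* Weight gradient of one dense layer: dW = (1/B) * delta^T * X,
 * where delta is batch_size x n_out and X is batch_size x n_in,
 * computed with a single GEMM call. */
static void layer_weight_gradient(const float *delta, const float *X,
                                  size_t batch_size, size_t n_out,
                                  size_t n_in,
                                  float *dW)   /* out: n_out x n_in */
{
    cblas_sgemm(CblasRowMajor, CblasTrans, CblasNoTrans,
                (int)n_out, (int)n_in, (int)batch_size,
                1.0f / (float)batch_size,
                delta, (int)n_out,
                X,     (int)n_in,
                0.0f,
                dW,    (int)n_in);
}
```

Whether the GEMM speedup outweighs the gather cost presumably depends on the layer sizes and the batch size; for small batches the copy overhead may well dominate.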
Another consideration
The threading used in the batch gradient calculation is based on OpenMP. This might conflict with other threading technologies. Could it be an idea to use classic pthreads instead of OpenMP?
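If pthreads were used instead, the same per-sample accumulation could look roughly like this (again a sketch with assumed helper names such as `backpropagation_by_index()`):

```c
/* Sketch: per-sample batch gradient with classic pthreads.
 * Compile with: cc -std=c99 -pthread pthread_sketch.c */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define N_THREADS 4

typedef struct {
    const size_t *pivot;
    size_t begin, end;     /* half-open range within the batch */
    size_t n_params;
    float *partial;        /* this thread's private gradient sum */
} job_t;

/* Assumed helper: single-sample backprop for one pivot entry. */
void backpropagation_by_index(size_t sample_idx, float *grad, size_t n_params);

static void *run(void *arg)
{
    job_t *j = arg;
    float *grad = malloc(j->n_params * sizeof(float));
    for (size_t i = j->begin; i < j->end; i++) {
        backpropagation_by_index(j->pivot[i], grad, j->n_params);
        for (size_t p = 0; p < j->n_params; p++)
            j->partial[p] += grad[p];
    }
    free(grad);
    return NULL;
}

void calc_batch_gradient_pthreads(const size_t *pivot, size_t batch_size,
                                  size_t n_params, float *batch_grad)
{
    pthread_t tid[N_THREADS];
    job_t job[N_THREADS];
    float *partial = calloc((size_t)N_THREADS * n_params, sizeof(float));

    for (int t = 0; t < N_THREADS; t++) {
        job[t] = (job_t){ pivot,
                          batch_size * (size_t)t / N_THREADS,
                          batch_size * (size_t)(t + 1) / N_THREADS,
                          n_params,
                          partial + (size_t)t * n_params };
        pthread_create(&tid[t], NULL, run, &job[t]);
    }

    /* Join and combine the per-thread partial sums into the mean. */
    memset(batch_grad, 0, n_params * sizeof(float));
    for (int t = 0; t < N_THREADS; t++) {
        pthread_join(tid[t], NULL);
        for (size_t p = 0; p < n_params; p++)
            batch_grad[p] += partial[(size_t)t * n_params + p];
    }
    for (size_t p = 0; p < n_params; p++)
        batch_grad[p] /= (float)batch_size;
    free(partial);
}
```

Functionally this is equivalent to the OpenMP version; whether it actually avoids conflicts depends on which other threading technology is in play.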
I would guess that shuffling the dataset in memory is pretty fast - and it happens only once per epoch. But then - if, instead of shuffling the input data between epochs, I use the pivot to copy `batch_size` samples - won't that amount to about the same thing?
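For comparison, a back-of-the-envelope sketch of what an in-memory `shuffle()` would do (illustrative names, not the current implementation):

```c
/* Sketch: in-memory Fisher-Yates shuffle of the training data itself,
 * swapping whole input/target rows once per epoch. */
#include <stdlib.h>
#include <string.h>

void shuffle_in_memory(float *inputs,  size_t n_inputs,   /* input row width  */
                       float *targets, size_t n_targets,  /* target row width */
                       size_t n_samples,
                       float *scratch)  /* caller-provided, >= max row width */
{
    if (n_samples < 2)
        return;
    for (size_t i = n_samples - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);  /* good enough for a sketch */

        /* swap input rows i and j */
        memcpy(scratch,               inputs + i * n_inputs, n_inputs * sizeof(float));
        memcpy(inputs + i * n_inputs, inputs + j * n_inputs, n_inputs * sizeof(float));
        memcpy(inputs + j * n_inputs, scratch,               n_inputs * sizeof(float));

        /* swap target rows i and j */
        memcpy(scratch,                 targets + i * n_targets, n_targets * sizeof(float));
        memcpy(targets + i * n_targets, targets + j * n_targets, n_targets * sizeof(float));
        memcpy(targets + j * n_targets, scratch,                 n_targets * sizeof(float));
    }
}
```

Both approaches copy on the order of `n_samples` rows per epoch (the shuffle does a few copies per swap, the pivot-based gather copies each sample once per epoch across its batches), so the intuition that they cost about the same seems plausible.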