
Consider how to deal with backpropagation of batches #66

Open
oysteijo opened this issue Jan 2, 2024 · 1 comment
Comments

oysteijo (Owner) commented Jan 2, 2024

(maybe this can be documentation rather than an issue)

How gradients of batches are calculated

In the general optimizer there is a function optimizer_calc_batch_gradient() that calculates the gradient of a batch. The trick used here is an integer index array called pivot, owned by the abstract optimizer, that keeps track of the shuffled order of the samples. When calculating the batch gradient, the gradient of each sample is calculated separately (one by one), and the mean over all samples in the batch is returned as the batch gradient. To speed things up, the loop over the samples in the batch is threaded with OpenMP. This works really well, since the batch samples are selected through the shuffled index array and the sample data (input vector and target vector) are never moved around in memory. In short: the gradient of a batch is calculated sample-by-sample in a threaded manner.
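
For reference, here is a minimal sketch of that scheme (not the actual library code). The callback backprop_single(), the parameter count n_params and the buffer layout are assumptions for illustration; the pivot array is used exactly as described above, and an OpenMP array reduction (OpenMP 4.5+) collects the per-thread sums:

```c
#include <string.h>

void calc_batch_gradient_sketch(const float *inputs, const float *targets,
                                int n_inputs, int n_targets,
                                const int *pivot,            /* shuffled sample indices */
                                int batch_start, int batch_size,
                                int n_params,                /* total number of weights + biases */
                                float *batch_grad,           /* out: mean gradient of the batch */
                                void (*backprop_single)(const float *x, const float *t, float *grad))
{
    memset(batch_grad, 0, (size_t)n_params * sizeof(float));

    /* Each thread gets a private copy of batch_grad; OpenMP sums them when the loop ends. */
    #pragma omp parallel for reduction(+:batch_grad[:n_params])
    for (int i = 0; i < batch_size; i++) {
        int s = pivot[batch_start + i];          /* pick the sample through the pivot array */
        float grad[n_params];                    /* per-sample gradient scratch (VLA for brevity) */
        backprop_single(inputs  + (size_t)s * n_inputs,
                        targets + (size_t)s * n_targets, grad);
        for (int k = 0; k < n_params; k++)
            batch_grad[k] += grad[k];
    }

    for (int k = 0; k < n_params; k++)           /* mean over the batch */
        batch_grad[k] /= (float)batch_size;
}
```

The key point is that the sample data is only ever read through pivot[...]; nothing is copied or reordered in memory.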

Alternative solution

An alternative would be to use the BLAS libraries more extensively and calculate the batch gradient with matrix operations. This would of course require a new implementation of backpropagation that takes batch_size as a parameter. (The current backpropagation function can only take one sample/target pair and calculate its gradient.) The drawback of this approach is that the samples then have to be gathered from the index array into contiguous matrices, which involves a lot of memory copying and can be a performance killer; a sketch of this is shown below. Yet another alternative would be to actually shuffle the training dataset in memory in the shuffle() function, so that each batch is already a contiguous slice. I am not sure which is most efficient.
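
For a rough idea of what the BLAS route would look like, here is a hedged sketch of the gather step plus a single GEMM for one layer's forward pass. Row-major storage, the weight layout (n_inputs x n_units) and all buffer names are assumptions, not the library's actual API:

```c
#include <string.h>
#include <cblas.h>

void batch_forward_sketch(const float *inputs,   /* all samples, row-major: n_samples x n_inputs */
                          const int   *pivot,    /* shuffled sample indices */
                          int batch_start, int batch_size, int n_inputs,
                          const float *W, int n_units,  /* weights, row-major: n_inputs x n_units */
                          float *X,              /* scratch: batch_size x n_inputs (the copy cost) */
                          float *Z)              /* out: batch_size x n_units pre-activations */
{
    /* The gather step - this is the memory copying the issue worries about. */
    for (int i = 0; i < batch_size; i++)
        memcpy(X + (size_t)i * n_inputs,
               inputs + (size_t)pivot[batch_start + i] * n_inputs,
               (size_t)n_inputs * sizeof(float));

    /* One GEMM replaces batch_size separate vector-matrix products:
       Z = X * W,  (batch_size x n_inputs) * (n_inputs x n_units)  */
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                batch_size, n_units, n_inputs,
                1.0f, X, n_inputs, W, n_units,
                0.0f, Z, n_units);
}
```

The memcpy loop is exactly the copying cost discussed above; the payoff is that one cblas_sgemm call does the work of batch_size separate vector-matrix products, which BLAS can do far more efficiently.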

Another consideration

The threading used in the batch gradient calculation is based on OpenMP. This might come into conflict with other threading technologies used by the host application. Could it be an idea to use classic pthreads instead of OpenMP?
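
If OpenMP does clash with other threading in a host application, the same sample-by-sample split could be done with plain pthreads. A rough sketch, again with illustrative names (backprop_single(), n_params) and without error handling:

```c
#include <pthread.h>
#include <string.h>

#define N_THREADS 4   /* fixed here for brevity; would be configurable in practice */

typedef struct {
    const int *pivot;                       /* shuffled sample indices */
    int start, end;                         /* half-open range of batch positions for this thread */
    int n_params;
    float *partial_grad;                    /* per-thread accumulator, n_params long */
    void (*backprop_single)(int sample_idx, float *grad);  /* assumed per-sample callback */
} worker_arg_t;

static void *grad_worker(void *p)
{
    worker_arg_t *a = p;
    float grad[a->n_params];                /* VLA scratch for one sample's gradient */
    memset(a->partial_grad, 0, (size_t)a->n_params * sizeof(float));
    for (int i = a->start; i < a->end; i++) {
        a->backprop_single(a->pivot[i], grad);
        for (int k = 0; k < a->n_params; k++)
            a->partial_grad[k] += grad[k];
    }
    return NULL;
}

/* Launch the workers, join them and average the partial sums.
   The caller fills in the ranges and buffers in args[] beforehand. */
void calc_batch_gradient_pthreads(worker_arg_t args[N_THREADS],
                                  float *batch_grad, int n_params, int batch_size)
{
    pthread_t tid[N_THREADS];
    for (int t = 0; t < N_THREADS; t++)
        pthread_create(&tid[t], NULL, grad_worker, &args[t]);

    memset(batch_grad, 0, (size_t)n_params * sizeof(float));
    for (int t = 0; t < N_THREADS; t++) {
        pthread_join(tid[t], NULL);
        for (int k = 0; k < n_params; k++)
            batch_grad[k] += args[t].partial_grad[k];
    }
    for (int k = 0; k < n_params; k++)
        batch_grad[k] /= (float)batch_size;
}
```

The main cost compared to the OpenMP version is the boilerplate: work ranges and per-thread accumulators have to be set up by hand, and threads are created and joined for every batch unless a thread pool is added on top.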

oysteijo (Owner, Author) commented Sep 4, 2024

I would guess that shuffling the dataset in memory is pretty fast, and it happens only once per epoch. But then again: if I keep the data in place and instead use the pivot to copy batch_size samples into a contiguous buffer for each batch, won't that amount to about the same thing?
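
The copy volume does look comparable: gathering batch_size rows per batch adds up to roughly one copy of the whole dataset per epoch, while an in-place swap-based shuffle touches each row a small constant number of times per epoch. For completeness, a sketch of the in-place variant (Fisher-Yates over the sample rows; the row-major layout and the row_buf scratch buffer are assumptions), after which every batch is simply a contiguous slice:

```c
#include <stdlib.h>
#include <string.h>

void shuffle_dataset_inplace(float *inputs, float *targets,
                             int n_samples, int n_inputs, int n_targets,
                             float *row_buf /* max(n_inputs, n_targets) floats */)
{
    for (int i = n_samples - 1; i > 0; i--) {
        int j = rand() % (i + 1);            /* slightly biased, but fine for a sketch */
        if (j == i) continue;
        /* swap input rows i and j */
        memcpy(row_buf, inputs + (size_t)i * n_inputs, (size_t)n_inputs * sizeof(float));
        memcpy(inputs + (size_t)i * n_inputs, inputs + (size_t)j * n_inputs, (size_t)n_inputs * sizeof(float));
        memcpy(inputs + (size_t)j * n_inputs, row_buf, (size_t)n_inputs * sizeof(float));
        /* swap target rows i and j */
        memcpy(row_buf, targets + (size_t)i * n_targets, (size_t)n_targets * sizeof(float));
        memcpy(targets + (size_t)i * n_targets, targets + (size_t)j * n_targets, (size_t)n_targets * sizeof(float));
        memcpy(targets + (size_t)j * n_targets, row_buf, (size_t)n_targets * sizeof(float));
    }
}
```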
