OOM when initializing model #5

Open
viktoriaschuster opened this issue Jun 25, 2024 · 2 comments

@viktoriaschuster

Dear Dr. Zhang,

I am trying to run your model for a benchmark on paired multi-omics data. On both my own data and your example data, I run into out-of-memory errors when initializing the model. I have two GPUs available, each with 24 GB of memory.

The error occurs while initializing TranslateAE, at line 338 of train_model.py: self.sess.run(tf.global_variables_initializer()).

This is the error message:
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[93283,15172] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[node translator_yx_px_r_genebatch/kernel/Adam_1/Assign (defined at bin/train_model_edited.py:348) ]].

The model seems to create a peak-by-gene matrix for the translator. Is this the intended behavior, or might I have missed a step in your data preprocessing before running the model?
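For scale, here is a rough estimate of what the tensor in the error message costs in memory. It is only a sketch under two assumptions: "type float" means 4-byte float32, and the `Adam_1` suffix in the node name indicates that the Adam optimizer keeps two extra tensors of the same shape alongside the kernel. The helper name `tensor_bytes` is made up for this example.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Return the size in bytes of a dense tensor with the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element


GIB = 1024 ** 3

# Shape copied from the ResourceExhaustedError message.
kernel = tensor_bytes([93283, 15172])

# Assumption: the kernel plus two Adam slot tensors of the same shape.
total = 3 * kernel

print(f"{kernel / GIB:.2f} GiB per copy, {total / GIB:.2f} GiB for three copies")
# prints "5.27 GiB per copy, 15.82 GiB for three copies"
```

Under these assumptions, this single layer would take up a large share of one 24 GB GPU before any other variable is allocated.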

Kind regards,
Viktoria

@RanZhang08
Contributor

Dear Viktoria,

The code runs on our local machines with 8 GB of GPU memory and a batch size of 16. Please double-check whether cached allocations or other running processes are exhausting the memory.
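One possible way to check for other processes holding GPU memory is to query `nvidia-smi` and parse its CSV output. This is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes `nvidia-smi` is on the PATH and that process names contain no surprises beyond what the parser handles.

```python
import subprocess

# Query fields supported by nvidia-smi for compute processes.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse nvidia-smi CSV rows into (pid, name, used_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        if line.count(",") < 2:
            continue  # skip blank or malformed rows
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)
        used = used.strip()
        # Memory is reported in MiB; it can be "[N/A]" on some setups.
        used_mib = int(used) if used.isdigit() else None
        rows.append((int(pid), name.strip(), used_mib))
    return rows


def gpu_processes():
    """Run nvidia-smi and return the processes currently using a GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(out.stdout)
```

Calling `gpu_processes()` before starting training should return an empty list if nothing else is occupying the GPUs.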

The translator connects the RNA and ATAC embedding spaces (i.e., its shape is [embed_dim_x, embed_dim_y]) rather than the original feature spaces. Could you please check whether the embed_dim_x and embed_dim_y arguments are passed correctly?
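To see which layers actually have feature-space dimensions, one option is to rank the graph's variables by size. The sketch below is an illustration only: `rank_by_size` is a hypothetical helper, the second variable name is made up, and [20, 20] is just an example of an embedding-sized shape. It assumes float32 variables.

```python
def rank_by_size(named_shapes, bytes_per_element=4, top=5):
    """Return the largest variables as (bytes, name, shape), biggest first."""
    sized = []
    for name, shape in named_shapes:
        n = 1
        for dim in shape:
            n *= dim
        sized.append((n * bytes_per_element, name, list(shape)))
    sized.sort(reverse=True)
    return sized[:top]


# In a TF1 graph the input could be collected after building the model as:
#   named_shapes = [(v.name, v.shape.as_list()) for v in tf.global_variables()]
example = [
    # Name and shape taken from the error message (kernel of the failing node).
    ("translator_yx_px_r_genebatch/kernel", [93283, 15172]),
    # Made-up name with an illustrative embedding-sized shape.
    ("hypothetical_embedding_translator/kernel", [20, 20]),
]

for nbytes, name, shape in rank_by_size(example):
    print(f"{nbytes / 1024 ** 2:12.1f} MiB  {name}  {shape}")
```

Any variable whose shape matches the number of peaks or genes rather than the embedding dimensions would stand out at the top of this list.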

Please let us know if you have any further questions!

Best,
Ran

@viktoriaschuster
Author

Dear Ran,

When I try to run the model, no other processes are using GPU memory.

I used the default parameters except for the embedding dimensions, which I set to match the benchmarked models:
embed_dim_x = 20, embed_dim_y = 20

Best,
Viktoria
