Indexing a 2.6GB Plant Genome Using STAR (genomeGenerate Mode) Terminated Due to RAM Limit — Can the Genome Be Split for Indexing, or Are There Other Solutions? #2235

Open
Vijithkumar2020 opened this issue Nov 5, 2024 · 0 comments

@Vijithkumar2020

I am trying to index a de novo assembled plant genome with the STAR aligner. The assembly file contains 2,976,459 contigs, with an N50 of 1,293 kb.

The following command was used:

STAR --runThreadN 8 \
--runMode genomeGenerate \
--genomeDir /path/ \
--genomeFastaFiles /path/*.fa \
--genomeSAindexNbases 14 \
--genomeSAsparseD 2

The run terminated with the following error:

EXITING because of FATAL PARAMETER ERROR: limitGenomeGenerateRAM=31000000000is too small for your genome

SOLUTION: please specify --limitGenomeGenerateRAM not less than 2080695648522 and make that much RAM available

System capabilities: CPU with 8 cores and 244GB RAM.
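For scale, the requested 2,080,695,648,522 bytes is roughly 2.08 TB, far beyond the 244 GB available. For genomes with a very large number of references, the STAR manual recommends reducing `--genomeChrBinNbits` to min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))). The sketch below evaluates that rule with the approximate figures from this issue. The function name, the 150 bp read length, and rounding to the nearest integer are assumptions, not something stated in the post.

```shell
#!/bin/sh
# Sketch: suggest a --genomeChrBinNbits value using the STAR manual's rule
#   min(18, log2(max(GenomeLength / NumberOfReferences, ReadLength)))
# Rounding to the nearest integer is an assumption made here.
# Arguments: genome length (bases), number of references, read length (bases).
suggest_chr_bin_nbits() {
  awk -v glen="$1" -v nref="$2" -v rlen="$3" 'BEGIN {
    per_ref = glen / nref                  # average bases per reference
    if (rlen > per_ref) per_ref = rlen     # max(average length, read length)
    bits = int(log(per_ref) / log(2) + 0.5)  # log2, rounded to nearest
    if (bits > 18) bits = 18               # cap at 18, the STAR default
    print bits
  }'
}

# Approximate figures from this issue: ~2.6 Gb genome, 2,976,459 contigs.
# The read length of 150 is hypothetical.
suggest_chr_bin_nbits 2600000000 2976459 150
```

With these inputs the sketch prints 10, which could be tried as `--genomeChrBinNbits 10` in the genomeGenerate command. Whether that alone brings the index within 244 GB would still need to be confirmed on the actual assembly.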

One suggestion I received was that the contig count could be the culprit. I haven't checked my assembly file for duplicates. Since no assembly data are available for any related species in the same genus, I doubt that scaffolding with tools like RagTag would help. I am therefore looking for suggestions on how to perform the indexing within my system's capacity.
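As a first step toward the duplicate check mentioned above, a minimal sketch that counts sequence IDs appearing more than once across the FASTA files. It only compares header names (the first word after `>`), not the sequences themselves, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: count sequence IDs that occur more than once in one or more FASTA files.
# Only the first whitespace-delimited word of each header line is compared;
# identical sequences stored under different names are NOT detected.
count_duplicate_ids() {
  awk '/^>/ {
         id = substr($1, 2)             # header ID without the leading ">"
         if (seen[id]++ == 1) dup++     # count each repeated ID once
       }
       END { print dup + 0 }' "$@"
}

# Example call, using the placeholder path from the command in this issue:
# count_duplicate_ids /path/*.fa
```

A result of 0 would mean every contig name is unique. Any other value gives the number of names that are reused.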

Note: my end goal is to run BRAKER (with RNA-seq and protein data), and for StringTie2 to work, the aligned reads need to carry XS tags. The STAR aligner can produce these.
