Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inability to Generate Perfect Reads #272

Open
Shruteek opened this issue Sep 18, 2024 · 2 comments
Open

Inability to Generate Perfect Reads #272

Shruteek opened this issue Sep 18, 2024 · 2 comments

Comments

@Shruteek
Copy link

Shruteek commented Sep 18, 2024

I am trying to generate perfect reads from a single genome .FASTA file, and whenever I do, the package attempts to generate the reads but kicks me back to the help menu. I am able to generate reads with '--mode basic' and with error models like '--model novaseq', but not with '--mode perfect'. I am using InSilicoSeq version 2.0.1. Is anybody else able to reproduce this?

Minimally reproducible example: download the attached .fasta file chromosome.zip, and run the following 2 commands:

conda create -n test insilicoseq -c bioconda
iss generate --genomes chromosome.fasta --mode perfect --n_reads 100 --compress --output chromosome_perfect_reads

Expected output:
2 files, one called chromosome_perfect_reads_R1.fastq, another called chromosome_perfect_reads_R2.fastq

Actual output:

(test) workflow$ iss generate --genomes chromosome.fasta --mode perfect --compress --n_reads 100 --output chromosome_perfect_reads
INFO:iss.app:Starting iss generate
INFO:iss.generator:Using perfect ErrorModel
INFO:iss.util:Stitching input files together
INFO:iss.generator:Using lognormal abundance distribution
INFO:iss.app:Using 2 cpus for read generation
INFO:iss.app:Generating 100 reads
usage: iss [subcommand] [options]

InSilicoSeq: A sequencing simulator

options:
  -h, --help     show this help message and exit
  -v, --version  print software version and exit

available subcommands:

    model        generate an error model from a bam file
    generate     simulate reads from an error model
@Shruteek
Copy link
Author

Shruteek commented Sep 18, 2024

Accidentally closed because I thought I resolved the issue. It turns out my placement of the '--compress' option was partially responsible; moving that flag to the end of the command, while testing, I have been able to generate reads several times, but I am still unable to reliably generate reads, and in most of the cases the program produces the same erroneous output.

@Shruteek Shruteek reopened this Sep 18, 2024
@Shruteek
Copy link
Author

Shruteek commented Sep 18, 2024

The core issue appears to be that there is no 'store_mutations' attribute in the PerfectErrorModel class (InSilicoSeq/iss/error_models/perfect.py), even though that attribute is expected in the main initialization function for Error Models (InSilicoSeq/iss/error_models/basic.py). As a result, there is a nonzero chance that the generator will attempt to mutate a read, in which case it will check 'store_mutations' and throw an error - if you only generate 5 or 10 reads, this may never happen, but at 100 or more, this will almost always cause an error, though there is a slight chance it will not.

Workaround: in InSilicoSeq/iss/error_models/perfect.py, add this line on line 20:
self.store_mutations = False

Alternatively, change line 14 to:
def __init__(self, fragment_length=None, fragment_sd=None, store_mutations=False):
and add this line on line 20:
self.store_mutations = store_mutations

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant