[Internal] [Bug] Table2asn error on test data user-provided annotations: duplicate Bioseq id #217

Open
1 task done
jessicarowell opened this issue Nov 13, 2024 · 0 comments
Labels
bug Something isn't working
Milestone

Comments

@jessicarowell
Collaborator

What is the Bug Related To? Please Provide a Description. 
Table2asn fails on test bacteria data with user-provided annotations. These are the exact annotations created by bakta during a TOSTADAS run with annotation flag set.

Command error:
XML generated at ./BX248355/sra/submission.xml
XML generated at ./BX248355/genbank/submission.xml
Connected to FTP: ftp-private.ncbi.nlm.nih.gov
Changed directories to submit
Changed directories to Test
Changed directories to BX248355_biosample
Uploading text file: ./BX248355/biosample/submit.ready
Uploaded ./BX248355/biosample/submit.ready to submit.ready
Uploading binary file: ./BX248355/biosample/submission.xml
Uploaded ./BX248355/biosample/submission.xml to submission.xml
Submitted files for sample BX248355
Submitted sample BX248355 to BioSample
Connected to FTP: ftp-private.ncbi.nlm.nih.gov
Changed directories to submit
Changed directories to Test
Changed directories to BX248355_sra
Uploading text file: ./BX248355/sra/submit.ready
Uploaded ./BX248355/sra/submit.ready to submit.ready
Uploading binary file: ./BX248355/sra/submission.xml
Uploaded ./BX248355/sra/submission.xml to submission.xml
Uploading binary file: ./BX248355/sra/BX248355_R1.fq.gz
Uploaded ./BX248355/sra/BX248355_R1.fq.gz to BX248355_R1.fq.gz
Uploading binary file: ./BX248355/sra/BX248355_R2.fq.gz
Uploaded ./BX248355/sra/BX248355_R2.fq.gz to BX248355_R2.fq.gz
Submitted files for sample BX248355
Submitted sample BX248355 to SRA
Genbank files prepared for BX248355
Running table2asn...
table2asn command: table2asn -i ./BX248355/genbank/sequence.fsa -o ./BX248355/genbank/BX248355.sqn -t ./BX248355/genbank/authorset.sbt -f ./BX248355/genbank/BX248355.gff3 -locus-tag-prefix LOCUSTAG123 -M n -Z -w comment.cmt
Error running table2asn: This copy of table2asn is more than 1 year old. Please download the current version if it is newer.
Recognized annotation format: GFF3
Will be using one threads
Error:
Problem: duplicate Bioseq id
lcl|BX248355

Traceback (most recent call last):
File "/scicomp/home-pure/ick4/01.scripts/tostadas/bin/submission_new.py", line 1053, in <module>
submission_main()
File "/scicomp/home-pure/ick4/01.scripts/tostadas/bin/submission_new.py", line 109, in submission_main
genbank_submission.prepare_files_ftp_submission() # Prep files and run table2asn
File "/scicomp/home-pure/ick4/01.scripts/tostadas/bin/submission_new.py", line 927, in prepare_files_ftp_submission
self.run_table2asn()
File "/scicomp/home-pure/ick4/01.scripts/tostadas/bin/submission_new.py", line 999, in run_table2asn
result = subprocess.run(cmd, check=True, capture_output=True, text=True)
File "/opt/conda/envs/tostadas/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['table2asn', '-i', './BX248355/genbank/sequence.fsa', '-o', './BX248355/genbank/BX248355.sqn', '-t', './BX248355/genbank/authorset.sbt', '-f', './BX248355/genbank/BX248355.gff3', '-locus-tag-prefix', 'LOCUSTAG123', '-M', 'n', '-Z', '-w', 'comment.cmt']' returned non-zero exit status 1.

Work dir:
/scicomp/scratch/ick4/d0/4cc11c801f5ce01e94229cc3dd7d10

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

-- Check '.nextflow.log' file for details
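
The traceback above only reports the exit status, because `capture_output=True` keeps table2asn's stderr inside the `CalledProcessError`. A minimal sketch of a wrapper that surfaces stderr and also stops waiting after a deadline is shown here. The names `run_with_timeout` and `report` and the 600-second default are hypothetical and are not part of `submission_new.py`.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s=600):
    """Run cmd and return (returncode, stdout, stderr).

    returncode is None when the command did not finish within timeout_s.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired as exc:
        # Partial output on timeout may be None or bytes, even with text=True.
        out, err = exc.stdout or "", exc.stderr or ""
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        if isinstance(err, bytes):
            err = err.decode(errors="replace")
        return None, out, err
    return result.returncode, result.stdout, result.stderr


def report(cmd, timeout_s=600):
    """Print the outcome and any captured stderr, then return the exit status."""
    rc, _out, err = run_with_timeout(cmd, timeout_s)
    status = "timed out" if rc is None else f"exit status {rc}"
    print(f"{cmd[0]}: {status}", file=sys.stderr)
    if err:
        print(err, file=sys.stderr)
    return rc
```

Calling `report(cmd)` with the same argument list shown in the traceback would print whatever table2asn wrote to stderr instead of discarding it, and would return `None` rather than blocking indefinitely if the process hangs.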

Place an ❌ in a Box that Best Matches the Bug's Importance:

  • [] 1 - Most severe (a full-break in core function)
  • [❌] 2-4 - Moderate (break for a particular aspect/feature) (how integral is the broken feature?)
  • [] 5 - Least severe (non-functional issue, such as inconsistency / error in documentation or administrative in nature) 

Please Complete the Following Information:

  • OS: [e.g. iOS]: scicomp
  • Browser [e.g. chrome, safari]:
  • Version [e.g. 22]:
  • Run environment (container, cloud service, HPC, platform, etc.): scicomp HPC, but using a manual table2asn download (in my PATH at ~/bin/table2asn)

Please Outline Necessary Steps to Replicate Bug (Go to.. Click on... Install the following... etc.):
nextflow run main.nf -profile test,singularity --species bacteria --submission --annotation false --sra --biosample --genbank --output_dir test_bact_user --submission_config ~/02.scratch/submission_config.yaml

Any Additional Context or Information? Has There Been Any Progress Made So Far Towards this Request? Any Concrete Instructions to Resolve the Bug or Helpful Resources to Reference? Screenshots or Logs?

  1. Compared the table2asn command from this run with the one issued when --annotation true is set (i.e. when bakta annotates the same bacterial genomes). The two commands are identical (see the "table2asn command:" line above).

  2. Changed to the work dir and ran the table2asn command from there. It appears to hang after printing 'Will be using one threads': it never advances or times out. I ran it multiple times, for a few hours each time.
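
One way to narrow down the `duplicate Bioseq id` message is to check where the ID `BX248355` is defined more than once in the inputs. The sketch below looks for two conditions: a sequence ID repeated within the FASTA, and an ID that also appears in an embedded `##FASTA` section of the GFF3 (the GFF3 format allows sequences to be appended after that directive). Neither condition has been confirmed as what table2asn is objecting to here, and the function names are hypothetical.

```python
from collections import Counter


def fasta_ids(lines):
    """Return the first whitespace-delimited token of each '>' header line."""
    return [
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    ]


def gff3_embedded_fasta_ids(lines):
    """Return sequence IDs found after a '##FASTA' directive, if any."""
    in_fasta = False
    tail = []
    for line in lines:
        if in_fasta:
            tail.append(line)
        elif line.strip() == "##FASTA":
            in_fasta = True
    return fasta_ids(tail)


def duplicate_report(fasta_lines, gff3_lines):
    """Summarise IDs repeated in the FASTA or shared with the GFF3's FASTA tail."""
    ids = fasta_ids(fasta_lines)
    repeated = sorted(i for i, n in Counter(ids).items() if n > 1)
    shared = sorted(set(ids) & set(gff3_embedded_fasta_ids(gff3_lines)))
    return {"repeated_in_fasta": repeated, "also_in_gff3_fasta": shared}


# Usage with the paths from the failing command (run from the work dir):
# with open("./BX248355/genbank/sequence.fsa") as fa, \
#         open("./BX248355/genbank/BX248355.gff3") as gff:
#     print(duplicate_report(fa, gff))
```

If both lists come back empty, the duplicate is more likely introduced elsewhere, for example by a stale output file or by how the sequence IDs are rewritten before submission.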

@jessicarowell jessicarowell added the bug Something isn't working label Nov 13, 2024
@jessicarowell jessicarowell added this to the v4.1.0 milestone Nov 13, 2024