UnicodeEncodeError when running testrun and on real data. #112

Open
patriciatran opened this issue Aug 13, 2024 · 2 comments

Hi,

I installed checkm2 using the yml file, and downloaded the database without issue.
I get the following Unicode encoding error in the testrun, and also when I run it on a small dataset (3 genomes) of my own real data.
Has anyone seen this error, or does anyone have advice on how to fix it?

Thank you,
Patricia

patricia@sulfur:~$ mamba env create -n checkm2 -f checkm2.yml 
patricia@sulfur:~$ conda activate checkm2
(checkm2) patricia@sulfur:~$ pip install CheckM2
(checkm2) patricia@sulfur:~$ checkm2 -h
		  ____ _               _    __  __ ____  
 		 / ___| |__   ___  ___| | _|  \/  |___ \ 
 		| |   | '_ \ / _ \/ __| |/ / |\/| | __) | 
 		| |___| | | |  __/ (__|   <| |  | |/ __/  
 		 \____|_| |_|\___|\___|_|\_\_|  |_|_____| 
 
                ...::: CheckM2 v1.0.1 :::...

  General usage:
    predict         -> Predict the completeness and contamination of genome bins in a folder.
    testrun         -> Runs Checkm2 on internal test genomes to ensure it runs without errors.
    database        -> Download and set up required CheckM2 DIAMOND database for annotation

  Use checkm2 <command> -h for command-specific help.
(checkm2) patricia@sulfur:~$ checkm2 database --download --path /storage1/data10/databases/checkm2/
[08/13/2024 12:26:42 PM] INFO: Command: Download database. Checking internal path information.
[08/13/2024 12:26:44 PM] INFO: Downloading https://zenodo.org/api/records/5571251/files/checkm2_database.tar.gz/content to /storage1/data10/databases/checkm2/checkm2_database.tar.gz.
100%|###################################################################################| 1.74G/1.74G [01:30<00:00, 19.2MiB/s]
[08/13/2024 12:28:15 PM] INFO: Extracting files from archive...
[08/13/2024 12:28:40 PM] INFO: Verifying version and checksums...
[08/13/2024 12:28:40 PM] INFO: Verification success.
[08/13/2024 12:28:48 PM] INFO: Diamond DATABASE downloaded successfully! Consider running <checkm2 testrun> to verify everything works.
(checkm2) patricia@sulfur:~$ checkm2 testrun
[08/13/2024 12:30:27 PM] INFO: Test run: Running quality prediction workflow on test genomes with 1 threads.
[08/13/2024 12:30:27 PM] INFO: Running checksum on test genomes.
[08/13/2024 12:30:27 PM] INFO: Checksum successful.
[08/13/2024 12:30:29 PM] INFO: Calling genes in 3 bins with 1 threads:
    Finished processing 3 of 3 (100.00%) bins.
[08/13/2024 12:30:58 PM] INFO: Calculating metadata for 3 bins with 1 threads:
    Finished processing 3 of 3 (100.00%) bin metadata.
[08/13/2024 12:30:59 PM] INFO: Annotating input genomes with DIAMOND using 1 threads
Traceback (most recent call last):
  File "/home/patricia/miniconda3/envs/checkm2/bin/checkm2", line 265, in <module>
    predictor.prediction_wf(False, 'auto', False, False, False)
  File "/home/patricia/miniconda3/envs/checkm2/lib/python3.8/site-packages/checkm2/predictQuality.py", line 135, in prediction_wf
    diamond_out = diamond_search.run(prodigal_files)
  File "/home/patricia/miniconda3/envs/checkm2/lib/python3.8/site-packages/checkm2/diamond.py", line 119, in run
    self.__call_diamond(protein_chunks, diamond_out)
  File "/home/patricia/miniconda3/envs/checkm2/lib/python3.8/site-packages/checkm2/diamond.py", line 74, in __call_diamond
    sequenceClasses.SeqReader().write_fasta(seq_object, temp_diamond_input.name)
  File "/home/patricia/miniconda3/envs/checkm2/lib/python3.8/site-packages/checkm2/sequenceClasses.py", line 104, in write_fasta
    fout.write('>' + seqId + '\n')
UnicodeEncodeError: 'latin-1' codec can't encode character '\u03a9' in position 6: ordinal not in range(256)

npbhavya commented Aug 15, 2024

I am also running into the same error. I am running CheckM2 v1.0.2.

[08/15/2024 10:18:11 AM] INFO: Annotating input genomes with DIAMOND using 30 threads
Traceback (most recent call last):
  File "/home/nala0006/miniconda3/envs/checkm2/bin/checkm2", line 245, in <module>
    args.stdout, args.resume, args.remove_intermediates, args.ttable)
  File "/home/nala0006/miniconda3/envs/checkm2/lib/python3.6/site-packages/checkm2/predictQuality.py", line 135, in prediction_wf
    diamond_out = diamond_search.run(prodigal_files)
  File "/home/nala0006/miniconda3/envs/checkm2/lib/python3.6/site-packages/checkm2/diamond.py", line 119, in run
    self.__call_diamond(protein_chunks, diamond_out)
  File "/home/nala0006/miniconda3/envs/checkm2/lib/python3.6/site-packages/checkm2/diamond.py", line 74, in __call_diamond
    sequenceClasses.SeqReader().write_fasta(seq_object, temp_diamond_input.name)
  File "/home/nala0006/miniconda3/envs/checkm2/lib/python3.6/site-packages/checkm2/sequenceClasses.py", line 104, in write_fasta
    fout.write('>' + seqId + '\n')
UnicodeEncodeError: 'ascii' codec can't encode character '\u03a9' in position 33: ordinal not in range(128)

lyisrae1 commented Nov 8, 2024

Can anyone from the CheckM2 team help us, please?
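
For anyone comparing the two tracebacks above: both fail on the same character, `'\u03a9'` (Ω), but with different codecs (`'latin-1'` in one, `'ascii'` in the other). That pattern is what you would expect if the output file is opened without an explicit encoding, so that Python falls back to the locale's preferred encoding. This is an assumption about the cause, not something confirmed by the CheckM2 maintainers. The sketch below is not CheckM2 code; `write_fasta_header`, `try_write`, and the sequence ID are hypothetical names used only to reproduce the same class of error and show that an explicit UTF-8 encoding avoids it.

```python
import os
import tempfile

# Hypothetical sequence ID containing U+03A9 (Greek capital omega),
# the character named in both tracebacks.
OMEGA_ID = "contig\u03a9_1"


def write_fasta_header(path, seq_id, encoding):
    """Write one FASTA header line using an explicit text encoding."""
    with open(path, "w", encoding=encoding) as fout:
        fout.write(">" + seq_id + "\n")


def try_write(seq_id, encoding):
    """Return True if the header can be written in the given encoding."""
    fd, path = tempfile.mkstemp(suffix=".faa")
    os.close(fd)
    try:
        write_fasta_header(path, seq_id, encoding)
        return True
    except UnicodeEncodeError:
        return False
    finally:
        os.remove(path)


if __name__ == "__main__":
    # ascii and latin-1 cannot represent U+03A9; utf-8 can.
    for enc in ("ascii", "latin-1", "utf-8"):
        print(enc, try_write(OMEGA_ID, enc))
```

To see which encoding your environment would pick by default, `python -c "import locale; print(locale.getpreferredencoding())"` prints it. If it reports something other than UTF-8, running in a UTF-8 locale (for example `export LC_ALL=C.UTF-8`, where that locale exists) or, on Python 3.7+, setting `PYTHONUTF8=1` may be worth trying before `checkm2 testrun`. I have not verified either workaround against CheckM2 itself.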
