fix: not erroring out when a non-ACGT base is in the query file #22
Right now, if there is even a single non-ACGT base in the query file, `COBS` will fail with an error.

This is quite annoying, as it is extremely common to have `N`s or other non-ACGT bases in assembly, contig, and read files. I'd say a `COBS` query would actually fail more often than it succeeds on random real-data query files.

This PR is a first approach to solving this issue. It simply replaces non-ACGT bases with A. This is what is done in MOF-search (https://github.com/karel-brinda/mof-search), and it is an OK approach. A better approach might be to randomise the replacement character, receive it as a parameter, or skip the affected k-mers entirely. There are many ways to deal with this that can be discussed and implemented later, but I think it is important to have a first solution in place.
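
To make the replacement strategy concrete, here is a minimal sketch in Python. It is not the COBS code from this PR: the function name `sanitize_query` and the `replacement` parameter are hypothetical, and treating only uppercase A, C, G, and T as valid is an assumption of the sketch.

```python
def sanitize_query(seq: str, replacement: str = "A") -> str:
    """Return `seq` with every non-ACGT character replaced.

    Hypothetical helper for illustration only. Assumes that only
    uppercase A, C, G and T count as valid bases.
    """
    valid_bases = {"A", "C", "G", "T"}
    # Keep valid bases unchanged; substitute everything else.
    return "".join(base if base in valid_bases else replacement for base in seq)


if __name__ == "__main__":
    print(sanitize_query("ACGTNNAC"))
```

Exposing the replacement character as a parameter, as this sketch does, corresponds to one of the alternatives mentioned above; randomising it or skipping the affected k-mers would need a different design.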