Merge branch 'main' into samtools_fastq
rcannood authored May 23, 2024
2 parents 51e3953 + 0370b4c commit 8e9661a
Showing 14 changed files with 615 additions and 5 deletions.
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -44,11 +44,13 @@
- `samtools/samtools_sort`: Sort SAM/BAM/CRAM files (PR #36).
- `samtools/samtools_stats`: Reports alignment summary statistics for a BAM file (PR #39).
- `samtools/samtools_faidx`: Indexes FASTA files to enable random access to fasta and fastq files (PR #41).
- `samtools/samtools_collate`: Shuffles and groups reads in SAM/BAM/CRAM files together by their names (PR #42).
- `samtools/samtools_view`: Views and converts SAM/BAM/CRAM files (PR #48).
- `samtools/samtools_fastq`: Converts a SAM/BAM/CRAM file to FASTQ (PR #52).
- `samtools_collate`: Shuffles and groups reads in SAM/BAM/CRAM files together by their names (PR #42).

* `falco`: A C++ drop-in replacement of FastQC to assess the quality of sequence read data (PR #43).


## MAJOR CHANGES

## MINOR CHANGES
2 changes: 1 addition & 1 deletion src/samtools/samtools_idxstats/config.vsh.yaml
@@ -7,7 +7,7 @@ links:
documentation: https://www.htslib.org/doc/samtools-idxstats.html
repository: https://github.com/samtools/samtools
references:
doi: 10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008
doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
license: MIT/Expat

argument_groups:
2 changes: 1 addition & 1 deletion src/samtools/samtools_sort/config.vsh.yaml
@@ -4,7 +4,7 @@ description: Sort SAM/BAM/CRAM file.
keywords: [sort, bam, sam, cram]
links:
homepage: https://www.htslib.org/
documentation: https://www.htslib.org/doc/samtools-idxstats.html
documentation: https://www.htslib.org/doc/samtools-sort.html
repository: https://github.com/samtools/samtools
references:
doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
4 changes: 2 additions & 2 deletions src/samtools/samtools_stats/config.vsh.yaml
@@ -4,10 +4,10 @@ description: Reports alignment summary statistics for a BAM file.
keywords: [statistics, counts, bam, sam, cram]
links:
homepage: https://www.htslib.org/
documentation: https://www.htslib.org/doc/samtools-idxstats.html
documentation: https://www.htslib.org/doc/samtools-stats.html
repository: https://github.com/samtools/samtools
references:
doi: 10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008
doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
license: MIT/Expat

argument_groups:
351 changes: 351 additions & 0 deletions src/samtools/samtools_view/config.vsh.yaml

Large diffs are not rendered by default.

80 changes: 80 additions & 0 deletions src/samtools/samtools_view/help.txt
@@ -0,0 +1,80 @@
```
samtools view
```

Usage: samtools view [options] <in.bam>|<in.sam>|<in.cram> [region ...]

Output options:
-b, --bam Output BAM
-C, --cram Output CRAM (requires -T)
-1, --fast Use fast BAM compression (and default to --bam)
-u, --uncompressed Uncompressed BAM output (and default to --bam)
-h, --with-header Include header in SAM output
-H, --header-only Print SAM header only (no alignments)
--no-header Print SAM alignment records only [default]
-c, --count Print only the count of matching records
-o, --output FILE Write output to FILE [standard output]
-U, --unoutput FILE, --output-unselected FILE
Output reads not selected by filters to FILE
-p, --unmap Set flag to UNMAP on reads not selected
then write to output file.
-P, --fetch-pairs Retrieve complete pairs even when outside of region
Input options:
-t, --fai-reference FILE FILE listing reference names and lengths
-M, --use-index Use index and multi-region iterator for regions
--region[s]-file FILE Use index to include only reads overlapping FILE
-X, --customized-index Expect extra index file argument after <in.bam>

Filtering options (Only include in output reads that...):
-L, --target[s]-file FILE ...overlap (BED) regions in FILE
-N, --qname-file [^]FILE ...whose read name is listed in FILE ("^" negates)
-r, --read-group STR ...are in read group STR
-R, --read-group-file [^]FILE
...are in a read group listed in FILE
-d, --tag STR1[:STR2] ...have a tag STR1 (with associated value STR2)
-D, --tag-file STR:FILE ...have a tag STR whose value is listed in FILE
-q, --min-MQ INT ...have mapping quality >= INT
-l, --library STR ...are in library STR
-m, --min-qlen INT ...cover >= INT query bases (as measured via CIGAR)
-e, --expr STR ...match the filter expression STR
-f, --require-flags FLAG ...have all of the FLAGs present
-F, --excl[ude]-flags FLAG ...have none of the FLAGs present
--rf, --incl-flags, --include-flags FLAG
...have some of the FLAGs present
-G FLAG EXCLUDE reads with all of the FLAGs present
--subsample FLOAT Keep only FLOAT fraction of templates/read pairs
--subsample-seed INT Influence WHICH reads are kept in subsampling [0]
-s INT.FRAC Same as --subsample 0.FRAC --subsample-seed INT

Processing options:
--add-flags FLAG Add FLAGs to reads
--remove-flags FLAG Remove FLAGs from reads
-x, --remove-tag STR
Comma-separated read tags to strip (repeatable) [null]
--keep-tag STR
Comma-separated read tags to preserve (repeatable) [null].
Equivalent to "-x ^STR"
-B, --remove-B Collapse the backward CIGAR operation
-z, --sanitize FLAGS Perform sanity checking and fixing on records.
FLAGS is comma separated (see manual). [off]

General options:
-?, --help Print long help, including note about region specification
-S Ignored (input format is auto-detected)
--no-PG Do not add a PG line
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
-O, --output-fmt FORMAT[,OPT[=VAL]]...
Specify output format (SAM, BAM, CRAM)
--output-fmt-option OPT[=VAL]
Specify a single output file format option in the form
of OPTION or OPTION=VALUE
-T, --reference FILE
Reference sequence FASTA FILE [null]
-@, --threads INT
Number of additional threads to use [0]
--write-index
Automatically index the output files [off]
--verbosity INT
Set level of verbosity
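
The help text above states that `-s INT.FRAC` is the same as `--subsample 0.FRAC --subsample-seed INT`. As a rough illustration of that mapping only, the sketch below splits such an argument in plain bash. The function name `split_subsample_arg` is hypothetical and is not part of samtools or of this commit, and the sketch assumes the argument contains a dot.

```shell
#!/bin/bash
# Hypothetical helper (not part of samtools or this commit): rewrite an
# "-s INT.FRAC" value as the equivalent pair of long options.
# Assumes the argument contains a dot.
split_subsample_arg() {
  local arg="$1"
  local seed="${arg%%.*}"   # text before the first dot (INT)
  local frac="${arg#*.}"    # text after the first dot (FRAC)
  # An empty INT (e.g. ".25") falls back to seed 0, the default listed
  # for --subsample-seed in the help text.
  echo "--subsample 0.${frac} --subsample-seed ${seed:-0}"
}

split_subsample_arg "42.25"
```

Under these assumptions, `42.25` maps to a fraction of `0.25` and a seed of `42`.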
71 changes: 71 additions & 0 deletions src/samtools/samtools_view/script.sh
@@ -0,0 +1,71 @@
#!/bin/bash

## VIASH START
## VIASH END

set -e

[[ "$par_bam" == "false" ]] && unset par_bam
[[ "$par_cram" == "false" ]] && unset par_cram
[[ "$par_fast" == "false" ]] && unset par_fast
[[ "$par_uncompressed" == "false" ]] && unset par_uncompressed
[[ "$par_with_header" == "false" ]] && unset par_with_header
[[ "$par_header_only" == "false" ]] && unset par_header_only
[[ "$par_no_header" == "false" ]] && unset par_no_header
[[ "$par_count" == "false" ]] && unset par_count
[[ "$par_unmap" == "false" ]] && unset par_unmap
[[ "$par_use_index" == "false" ]] && unset par_use_index
[[ "$par_fetch_pairs" == "false" ]] && unset par_fetch_pairs
[[ "$par_customized_index" == "false" ]] && unset par_customized_index
[[ "$par_no_PG" == "false" ]] && unset par_no_PG
[[ "$par_write_index" == "false" ]] && unset par_write_index
[[ "$par_remove_B" == "false" ]] && unset par_remove_B

samtools view \
${par_bam:+-b} \
${par_cram:+-C} \
${par_fast:+--fast} \
${par_uncompressed:+-u} \
${par_with_header:+--with-header} \
${par_header_only:+-H} \
${par_no_header:+--no-header} \
${par_count:+-c} \
${par_output:+-o "$par_output"} \
${par_output_unselected:+-U "$par_output_unselected"} \
${par_unmap:+-p} \
${par_fetch_pairs:+-P} \
${par_fai_reference:+-t "$par_fai_reference"} \
${par_use_index:+-M} \
${par_region_file:+--region-file "$par_region_file"} \
${par_customized_index:+-X} \
${par_target_file:+-L "$par_target_file"} \
${par_qname_file:+-N "$par_qname_file"} \
${par_read_group:+-r "$par_read_group"} \
${par_read_group_file:+-R "$par_read_group_file"} \
${par_tag:+-d "$par_tag"} \
${par_tag_file:+-D "$par_tag_file"} \
${par_min_MQ:+-q "$par_min_MQ"} \
${par_library:+-l "$par_library"} \
${par_min_qlen:+-m "$par_min_qlen"} \
${par_expr:+-e "$par_expr"} \
${par_require_flags:+-f "$par_require_flags"} \
${par_excl_flags:+-F "$par_excl_flags"} \
${par_incl_flags:+--rf "$par_incl_flags"} \
${par_excl_all_flags:+-G "$par_excl_all_flags"} \
${par_subsample:+--subsample "$par_subsample"} \
${par_subsample_seed:+--subsample-seed "$par_subsample_seed"} \
${par_add_flags:+--add-flags "$par_add_flags"} \
${par_remove_flags:+--remove-flags "$par_remove_flags"} \
${par_remove_tag:+-x "$par_remove_tag"} \
${par_keep_tag:+--keep-tag "$par_keep_tag"} \
${par_remove_B:+-B} \
${par_sanitize:+-z "$par_sanitize"} \
${par_input_fmt_option:+--input-fmt-option "$par_input_fmt_option"} \
${par_output_fmt:+-O "$par_output_fmt"} \
${par_output_fmt_option:+--output-fmt-option "$par_output_fmt_option"} \
${par_reference:+-T "$par_reference"} \
${par_write_index:+--write-index} \
${par_no_PG:+--no-PG} \
"$par_input"

exit 0
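
The script above leans on two bash idioms: unsetting boolean parameters whose value is `false`, and the `${var:+word}` expansion, which yields `word` only when `var` is set and non-empty. The following is a minimal sketch of that pattern in isolation. The `show_args` function and the parameter values are made up for illustration and stand in for the real `samtools view` call.

```shell
#!/bin/bash
# Stand-in for the real tool: print every argument wrapped in brackets,
# all on one line, so word boundaries are visible.
show_args() { printf '[%s]' "$@"; echo; }

# Hypothetical values, mimicking how boolean and valued parameters arrive.
par_bam="true"
par_count="false"
par_output="my out.bam"

# "false" is a non-empty string, so without this step ${par_count:+-c}
# would still expand to -c.
[[ "$par_count" == "false" ]] && unset par_count

show_args \
  ${par_bam:+-b} \
  ${par_count:+-c} \
  ${par_output:+-o "$par_output"} \
  "input.sam"
```

The inner quotes in `${par_output:+-o "$par_output"}` keep a value containing spaces as a single argument, while the unquoted space still separates it from `-o`. A boolean flag needs only the flag itself as the alternate word, since its value is just `true`.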
87 changes: 87 additions & 0 deletions src/samtools/samtools_view/test.sh
@@ -0,0 +1,87 @@
#!/bin/bash

test_dir="${meta_resources_dir}/test_data"
temp_dir="${meta_resources_dir}/out"

############################################################################################

echo ">>> Test 1: Import SAM to BAM when @SQ lines are present in the header"
"$meta_executable" \
--bam \
--output "$temp_dir/a.bam" \
--input "$test_dir/a.sam"

echo ">>> Checking whether output exists"
[ ! -f "$temp_dir/a.bam" ] && echo "File 'a.bam' does not exist!" && exit 1

echo ">>> Checking whether output is non-empty"
[ ! -s "$temp_dir/a.bam" ] && echo "File 'a.bam' is empty!" && exit 1

echo ">>> Checking whether output is correct"
# compare output of "samtools view" for both files
diff <(samtools view "$temp_dir/a.bam") <(samtools view "$test_dir/a.bam") || \
{ echo "Output file a.bam does not match expected output"; exit 1; }

############################################################################################

echo ">>> Test 2: ${meta_functionality_name} with CRAM format output"

"$meta_executable" \
--cram \
--output "$temp_dir/a.cram" \
--input "$test_dir/a.sam"

echo ">>> Checking whether output exists"
[ ! -f "$temp_dir/a.cram" ] && echo "File 'a.cram' does not exist!" && exit 1

echo ">>> Checking whether output is non-empty"
[ ! -s "$temp_dir/a.cram" ] && echo "File 'a.cram' is empty!" && exit 1

echo ">>> Checking whether output is correct"
# compare output of "samtools view" for both files
diff <(samtools view "$temp_dir/a.cram") <(samtools view "$test_dir/a.cram") || \
{ echo "Output file a.cram does not match expected output"; exit 1; }

############################################################################################

echo ">>> Test 3: ${meta_functionality_name} with --count option"

"$meta_executable" \
--count \
--output "$temp_dir/a.count" \
--input "$test_dir/a.sam"

echo ">>> Checking whether output exists"
[ ! -f "$temp_dir/a.count" ] && echo "File 'a.count' does not exist!" && exit 1

echo ">>> Checking whether output is non-empty"
[ ! -s "$temp_dir/a.count" ] && echo "File 'a.count' is empty!" && exit 1

echo ">>> Checking whether output is correct"
diff "$temp_dir/a.count" "$test_dir/a.count" || \
{ echo "Output file a.count does not match expected output"; exit 1; }

############################################################################################

echo ">>> Test 4: ${meta_functionality_name} including only the forward reads from read pairs"

"$meta_executable" \
--output "$temp_dir/a.forward" \
--excl_flags "0x80" \
--input "$test_dir/a.sam"

echo ">>> Checking whether output exists"
[ ! -f "$temp_dir/a.forward" ] && echo "File 'a.forward' does not exist!" && exit 1

echo ">>> Checking whether output is non-empty"
[ ! -s "$temp_dir/a.forward" ] && echo "File 'a.forward' is empty!" && exit 1

echo ">>> Checking whether output is correct"
diff "$temp_dir/a.forward" "$test_dir/a.forward" || \
{ echo "Output file a.forward does not match expected output"; exit 1; }

############################################################################################

echo ">>> All tests passed successfully"
rm -rf "${temp_dir}"
exit 0
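
A note on checks of the form `diff ... || <report and exit 1>`, as used in the test script above: how the reporting commands are grouped matters. An `exit` inside parentheses ends only the subshell, whereas an `exit` inside a brace group ends the script itself. The sketch below shows the difference by running each variant in a child `bash`. The messages are placeholders.

```shell
#!/bin/bash
# Run each variant in a child bash and capture what it prints.
# "continued" appears only if execution went past the failed check.
with_subshell=$(bash -c 'false || (echo "mismatch" && exit 1); echo "continued"')
with_braces=$(bash -c 'false || { echo "mismatch"; exit 1; }; echo "continued"')

# Join the captured lines with spaces for compact display.
echo "subshell: ${with_subshell//$'\n'/ }"
echo "braces:   ${with_braces//$'\n'/ }"
```

With parentheses and no `set -e`, the script carries on after the failed check. The brace group stops it at the check.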
Binary file added src/samtools/samtools_view/test_data/a.bam
Binary file not shown.
1 change: 1 addition & 0 deletions src/samtools/samtools_view/test_data/a.count
@@ -0,0 +1 @@
6
Binary file added src/samtools/samtools_view/test_data/a.cram
Binary file not shown.
3 changes: 3 additions & 0 deletions src/samtools/samtools_view/test_data/a.forward
@@ -0,0 +1,3 @@
a1 99 xx 1 1 10M = 11 20 AAAAAAAAAA **********
b1 99 xx 1 1 10M = 11 20 AAAAAAAAAA **********
c1 99 xx 1 1 10M = 11 20 AAAAAAAAAA **********
7 changes: 7 additions & 0 deletions src/samtools/samtools_view/test_data/a.sam
@@ -0,0 +1,7 @@
@SQ SN:xx LN:20
a1 99 xx 1 1 10M = 11 20 AAAAAAAAAA **********
b1 99 xx 1 1 10M = 11 20 AAAAAAAAAA **********
c1 99 xx 1 1 10M = 11 20 AAAAAAAAAA **********
a1 147 xx 11 1 10M = 1 -20 TTTTTTTTTT **********
b1 147 xx 11 1 10M = 1 -20 TTTTTTTTTT **********
c1 147 xx 11 1 10M = 1 -20 TTTTTTTTTT **********
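
The records above use FLAG values 99 and 147, and Test 4 in `test.sh` passes `--excl_flags "0x80"` to keep only the first read of each pair. The sketch below decodes a few FLAG bits with bash arithmetic to show why that filter keeps the 99 records and drops the 147 records. The bit values follow the SAM specification. The function names are hypothetical, and only a subset of bits is decoded.

```shell
#!/bin/bash
# Decode a subset of SAM FLAG bits (bit values from the SAM specification).
describe_flag() {
  local flag=$1 out=()
  (( flag & 0x1  )) && out+=("paired")
  (( flag & 0x2  )) && out+=("proper_pair")
  (( flag & 0x10 )) && out+=("reverse")
  (( flag & 0x20 )) && out+=("mate_reverse")
  (( flag & 0x40 )) && out+=("read1")
  (( flag & 0x80 )) && out+=("read2")
  echo "$flag: ${out[*]}"
}

# Succeeds when a read would pass "-F MASK", i.e. none of the MASK bits are set.
passes_exclude() { (( ($1 & $2) == 0 )); }

describe_flag 99
describe_flag 147
for f in 99 147; do
  if passes_exclude "$f" 0x80; then echo "$f kept"; else echo "$f dropped"; fi
done
```

Flag 99 has the `read1` bit (0x40) but not `read2` (0x80), so it passes `-F 0x80`. Flag 147 has 0x80 set and is excluded, which matches the three records in `a.forward`.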
8 changes: 8 additions & 0 deletions src/samtools/samtools_view/test_data/script.sh
@@ -0,0 +1,8 @@
#!/bin/bash

# download test data from the snakemake wrappers repository
if [ ! -d /tmp/view_source ]; then
git clone --depth 1 --single-branch --branch master https://github.com/snakemake/snakemake-wrappers.git /tmp/view_source
fi

cp -r /tmp/view_source/bio/samtools/view/test/*.sam src/samtools/samtools_view/test_data
