Overview

FastSP is a Java program for computing alignment error (SP-FN) quickly and with little memory. It compares two alignments and computes several accuracy metrics (see the FAQ below).

Older versions are available here and a change log is available here.

Dataset

Some test datasets are available here.

Usage

java -jar FastSP.jar -r reference_alignment_file -e estimated_alignment_file

FAQ

Is FastSP sensitive to case? Should I expect different results if I change the alignments from upper case to lower case or vice versa?

A:
- No, not by default. FastSP is not even sensitive to which characters appear in the alignment (and it doesn't need to be); it only cares whether a given position in the alignment is a residue or a gap. Lower-case letters are therefore treated as aligned, just like upper-case letters. Note that qscore is sensitive to case: it treats lower-case letters as not aligned.
- You can add the -ml option to make FastSP sensitive to case. -ml instructs FastSP to ignore any homologies in the estimated alignment where one or both of the characters are lower case. Similarly, -mlr instructs FastSP to ignore such homologies in the reference alignment.
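
The effect of case sensitivity on the set of homologies can be illustrated with a small Python sketch. This is not FastSP's code: the function name `aligned_pairs`, the `ignore_lowercase` parameter, and the dictionary input format are assumptions made for illustration, and the sketch enumerates pairs naively rather than using FastSP's linear-time method.

```python
from itertools import combinations

def aligned_pairs(alignment, ignore_lowercase=False, gap="-"):
    """Return the set of homologies (aligned residue pairs) in an alignment.

    alignment: dict mapping sequence name -> aligned row (all rows equal length).
    A residue is identified by (sequence name, index in the ungapped sequence).
    With ignore_lowercase=True, any pair involving a lower-case character is
    dropped, mimicking the behaviour described for -ml / -mlr (an assumption
    based on the description above, not on FastSP's source).
    """
    names = sorted(alignment)
    counters = dict.fromkeys(names, 0)
    pairs = set()
    for col in range(len(alignment[names[0]])):
        residues = []
        for name in names:
            ch = alignment[name][col]
            if ch == gap:
                continue
            index = counters[name]
            counters[name] += 1  # lower-case residues still advance the index
            if ignore_lowercase and ch.islower():
                continue
            residues.append((name, index))
        pairs.update(combinations(residues, 2))
    return pairs

# Hypothetical two-sequence alignment with one lower-case residue.
estimated = {"s1": "ACg", "s2": "ACG"}
default_pairs = aligned_pairs(estimated)
case_sensitive_pairs = aligned_pairs(estimated, ignore_lowercase=True)
```

In this example the default mode counts all three columns as homologies, while the case-sensitive mode drops the pair that involves the lower-case `g`.
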

What do I do if I get an OutOfMemoryError?

A: By default, Java limits the memory available to programs. If you run out of memory, try increasing the maximum memory available to the JVM using the -Xmx option. For example, to make 2 GB available to the JVM, use:

java -Xmx2048m -jar FastSP.jar -r reference_alignment_file -e estimated_alignment_file

2 GB has been more than enough for the largest alignments we have looked at so far (more than 1,000,000,000 cells). However, if your machine has more memory, giving FastSP a larger limit could make it run faster.

What is the output?

A: Run FastSP with a -h option to see the output format. The main output is:
- SP-Score: number of shared homologies (aligned pairs) / total number of homologies in the reference alignment.
- Modeler: number of shared homologies (aligned pairs) / total number of homologies in the estimated alignment.
- SP-FN: 1 - SP-Score
- SP-FP: 1 - Modeler
- TC: number of correctly aligned columns / total number of aligned columns in the reference alignment.
- Compression Factor (naive): number of columns in the estimated alignment / number of columns in the reference alignment.
- Compression Factor: similar to the naive version, with one difference: when -ml and/or -mlr are given, insertions (e.g., lower-case letters) are each considered to be their own column.
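
The homology-based metrics above can be worked through on a toy example. The following Python sketch is an illustration only: the helper names `homologies` and `sp_metrics` and the dictionary input format are assumptions, and the pairwise enumeration is a naive quadratic approach, not the linear-time algorithm FastSP implements. It assumes both alignments contain at least one homology.

```python
from itertools import combinations

def homologies(alignment, gap="-"):
    """Set of aligned residue pairs; a residue is (name, ungapped index)."""
    names = sorted(alignment)
    counters = dict.fromkeys(names, 0)
    pairs = set()
    for col in range(len(alignment[names[0]])):
        residues = []
        for name in names:
            if alignment[name][col] != gap:
                residues.append((name, counters[name]))
                counters[name] += 1
        # Every pair of residues sharing a column is one homology.
        pairs.update(combinations(residues, 2))
    return pairs

def sp_metrics(reference, estimated):
    """Compute SP-Score, Modeler, SP-FN and SP-FP as defined in the list above."""
    ref = homologies(reference)
    est = homologies(estimated)
    shared = len(ref & est)
    sp_score = shared / len(ref)   # shared / homologies in reference
    modeler = shared / len(est)    # shared / homologies in estimated
    return {
        "SP-Score": sp_score,
        "Modeler": modeler,
        "SP-FN": 1 - sp_score,
        "SP-FP": 1 - modeler,
    }

# Hypothetical toy alignments of the same two sequences (ACG and ACTG).
reference = {"s1": "AC-G", "s2": "ACTG"}
estimated = {"s1": "ACG-", "s2": "ACTG"}
metrics = sp_metrics(reference, estimated)
```

Here each alignment contains three homologies and two of them are shared, so both SP-Score and Modeler are 2/3.
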

FastSP also writes the following quantities to standard error:
- MaxLenNoGap: maximum number of non-gap characters
- NumSeq: Number of sequences
- LenRef: Length of reference alignment
- LenEst: Length of estimated alignment
- Cells: (LenEst+LenRef)*NumSeq
- Number of shared homologies
- Number of homologies in the reference alignment
- Number of homologies in the estimated alignment
- Number of correctly aligned columns
- Number of aligned columns in reference alignment

Make sure you capture and save standard error (using 2>somefilename) if you are interested in these quantities.
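
If FastSP is driven from a script rather than a shell, standard error can be saved in the same way. The Python sketch below is one possible wrapper, not part of FastSP: the function names, the `max_heap_mb` parameter, and the file names are hypothetical, and only the `-Xmx`, `-jar`, `-r`, and `-e` options shown earlier in this README are used.

```python
import subprocess

def build_fastsp_command(reference, estimated, max_heap_mb=None, jar="FastSP.jar"):
    """Build the java command line for FastSP as a list of arguments."""
    cmd = ["java"]
    if max_heap_mb is not None:
        cmd.append(f"-Xmx{max_heap_mb}m")  # e.g. -Xmx2048m for 2 GB
    cmd += ["-jar", jar, "-r", reference, "-e", estimated]
    return cmd

def run_fastsp(reference, estimated, stderr_path, max_heap_mb=None):
    """Run FastSP, saving standard error to stderr_path and returning stdout."""
    cmd = build_fastsp_command(reference, estimated, max_heap_mb)
    with open(stderr_path, "w") as err:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,  # main metrics are returned to the caller
            stderr=err,              # extra quantities are saved to a file
            text=True,
            check=True,              # raise if java exits with an error
        )
    return result.stdout
```

For example, `run_fastsp("ref.fasta", "est.fasta", "fastsp_stats.txt", max_heap_mb=2048)` would return the main output as a string and leave the standard-error quantities in `fastsp_stats.txt`, assuming `java` and `FastSP.jar` are available in the working directory.
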

Publication

Siavash Mirarab and Tandy Warnow. FastSP: Linear time calculation of alignment accuracy. Bioinformatics, 2011. doi: 10.1093/bioinformatics/btr553
