Skip to content

ZhaoS-Lab/SEG

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 

Repository files navigation

SEG

The program SEG analyzes the log2-ratios from whole genome sequencing or microarray (CGH or SNP array) data. SEG first identifies change-points among the log2-ratio series along a chromosome and then identifies segments that are significantly amplified or deleted. SEG was developed by Dr. Mucheng Zhang, Dr. Deli Liu, Dr. Jie Tang and others in Dr. Shaying Zhao's lab. Please contact [email protected] for any issues regarding SEG

SEG is published at https://doi.org/10.1016/j.csbj.2018.09.001

SEG USAGE SEG analysis input data in two steps. Step 1 in only needed for first-time run.

STEP 1: Calculate standard deviations and the mean log2-ratio of aCGH data, chromsome by chromosome, as well as across the whole genome.

USAGE: ./stat input_step1 output_step1 [#_of_chromosome]
Option: number of chromosomes (default = 24)

Input_step1 Format:
    Title: name of each column
    data lines: chromosome_name probe_position log2-ratio
Output_step1 Format:
    Title: name of each column
    data lines: chromosome_name num_of_probes chromosome_std

Note: The delimiter should be Space or Tab.
The input file must be sorted by chromosome_name and probe_position (in increasing order).

STEP 2: Smooth data and Do segmention to find CNVs (Copy-Number Variations) USAGE: ./seg input_step1 output_step1 output_step2 [-q fdr] [-m min-mean] [-s min-size] [-w init-size][-d deviation][-k #DP-segment][-n 1] Input_step1 : the same file as STEP1 output_step1: the output from STEP1 Parameters:

	-q fdr: set desired FDR control level in Benjamini-Hochberg procedure (default value=0.005) 
	-m min-mean: minimum mean log2-ratios of segments to be labeled as significant (1 or-1 in output_step2 column 6,default=genomic_std from step 1)
	-s min-size: minimum number of probes of segments to be labeled as significant. default=5 (probes). 
	-w init-size: initial probe # in pre-assigned segments to detect change-points. default =6 (probes).
	-d deviation: std value used to lable segments. (default=genomic-std).
	-n 1: Do a linear transformation for input-data, so that the mean log2-ratio of whole data is 0
	-k Km: number of pre-assigned segments in a window for DP. SEG will detect Km-1 change-points in successive Km*W probes. (default=101)
        
*NOTE: Computation-INTENSIVE parameter. The Higer, the better. If the probe number(M) in the largest chromosome is not huge (<20k), it is strongly recommended 
that set the value of Km and w (with options -k and -w) such that Km*W>=M. Then DP will be applied to the whole chromosome, which is most
accurate/sensitive.

OutPut Format:
   Column 1-5: chr_name start_index end_index start_position end_position
   Column 6-10: label mean*sqrt(np) mean np(number of probe) p-value

Alternative of STEP2: Step2 may be run in two stages: DP process change-points detection and segments labeling. If large k (>= 1000) is applied and users want to test different sets of parameters without doing computation-intensive DP process each time, this alternative is recommended.

STEP 2A: use DP to detect (candicate) change-points. Usage: ./segA input_step1 output_step1 output_step2A [-w init-size][-k DP_segments][-n 1]

output format:
    chr-name start-position log2-ratio label
	
Where labels(0 or 1) indicate positions of detected change-points.Other parameter are same as STEP2.

STEP 2B: Label the segments with customized parameters. Usage: ./segB input_step1 output_step1 output_step2A output_step2B [-q fdr] [-m min-mean] [-s min-size][-d deviation]

output_step2A is output of STEP 2A. Other parameters and options are same as STEP2.

----------------- Tips ----------------------------------------------------

If User want to compare the results across a group of samples,then (1) the option -d pooled-deviation is recommended. The pooled standard deviation will be used in labeling every sample within the group.

(2) if Users want to use the same cut-off value of segment p-values for labeling(not recommended), 
then use the option
    	-f 0 -q Q0        
to skip B-H procedure and use the option and specify Q0 as the cut-off value.

---------------------- EXAMPLEs --------------------------------------------------

using "chr22-input-data" in the demo-data folder

step 1: ./SEG_executables/stat ./demo-data/chr22-input-data chr22-input-data_out 1

step 2: ./SEG_executables/seg ./demo-data/chr22-input-data chr22-input-data_out chr22-input-data_out_seq -q 0.1 -m 0.525018 -s 5 -w 6 -k 101 -n 1

using "K9-aCGH_data" in the demo-data folder

Dog has 39 chromosomes Here is format of inputfile CHROMOSOME CHR_POSITION LOG2_RATIO chr1 3212001 0.1090 chr1 3225657 -0.1221 chr1 3238689 -0.1199 chr1 3243589 -0.0846 chr1 3248064 -0.1770

step 1: ./SEG_executables/stat ./demo-data/K9-aCGH_data K9-aCGH_data.stat 39

step2: ./SEG_executables/seg ./demo-data/K9-aCGH_data K9-aCGH_data.stat K9-aCGH_data_segOutput -k 4000

chromsome 1, the largest chromosome in dog, has 20820 probes. when Km=4000, Kmw=40006=24000>20820, the DP will be applied on whole chromosomes.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published