Home
tbrunetti edited this page Jan 9, 2018
·
3 revisions
Below are some links that may be useful for quickly starting an already installed pipeline, troubleshooting pipeline problems, and understanding the pipeline setup.
Argument | Usage | Type | Default | Explanation |
---|---|---|---|---|
-inputPLINK | required | str ending in .bed/.ped | NA | Full path to the PLINK file, ending in .bed or .ped; whitespace characters are not allowed |
-phenoFile | required | NA | NA | Full path to populated phenotype file: sample_sheet_template.xlsx |
--config | only required if .json not in .home of chunky | str | search in chunky .home directory | configuration file produced after running chunky config run_GWAS_analysis_pipeline.py |
--outDir | optional | str | current working directory | Full path to an already existing directory or location where you would like GP3 to build the project |
--projectName | optional | str | year-month-date-hour-min-sec | Name of the project to be created in the outDir location; whitespace characters are not allowed |
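--startStep | optional | str | hwe | Point of the pipeline where you would like to start the analysis; using this argument requires the --reanalyze flag |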
--endStep | optional | str | PCA_indi_graph (no TGP) or PCA_TGP_graph (if --TGP used) | options: if --TGP is not set -> [hwe, LD, maf, het, ibd, PCA_indi, or PCA_indi_graph]; if --TGP is set -> [hwe, LD, maf, het, ibd, outlier_removal, PCA_TGP, or PCA_TGP_graph]. Point of the pipeline where you would like to stop the analysis. This step is inclusive! |
--hweThresh | optional | float | 1e-6 | Filters out SNPs whose Hardy-Weinberg equilibrium p-value is smaller than this threshold, due to the likelihood of genotyping error |
--LDmethod | optional | str | indep | options:[indep, indep-pairwise or indep-pairphase] Method to calculate linkage disequilibrium. See PLINK documentation for more information. |
--VIF | optional | int | 2 | variance inflation factor for the indep LD pruning method only; the indep-pairwise and indep-pairphase methods do not use VIF |
--rsq | optional | float | 0.50 | any floating point number between 0.0-1.0; r-squared threshold for indep-pairwise or indep-pairphase LD pruning method; indep method will not use rsq |
--windowSize | optional | int | 50 | any integer; the window size in kb for LD analysis |
--stepSize | optional | int | 5 | any integer; variant count to shift window after each iteration |
--maf | optional | float | 0.05 | any floating point number between 0.0-1.0; filters the remaining LD-pruned variants by MAF, so any variant with a MAF below the set threshold is filtered out |
--hetMethod | optional | str | meanStd | options: minMax or meanStd; method used to filter samples on heterozygosity. The minMax method filters using --hetThresh as the maximum F inbreeding coefficient and --hetThreshMin as the minimum F inbreeding coefficient, which by default are 0.10 and -0.10, respectively. The meanStd method calculates a het_score, 1-[observed(HOM)/total], and then filters out any samples that are more than 3 standard deviations from the mean het_score. The number of standard deviations from the mean can be changed using the --het_std parameter |
--het_std | optional | int or float | 3 | any floating point number or integer; if using hetMethod=meanStd, sets how many standard deviations away from the mean het_score is allowable for heterozygosity. Setting it to 3 is interpreted as +/-3 standard deviations away from the mean of the het_score, calculated as 1-[observed(HOM)/total] |
--hetThresh | optional | float | 0.10 | any floating point number; filter out samples where the inbreeding coefficient is greater than the threshold (heterozygosity filtering); only used when the minMax method is selected for --hetMethod |
--hetThreshMin | optional | float | -0.10 | any floating point number; filter out samples where the inbreeding coefficient is smaller than the minimum threshold (heterozygosity filtering); only used when the minMax method is selected for --hetMethod |
--sampleMiss | optional | float | 0.03 | any floating point number between 0.0-1.0; maximum missingness of genotype calls in a sample before the sample is filtered out, where 0 is none missing and 1 is all missing (0.03 is interpreted as 3 percent of SNP calls missing in a sample) |
--snpMiss | optional | float | 0.03 | any floating point number between 0.0-1.0; maximum missingness of genotype calls in a SNP cluster before the SNP is filtered out, where 0 is none missing and 1 is all missing (0.03 is interpreted as 3 percent of sample calls missing in a SNP) |
--TGP | optional | flag | NA | specifying this flag means to generate PCA plots with TGP data merged into the given cohort data set for the 5 superpopulations in TGP (AFR, AMR, EAS, EUR, SAS) |
--centerPop | optional | str | myGroup | options: literally the string myGroup or an available TGP group merged into the input dataset; when using the --TGP flag, you have the option to specify which population cohort the PCs should be centered around for boxplots. By default this is set to your group(s) listed in the sample sheet. You can pick a TGP super population listed in the TGP_Sub_and_SuperPopulation_info.txt file. CASE SENSITIVE! |
--outliers | optional | str | None | A txt file of FID and IID, tab-delimited and one sample per line, that are outliers that should be removed from the sample set (PCA outlier removal); Use original names (original FID and IID), not renamed 1-n for GENESIS formatting |
--pcmat | optional | int | 5 | any integer; Number of predicted admixture populations in dataset to be used in GENESIS calculation for PCA |
--reanalyze | optional | flag | NA | adding this flag means you are going to pass a dataset through the pipeline that has already been partially or fully analyzed by this pipeline. WARNING! May overwrite existing data!! Required if using the --startStep argument OR if using the --endStep argument on an already existing project |
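
The two heterozygosity filters described for --hetMethod can be illustrated with a minimal Python sketch. This is not the pipeline's own code: the function names, sample IDs, and counts below are hypothetical, and the use of the sample standard deviation is an assumption (the pipeline may compute it differently). It only follows the formulas stated in the table: het_score = 1-[observed(HOM)/total] for meanStd, and the --hetThresh / --hetThreshMin bounds on the F inbreeding coefficient for minMax.

```python
from statistics import mean, stdev


def het_score(hom_count, total_count):
    """het_score = 1 - observed(HOM)/total, as described for meanStd."""
    return 1.0 - hom_count / total_count


def mean_std_filter(scores, het_std=3.0):
    """Return IDs of samples whose het_score is more than het_std
    standard deviations away from the mean het_score.

    Assumption: sample standard deviation (needs at least 2 samples).
    """
    values = list(scores.values())
    mu = mean(values)
    sigma = stdev(values)
    return {sid for sid, v in scores.items() if abs(v - mu) > het_std * sigma}


def min_max_filter(f_coeffs, het_thresh=0.10, het_thresh_min=-0.10):
    """Return IDs of samples whose F inbreeding coefficient is above
    het_thresh or below het_thresh_min (the minMax method)."""
    return {
        sid for sid, f in f_coeffs.items()
        if f > het_thresh or f < het_thresh_min
    }


# Hypothetical data: 19 samples with 700 homozygous calls out of 1000,
# plus one sample with only 400 homozygous calls.
counts = {f"sample_{i}": (700, 1000) for i in range(1, 20)}
counts["sample_20"] = (400, 1000)
scores = {sid: het_score(hom, total) for sid, (hom, total) in counts.items()}

flagged_mean_std = mean_std_filter(scores, het_std=3)

# Hypothetical F inbreeding coefficients for the minMax method.
f_coeffs = {"A": 0.02, "B": 0.15, "C": -0.20}
flagged_min_max = min_max_filter(f_coeffs)

print(sorted(flagged_mean_std))
print(sorted(flagged_min_max))
```

Note that with very few samples a single extreme value inflates the standard deviation enough that it may never exceed the 3-standard-deviation cutoff, which is one reason the minMax method with fixed thresholds can be the better choice for small cohorts.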