Help needed for FDP calculation #4

Open
Francis-B opened this issue Oct 30, 2024 · 12 comments

@Francis-B

Francis-B commented Oct 30, 2024

Hi,

In order to get familiar with your tool, I tried to reproduce the TIDE+Percolator-RESET plot in Figure 3 of your article. However, all 3 methods gave me an FDP well below the FDR threshold (see figure below). I also tried FDRbench on one of my own datasets with a Comet+PeptideProphet pipeline and, once again, the FDPs I got were well below the FDR threshold. Moreover, the FDP was slightly lower with a proteogenomic database than with SwissProt, which is unexpected.

Since I got similar results with different search engines and post-processing tools, I guess that my problem arises from the arguments I use to run FDRbench.

Here are my command lines.

To generate the entrapment database:

java -jar fdrbench-0.0.1.jar  -enzyme 1 -fix_nc c  -level peptide  -db <path/to/database.fasta>  -o <database_entrapment.tx -uniprot -minLength 7  -maxLength 35
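For readers unfamiliar with peptide-level entrapment, here is a minimal Python sketch of how shuffled entrapment peptides can be generated. It is not FDRbench's implementation: the simplified trypsin rule (cleave after K/R, not before P, no missed cleavages), the reading of `-fix_nc c` as "keep the C-terminal residue fixed", and the helper names `tryptic_peptides`, `shuffle_fixed_c_term` and `make_entrapment` are all assumptions made for illustration. The explicit `seed` argument shows why two runs without a fixed seed can produce slightly different entrapment databases.

```python
import random
import re
from typing import List, Optional, Set, Tuple


def tryptic_peptides(protein: str, min_len: int = 7, max_len: int = 35) -> List[str]:
    """Simplified trypsin digest (assumption): cleave after K/R unless followed by P,
    no missed cleavages, keep peptides with min_len <= length <= max_len."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)  # zero-width split, Python 3.7+
    return [p for p in pieces if min_len <= len(p) <= max_len]


def shuffle_fixed_c_term(peptide: str, rng: random.Random, forbidden: Set[str],
                         max_tries: int = 10) -> Optional[str]:
    """Shuffle every residue except the C-terminal one, avoiding sequences in `forbidden`."""
    body, c_term = list(peptide[:-1]), peptide[-1]
    for _ in range(max_tries):
        rng.shuffle(body)
        candidate = "".join(body) + c_term
        if candidate not in forbidden:
            return candidate
    return None  # e.g. low-complexity peptides with no new permutation


def make_entrapment(peptides: List[str], fold: int = 1,
                    seed: Optional[int] = None) -> List[Tuple[str, str]]:
    """Return (target, entrapment) pairs, up to `fold` entrapment peptides per target."""
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    targets = sorted(set(peptides))  # sorted so the result does not depend on set order
    forbidden = set(targets)  # never emit a real target sequence
    pairs = []
    for target in targets:
        for _ in range(fold):
            entrapment = shuffle_fixed_c_term(target, rng, forbidden)
            if entrapment is not None:
                forbidden.add(entrapment)  # keep entrapment peptides unique
                pairs.append((target, entrapment))
    return pairs
```

Calling `make_entrapment(peptides, fold=1, seed=42)` twice on the same input returns identical pairs, whereas `seed=None` draws a fresh shuffle on every call.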

To compute the FDP:

java -jar fdrbench-0.0.1.jar -fold 1 -pep <database_entrapment.txt> -i <percolator-RESET_output> -o <output_path> -score "TailorScore:1"
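As a rough illustration of what the reported FDP values mean, here is a Python sketch of two entrapment estimates, assuming `-fold 1` corresponds to an entrapment-to-original database ratio r = 1. The lower bound N_E / (N_T + N_E) and the combined estimate N_E (1 + 1/r) / (N_T + N_E), where N_T and N_E count original-target and entrapment discoveries, follow the usual entrapment formulas as we understand them. The function names and the `(q_value, is_entrapment)` input format are hypothetical; this is not FDRbench's code.

```python
from typing import Iterable, Tuple


def lower_bound_fdp(n_target: int, n_entrapment: int) -> float:
    """Lower bound: N_E / (N_T + N_E)."""
    total = n_target + n_entrapment
    return n_entrapment / total if total else 0.0


def combined_fdp(n_target: int, n_entrapment: int, r: float = 1.0) -> float:
    """Combined estimate: N_E * (1 + 1/r) / (N_T + N_E), r = entrapment/original size ratio."""
    total = n_target + n_entrapment
    return n_entrapment * (1.0 + 1.0 / r) / total if total else 0.0


def fdp_at_threshold(discoveries: Iterable[Tuple[float, bool]], alpha: float,
                     r: float = 1.0) -> Tuple[float, float]:
    """discoveries: (q_value, is_entrapment) for non-decoy discoveries (hypothetical format).
    Returns (lower_bound, combined) over discoveries with q_value <= alpha."""
    flags = [is_entrapment for q, is_entrapment in discoveries if q <= alpha]
    n_entrapment = sum(flags)
    n_target = len(flags) - n_entrapment
    return lower_bound_fdp(n_target, n_entrapment), combined_fdp(n_target, n_entrapment, r)
```

For example, 95 original-target and 5 entrapment discoveries at q ≤ 0.01 give a lower bound of 0.05 and, with r = 1, a combined estimate of 0.10.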

I used the same version of SwissProt as you did (UP000005640) and all the parameters you mentioned in the methods section for Tide (crux 4.2.Linux) and Percolator-RESET (v. 0.0.6).

I did not aggregate the FDPs across all runs of the PXD001468 dataset as you did in the article, but each run gives me a figure similar to the following:

[Image: FDP plot for a single run]

Would you have any idea what I could have done wrong? If not, could you provide the arguments you used to run FDRbench for Figure 3?

If you would like to have more details, I will be happy to provide them!

Thanks a lot!

@wenbostar
Contributor

wenbostar commented Oct 30, 2024

java -jar fdrbench-0.0.1.jar -fold 1 -pep <database_entrapment.txt> -i <percolator-RESET_output> -o <output_path> -score "TailorScore:1"

Hi @Francis-B , could you please share with me the input files you used with this command to produce the plot you showed?

Bo

@Francis-B
Author

Sure, here are OneDrive links to these files:

database_entrapment.txt
percolator-RESET_output

The percolator-RESET_output file is actually a filtered version of the Percolator output that keeps only the columns relevant to FDRbench.

@wenbostar
Contributor

Thanks for sharing the data. I got the same result when I ran FDRBench on your input data. I think the issue is that you did a combined search (multiple DDA MS/MS files from different MS runs) but evaluated FDR control in each individual run.

Could you run Tide (crux 4.2.Linux) with Percolator-RESET (v. 0.0.6) on an individual run and see what the plot looks like?

@Francis-B
Author

Thanks for your quick answer!

The input files I attached above were obtained from a search I did only on the b1906_293T_proteinID_01A_QE3_122212.mzML file. In fact, I wrote a snakemake pipeline to repeat the analysis for each file of the dataset individually.

To make sure there were no problems in my pipeline that could cause a combined search or mess up any other step, I reran all steps on the b1906(...) file alone. The new plot I got was not exactly the same, but it was very similar. I suppose this small difference is due to the randomness of the entrapment database creation (I did not specify a random seed).

If it helps, I just uploaded my pipeline here so you can see each step I did. I can also send you OneDrive links to any intermediate files you would like, so you don't have to run the whole pipeline.

@wenbostar
Contributor

I did a quick search using Tide+Percolator on b1906_293T_proteinID_01A_QE3_122212.mgf. Below is what I got:

I will look into your workflow later.

Bo

@Francis-B
Author

Cool, thanks again!

In the meantime, I'll triple-check the parameters of each step to see if I can get a plot similar to yours!

@wenbostar
Contributor

Hi @Francis-B , could you please share the inputs and outputs for each step in your workflow with me when you ran it on b1906_293T_proteinID_01A_QE3_122212.mzML?

@Francis-B
Author

Sure, here they are!

All files are organised in the same subfolders as specified in the Snakefile.

@wenbostar
Contributor

The issue is that you need to run a concatenated search in Tide (--concat T) when using Percolator-RESET.
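To illustrate what `--concat T` changes, here is a minimal Python sketch of per-spectrum target-decoy competition: the best match for each spectrum is kept regardless of whether it is a target or a decoy, which is what a concatenated Tide search reports when a single top match per spectrum is kept (assuming `--top-match 1`). With a separate search, as we understand it, target and decoy matches for the same spectrum are reported independently instead. The `PSM` record and the `compete` function are hypothetical, and this is not Tide's implementation.

```python
from typing import Iterable, List, NamedTuple


class PSM(NamedTuple):
    """Hypothetical minimal PSM record for this sketch."""
    spectrum_id: str
    peptide: str
    score: float  # higher is better
    is_decoy: bool


def compete(psms: Iterable[PSM]) -> List[PSM]:
    """Keep only the best-scoring PSM per spectrum, whether target or decoy.

    Ties keep the PSM seen first (an arbitrary choice for this sketch).
    """
    best = {}
    for psm in psms:
        current = best.get(psm.spectrum_id)
        if current is None or psm.score > current.score:
            best[psm.spectrum_id] = psm
    return list(best.values())  # one PSM per spectrum, in first-seen spectrum order
```

After competition, a spectrum whose decoy outscores its target contributes only the decoy, so decoy counts can be used to estimate how many target matches are false.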

@Francis-B
Author

I confirm that this solved my problem! Thank you so much!

Francis

@wenbostar
Contributor

Hi Francis,

Great. Thanks for the confirmation.

What does it look like in the proteogenomics search? I haven't tried FDRBench on this type of search.

Bo

@Francis-B
Author

Hi Bo,

Sorry for the delayed answer; I'm in a rush right now with upcoming deadlines, so I've had to put this side project on hold. But I'll be more than happy to share the FDP estimates from the proteogenomics search with you when I have them!
