ms2rescore.rescoring_engines

Rescoring engines integrated in MS²Rescore.

Each integrated rescoring engine typically includes a rescore() function that takes a PSMList as input and writes the new scores, q-values, and PEPs to the original PSMList.

Mokapot

Mokapot integration for MS²Rescore.

Mokapot is a full-Python implementation of the semi-supervised learning algorithms introduced with Percolator. It builds upon the flexible scikit-learn package, which makes it highly efficient for routine applications, but also customizable for experimental research settings. Using Mokapot through MS²Rescore brings several advantages over Percolator: it can be easily installed in the same Python environment, and it is generally faster because communication between the tools happens entirely within Python, without the need to write and read files or to go through the command line. See mokapot.readthedocs.io for more information.

If you use Mokapot through MS²Rescore, please cite:

Fondrie W. E. & Noble W. S. mokapot: Fast and Flexible Semisupervised Learning for Peptide Detection. J Proteome Res (2021). doi:10.1021/acs.jproteome.0c01010

ms2rescore.rescoring_engines.mokapot.rescore(psm_list, output_file_root='ms2rescore', fasta_file=None, train_fdr=0.01, write_weights=False, write_txt=False, write_flashlfq=False, protein_kwargs=None, **kwargs)

Rescore PSMs with Mokapot.

The function provides a high-level interface to use Mokapot within MS²Rescore. It first converts the PSMList to a LinearPsmDataset, and then optionally adds protein information from a FASTA file. The dataset is then passed to the brew() function, which returns the new scores, q-values, and PEPs. These are then written back to the original PSMList. Optionally, results can be written to a Mokapot text file, a FlashLFQ-compatible file, or the model weights can be saved.
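As a rough usage sketch, the call below follows the signature documented above. The wrapper name rescore_with_mokapot is hypothetical, and the import is deferred so that the snippet can be loaded even where MS²Rescore is not installed. It assumes psm_list already holds target and decoy PSMs with populated rescoring_features.

```python
def rescore_with_mokapot(psm_list, output_file_root="ms2rescore", fasta_file=None, write_txt=False):
    """Hypothetical convenience wrapper around the documented rescore() call."""
    # Deferred import: MS²Rescore is only required once the function is called.
    from ms2rescore.rescoring_engines import mokapot as mokapot_engine

    # rescore() returns None; the new scores, q-values, and PEPs are written
    # back to the PSMList that was passed in.
    mokapot_engine.rescore(
        psm_list,
        output_file_root=output_file_root,
        fasta_file=fasta_file,  # None skips adding protein information
        train_fdr=0.01,
        write_txt=write_txt,
    )
    return psm_list
```

Returning psm_list is only a convenience for chaining; the same object is updated in place either way.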

Parameters:
  • psm_list (PSMList) – PSMs to be rescored.

  • output_file_root (str) – Root of output file names. Defaults to "ms2rescore".

  • fasta_file (str | None) – Path to FASTA file with protein sequences to use for protein inference. Defaults to None.

  • train_fdr (float) – FDR to use for training the Mokapot model. Defaults to 0.01.

  • write_weights (bool) – Write model weights to a text file. Defaults to False.

  • write_txt (bool) – Write Mokapot results to a text file. Defaults to False.

  • write_flashlfq (bool) – Write Mokapot results to a FlashLFQ-compatible file. Defaults to False.

  • protein_kwargs (Dict[str, Any] | None) – Keyword arguments to pass to the add_proteins() method. Defaults to None.

  • **kwargs (Any) – Additional keyword arguments are passed to the Mokapot brew() function.

Return type:

None

ms2rescore.rescoring_engines.mokapot.convert_psm_list(psm_list, feature_names=None)

Convert a PSM list to a Mokapot dataset.

Parameters:
  • psm_list (PSMList) – PSMList to rescore.

  • feature_names (List[str] | None) – List of feature names to use. Items must be keys in the PSM rescoring_features dict. Defaults to None.

Return type:

LinearPsmDataset
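Because every item in feature_names must be a key in the PSM rescoring_features dict, it can help to check the names before converting. The sketch below is one possible check, not part of MS²Rescore: the helper find_missing_features and the feature names rt_diff and spec_pearson are hypothetical, and plain dicts stand in for each PSM's rescoring_features.

```python
from typing import Dict, Iterable, List


def find_missing_features(
    feature_names: List[str],
    rescoring_features: Iterable[Dict[str, float]],
) -> List[str]:
    """Return the requested names that are absent from at least one feature dict."""
    feature_dicts = list(rescoring_features)
    missing = []
    for name in feature_names:
        if any(name not in features for features in feature_dicts):
            missing.append(name)
    return missing


# Plain dicts standing in for the rescoring_features of two PSMs (hypothetical values).
example_features = [
    {"rt_diff": 0.4, "spec_pearson": 0.91},
    {"rt_diff": 1.2, "spec_pearson": 0.55},
]

print(find_missing_features(["rt_diff", "charge"], example_features))  # ['charge']
```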

ms2rescore.rescoring_engines.mokapot.save_model_weights(models, feature_names, output_file_root)

Save model weights to a file.

Parameters:
  • models (Tuple[Model]) – Tuple of Mokapot models (one for each fold) to save.

  • feature_names (List[str]) – List of feature names that were used to train the models.

  • output_file_root (str) – Root of output file names.

ms2rescore.rescoring_engines.mokapot.add_psm_confidence(psm_list, confidence_results)

Add PSM-level confidence estimates to PSM list, updating score, qvalue, pep, and rank.

Parameters:
  • psm_list (PSMList) – PSM list to update.

  • confidence_results (Confidence) – Mokapot confidence estimates to add.

Return type:

None

ms2rescore.rescoring_engines.mokapot.add_peptide_confidence(psm_list, confidence_results)

Add Mokapot peptide-level confidence estimates to PSM list.

Parameters:
  • psm_list (PSMList) – PSM list to update.

  • confidence_results (Confidence) – Mokapot confidence estimates to add.

Return type:

None

Percolator

Percolator integration for MS²Rescore.

Percolator was the first tool to introduce semi-supervised learning for PSM rescoring. It is still widely used and has been integrated in many proteomics data analysis pipelines. This module integrates with Percolator through its command line interface. Percolator must be installed separately and the percolator command must be available in the PATH for this module to work. See github.com/percolator/percolator for more information.
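Since the percolator command must be available in the PATH, a quick check before rescoring can give a clearer error than a failed subprocess. This is a minimal sketch using only the standard library; the helper name percolator_available is hypothetical and not part of MS²Rescore.

```python
import shutil


def percolator_available(command: str = "percolator") -> bool:
    """Return True if the given command can be found on the PATH."""
    return shutil.which(command) is not None


if not percolator_available():
    print("The percolator command was not found on the PATH; install Percolator first.")
```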

If you use Percolator through MS²Rescore, please cite:

The M, MacCoss MJ, Noble WS, Käll L. Fast and Accurate Protein False Discovery Rates on Large-Scale Proteomics Data Sets with Percolator 3.0. J Am Soc Mass Spectrom (2016). doi:10.1007/s13361-016-1460-7

ms2rescore.rescoring_engines.percolator.rescore(psm_list, output_file_root='ms2rescore', log_level='info', processes=1, fasta_file=None, percolator_kwargs=None)

Rescore PSMs with Percolator.

Aside from updating the PSM score, qvalue, and pep values, the following output files are written:

  • Target PSMs: {output_file_root}.percolator.psms.pout

  • Target peptides: {output_file_root}.percolator.peptides.pout

  • Target proteins: {output_file_root}.percolator.proteins.pout

  • Decoy PSMs: {output_file_root}.percolator.decoy.psms.pout

  • Decoy peptides: {output_file_root}.percolator.decoy.peptides.pout

  • Decoy proteins: {output_file_root}.percolator.decoy.proteins.pout

  • Feature weights: {output_file_root}.percolator.weights.tsv
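The naming pattern in the list above can be sketched as a small helper that builds the expected paths from output_file_root. The function name expected_percolator_outputs and the dictionary keys are hypothetical; the helper only mirrors the documented file names and does not check whether Percolator actually wrote each file.

```python
from typing import Dict

# File name suffixes as listed in the documentation above.
_PERCOLATOR_SUFFIXES = {
    "target_psms": "percolator.psms.pout",
    "target_peptides": "percolator.peptides.pout",
    "target_proteins": "percolator.proteins.pout",
    "decoy_psms": "percolator.decoy.psms.pout",
    "decoy_peptides": "percolator.decoy.peptides.pout",
    "decoy_proteins": "percolator.decoy.proteins.pout",
    "feature_weights": "percolator.weights.tsv",
}


def expected_percolator_outputs(output_file_root: str = "ms2rescore") -> Dict[str, str]:
    """Map a short label to the expected output path for a given file root."""
    return {
        label: f"{output_file_root}.{suffix}"
        for label, suffix in _PERCOLATOR_SUFFIXES.items()
    }


for label, path in expected_percolator_outputs("results/run1").items():
    print(f"{label}: {path}")
```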

Percolator is run through its command line interface. Percolator must be installed separately and the percolator command must be available in the PATH for this module to work.

Parameters:
  • psm_list (PSMList) – PSMs to be rescored.

  • output_file_root (str) – Root of output file names. Defaults to "ms2rescore".

  • log_level (str) – Log level for Percolator. Defaults to "info".

  • processes (int) – Number of processes to use. Defaults to 1.

  • fasta_file (str | None) – Path to FASTA file for protein inference. Defaults to None.

  • percolator_kwargs (Dict[str, Any] | None) – Additional keyword arguments for Percolator. Defaults to None.

Return type:

None