ms2rescore.feature_generators

Feature generators to add rescoring features to PSMs from various (re)sources and prediction tools.

ms2rescore.feature_generators.FEATURE_GENERATORS: dict

Implemented feature generator classes by name.
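A name-to-class mapping like this is typically used to turn a configuration into generator instances. The following is a minimal, self-contained sketch of that pattern. The classes, the GENERATORS dict, and the build_generators helper are hypothetical stand-ins, not part of ms2rescore.

```python
class BasicGenerator:
    """Hypothetical stand-in for a feature generator class."""

    def __init__(self, **options):
        self.options = options


class RetentionTimeGenerator:
    """Second hypothetical stand-in, to show lookup by name."""

    def __init__(self, **options):
        self.options = options


# Hypothetical registry: generator classes keyed by name.
GENERATORS = {
    "basic": BasicGenerator,
    "retention_time": RetentionTimeGenerator,
}


def build_generators(config):
    """Instantiate one generator per configured name, passing its options."""
    unknown = sorted(set(config) - set(GENERATORS))
    if unknown:
        raise KeyError(f"Unknown feature generator(s): {unknown}")
    return [GENERATORS[name](**options) for name, options in config.items()]


generators = build_generators({"basic": {}, "retention_time": {"processes": 2}})
```

Looking classes up by name keeps the configuration declarative, and rejecting unknown names early avoids a less readable failure later in the pipeline.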

ms2rescore.feature_generators.base

class ms2rescore.feature_generators.base.FeatureGeneratorBase(*args, **kwargs)

Bases: ABC

Base class from which all feature generators must inherit.

exception ms2rescore.feature_generators.base.FeatureGeneratorException

Bases: Exception

Base class for exceptions raised by feature generators.
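The contract implied by the two base classes above can be sketched without importing ms2rescore. In this sketch, GeneratorBase and GeneratorError are hypothetical stand-ins for FeatureGeneratorBase and FeatureGeneratorException, and PSMs are modelled as plain dicts; the assumption that the base class requires feature_names and add_features is taken from the subclasses documented on this page.

```python
from abc import ABC, abstractmethod


class GeneratorError(Exception):
    """Hypothetical stand-in for a generator-specific exception base."""


class GeneratorBase(ABC):
    """Hypothetical stand-in for an abstract feature generator base."""

    @property
    @abstractmethod
    def feature_names(self):
        """Names of the features this generator adds."""

    @abstractmethod
    def add_features(self, psm_list):
        """Add features to every PSM in place."""


class PeptideLengthGenerator(GeneratorBase):
    """Toy generator that adds the peptide length as a single feature."""

    @property
    def feature_names(self):
        return ["peptide_length"]

    def add_features(self, psm_list):
        for psm in psm_list:
            if not psm.get("sequence"):
                raise GeneratorError("PSM has no peptide sequence")
            features = psm.setdefault("features", {})
            features["peptide_length"] = len(psm["sequence"])


# PSMs are plain dicts here (an assumption made for illustration).
psms = [{"sequence": "PEPTIDE"}, {"sequence": "ACDEK"}]
PeptideLengthGenerator().add_features(psms)
```

Because the base is abstract, instantiating it directly raises TypeError, so a subclass that forgets one of the required members fails at construction time.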

ms2rescore.feature_generators.basic

Generate basic features that can be extracted from any PSM list.

class ms2rescore.feature_generators.basic.BasicFeatureGenerator(*args, **kwargs)

Bases: FeatureGeneratorBase

Generate basic features that can be extracted from any PSM list, including search engine score, charge state, and MS1 error.

Parameters:
  • *args – Positional arguments passed to the base class.

  • **kwargs – Keyword arguments passed to the base class.

feature_names

Names of the features that will be added to the PSMs.

Type:

list[str]

add_features(psm_list)

Add basic features to a PSM list.

Parameters:

psm_list (PSMList) – PSM list to add features to.

Return type:

None
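As an illustration of the kind of feature described above, the MS1 error of a PSM can be expressed as the relative difference between observed and theoretical precursor m/z in parts per million. This is a sketch under that assumption; the function names, the dict-based PSM, and the feature keys are hypothetical and do not reflect the exact features BasicFeatureGenerator produces.

```python
def ms1_error_ppm(observed_mz, theoretical_mz):
    """Relative precursor mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def basic_features(psm):
    """Collect score, charge, and absolute MS1 error for one PSM (a dict here)."""
    error = ms1_error_ppm(psm["observed_mz"], psm["theoretical_mz"])
    return {
        "search_engine_score": psm["score"],
        "charge": psm["charge"],
        "abs_ms1_error_ppm": abs(error),
    }


# Hypothetical PSM with made-up values, for illustration only.
example_psm = {
    "score": 87.5,
    "charge": 2,
    "observed_mz": 500.2510,
    "theoretical_mz": 500.2500,
}
features = basic_features(example_psm)
```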

ms2rescore.feature_generators.deeplc

DeepLC retention time-based feature generator.

DeepLC is a fully modification-aware peptide retention time predictor. It uses a deep convolutional neural network to predict retention times based on the atomic composition of the (modified) amino acid residues in the peptide. See github.com/compomics/deeplc for more information.

If you use DeepLC through MS²Rescore, please cite:

Bouwmeester, R., Gabriels, R., Hulstaert, N. et al. DeepLC can predict retention times for peptides that carry unknown modifications. Nat Methods 18, 1363-1369 (2021). doi:10.1038/s41592-021-01301-5

class ms2rescore.feature_generators.deeplc.DeepLCFeatureGenerator(*args, lower_score_is_better=False, calibration_set_size=None, processes=1, **kwargs)

Bases: FeatureGeneratorBase

Generate DeepLC-based features for rescoring.

DeepLC retraining is on by default. Add deeplc_retrain: False as a keyword argument to disable retraining.

Parameters:
  • lower_score_is_better (bool) – Whether a lower PSM score denotes a better matching PSM. Default: False

  • calibration_set_size (int or float) – Amount of best PSMs to use for DeepLC calibration. If this value is higher than the number of available PSMs, all PSMs will be used. (default: 0.15)

  • processes ({int, None}) – Number of processes to use in DeepLC. Defaults to 1.

  • kwargs (dict) – Additional keyword arguments are passed to DeepLC.
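How lower_score_is_better and calibration_set_size interact can be sketched as follows. The sketch assumes that a float is read as a fraction of all PSMs and an int as an absolute count capped at the number of available PSMs. The helper name and the dict-based PSMs are hypothetical; this is not the DeepLC or ms2rescore implementation.

```python
def select_calibration_psms(psms, calibration_set_size=0.15, lower_score_is_better=False):
    """Return the best-scoring PSMs to use for calibration."""
    if isinstance(calibration_set_size, float):
        # Assumption: a float is a fraction of the available PSMs.
        if not 0 < calibration_set_size <= 1:
            raise ValueError("Fraction must be in the interval (0, 1]")
        n_psms = round(len(psms) * calibration_set_size)
    elif isinstance(calibration_set_size, int):
        # Assumption: an int is an absolute count, capped at what is available.
        n_psms = min(calibration_set_size, len(psms))
    else:
        raise TypeError("calibration_set_size must be an int or a float")

    # Best PSMs first: descending scores unless lower scores are better.
    ranked = sorted(psms, key=lambda psm: psm["score"], reverse=not lower_score_is_better)
    return ranked[:n_psms]


psms = [{"id": i, "score": float(i)} for i in range(20)]
by_fraction = select_calibration_psms(psms, 0.15)
by_count = select_calibration_psms(psms, 100)
lowest_first = select_calibration_psms(psms, 2, lower_score_is_better=True)
```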

feature_names

Names of the features that will be added to the PSMs.

Type:

list[str]

add_features(psm_list)

Add DeepLC-derived features to PSMs.

Parameters:

psm_list (PSMList)

Return type:

None

ms2rescore.feature_generators.ionmob

ionmob collisional cross section (CCS)-based feature generator.

ionmob is a predictor for peptide collisional cross sections (CCS), as measured in ion mobility devices, such as the Bruker timsTOF instruments. More info can be found on the ionmob GitHub page.

If you use ionmob in your work, please cite the following publication:

Teschner, D. et al. Ionmob: a Python package for prediction of peptide collisional cross-section values. Bioinformatics 39, btad486 (2023). doi:10.1093/bioinformatics/btad486

class ms2rescore.feature_generators.ionmob.IonMobFeatureGenerator(*args, ionmob_model='GRUPredictor', reference_dataset=None, tokenizer=None, **kwargs)

Bases: FeatureGeneratorBase

Ionmob collisional cross section (CCS)-based feature generator.

Parameters:
  • *args – Additional arguments passed to the base class.

  • ionmob_model (str) – Path to a trained Ionmob model or one of the default models (DeepTwoMerModel, GRUPredictor, or SqrtModel). Default: GRUPredictor.

  • reference_dataset (str | None) – Path to a reference dataset for CCS shift calculation. Uses the default reference dataset if not specified.

  • tokenizer (str | None) – Path to a tokenizer or one of the default tokenizers. Uses the default tokenizer if not specified.

  • **kwargs – Additional keyword arguments passed to the base class.

property allowed_modifications

Return a list of modifications that are allowed in ionmob.

add_features(psm_list)

Add Ionmob-derived features to PSMs.

Parameters:

psm_list (PSMList) – PSMs to add features to.

Return type:

None

static tokenize_peptidoform(peptidoform)

Tokenize ProForma sequence and add modifications.

Parameters:

peptidoform (Peptidoform)

Return type:

list

calculate_ccs_shift(psm_dataframe)

Apply CCS shift to CCS values.

Parameters:

psm_dataframe (DataFrame) – Dataframe with PSMs as returned by psm_utils.PSMList.to_dataframe().

Return type:

float

exception ms2rescore.feature_generators.ionmob.IonmobException

Bases: FeatureGeneratorException

Exception raised by Ionmob feature generator.

ms2rescore.feature_generators.maxquant

Feature generator for PSMs from the MaxQuant search engine.

MaxQuant msms.txt files contain various metrics from peptide-spectrum matching that can be used to generate rescoring features. These include features related to the mass errors of the seven fragment ions with the highest intensities, and features related to the ion current of the identified fragment ions.

class ms2rescore.feature_generators.maxquant.MaxQuantFeatureGenerator(*args, **kwargs)

Bases: FeatureGeneratorBase

Generate MaxQuant-derived features.

feature_names

Names of the features that will be added to the PSMs.

Type:

list[str]

Raises:

MissingMetadataError – If the required metadata entries are not present in the PSMs.

add_features(psm_list)

Add MaxQuant-derived features to PSMs.

Parameters:

psm_list (PSMList) – PSMs to add features to.

exception ms2rescore.feature_generators.maxquant.MissingMetadataError

Bases: MS2RescoreError

Exception raised when a required metadata entry is missing.
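The check behind such an error can be sketched as a simple key lookup before any features are computed. The exception class, the helper, and the list of required keys below are hypothetical; the key names are modelled on msms.txt column headers and are an assumption, not the list ms2rescore actually enforces.

```python
class MissingMetadata(Exception):
    """Hypothetical stand-in for a missing-metadata exception."""


# Assumed example keys, modelled on MaxQuant msms.txt column names.
REQUIRED_KEYS = (
    "Mass deviations [Da]",
    "Mass deviations [ppm]",
    "Intensities",
    "Intensity coverage",
)


def check_metadata(metadata):
    """Raise if any required metadata entry is absent from a PSM's metadata."""
    missing = [key for key in REQUIRED_KEYS if key not in metadata]
    if missing:
        raise MissingMetadata(f"Missing required metadata entries: {missing}")


complete = {key: "0" for key in REQUIRED_KEYS}
check_metadata(complete)  # passes silently

try:
    check_metadata({"Intensities": "10;20"})
    error_message = None
except MissingMetadata as exc:
    error_message = str(exc)
```

Validating up front means the caller gets a single message listing every missing entry, rather than a KeyError on the first one.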

ms2rescore.feature_generators.ms2pip

MS²PIP fragmentation intensity-based feature generator.

MS²PIP is a machine learning tool that predicts the MS2 spectrum of a peptide given its sequence. It is trained on previously identified MS2 spectra and their corresponding peptide sequences. Because MS²PIP uses XGBoost, a highly performant but traditional machine learning approach, it can produce accurate predictions even when trained on smaller spectral libraries. This makes MS²PIP a very flexible platform for training new models on custom datasets. Nevertheless, MS²PIP comes with several pre-trained models. See github.com/compomics/ms2pip for more information.

Because traditional proteomics search engines do not fully consider MS2 peak intensities in their scoring functions, adding rescoring features derived from spectrum prediction tools has proved to be a very effective way to further improve the sensitivity of peptide-spectrum matching.

If you use MS²PIP through MS²Rescore, please cite:

Declercq, A., Bouwmeester, R., Chiva, C., Sabidó, E., Hirschler, A., Carapito, C., Martens, L., Degroeve, S., Gabriels, R. Updated MS²PIP web server supports cutting-edge proteomics applications. Nucleic Acids Research (2023) doi:10.1093/nar/gkad335

class ms2rescore.feature_generators.ms2pip.MS2PIPFeatureGenerator(*args, model='HCD', ms2_tolerance=0.02, spectrum_path=None, spectrum_id_pattern='(.*)', model_dir=None, processes, **kwargs)

Bases: FeatureGeneratorBase

Generate MS²PIP-based features.

Parameters:
  • model (str) – MS²PIP prediction model to use. Defaults to HCD.

  • ms2_tolerance (float) – MS2 mass tolerance in Da. Defaults to 0.02.

  • spectrum_path (str | None) – Path to spectrum file or directory with spectrum files. If None, inferred from run field in PSMs. Defaults to None.

  • spectrum_id_pattern (str, optional) – Regular expression pattern to extract spectrum ID from spectrum file. Defaults to (.*).

  • model_dir (str | None) – Directory containing MS²PIP models. Defaults to None (use MS²PIP default).

  • processes (int, optional) – Number of processes to use. Defaults to 1.
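The effect of spectrum_id_pattern can be illustrated with the standard re module. The sketch assumes the pattern is matched from the start of each spectrum title and that its first capture group is taken as the spectrum ID; the helper name and the example title are hypothetical.

```python
import re


def extract_spectrum_id(title, pattern=r"(.*)"):
    """Return the first capture group of `pattern` matched against `title`."""
    if re.compile(pattern).groups < 1:
        raise ValueError("Pattern must contain at least one capture group")
    match = re.match(pattern, title)
    if match is None:
        raise ValueError(f"Pattern {pattern!r} does not match title {title!r}")
    return match.group(1)


# Hypothetical spectrum title, for illustration only.
title = "controllerType=0 controllerNumber=1 scan=1234"

# The default pattern captures the full title unchanged.
full_id = extract_spectrum_id(title)

# A narrower pattern keeps only the scan number.
scan_id = extract_spectrum_id(title, r".*scan=(\d+)$")
```

A narrower pattern is useful when the PSM file and the spectrum file refer to the same spectrum with differently formatted identifiers.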

feature_names

Names of the features that will be added to the PSMs.

Type:

list[str]

add_features(psm_list)

Add MS²PIP-derived features to PSMs.

Parameters:

psm_list (PSMList) – PSMs to add features to.

Return type:

None