ms2rescore.feature_generators
Feature generators to add rescoring features to PSMs from various (re)sources and prediction tools.
- ms2rescore.feature_generators.FEATURE_GENERATORS: dict
Implemented feature generator classes by name.
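A name-to-class mapping of this kind is commonly used to look up and instantiate generators from a configuration. The sketch below illustrates the pattern with hypothetical stand-in classes, a hypothetical registry (EXAMPLE_GENERATORS), and a hypothetical helper (build_generators). None of these are the real MS²Rescore objects, and the actual keys and constructor arguments should be checked against the installed package.

```python
# Illustrative only: stand-in classes, not the real MS²Rescore generators.
class ExampleGeneratorBase:
    def __init__(self, *args, **kwargs):
        # Keep the keyword options so they can be inspected later.
        self.options = kwargs


class ExampleBasicGenerator(ExampleGeneratorBase):
    pass


class ExampleRetentionTimeGenerator(ExampleGeneratorBase):
    pass


# Hypothetical registry that mirrors the "classes by name" idea.
EXAMPLE_GENERATORS = {
    "basic": ExampleBasicGenerator,
    "retention_time": ExampleRetentionTimeGenerator,
}


def build_generators(config):
    """Instantiate one generator per configured name, passing its options as kwargs."""
    generators = []
    for name, options in config.items():
        try:
            generator_cls = EXAMPLE_GENERATORS[name]
        except KeyError:
            raise ValueError(f"Unknown feature generator: {name!r}") from None
        generators.append(generator_cls(**options))
    return generators


generators = build_generators({"basic": {}, "retention_time": {"processes": 2}})
```

Failing early on an unknown name keeps configuration typos from silently disabling a generator.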
ms2rescore.feature_generators.base
ms2rescore.feature_generators.basic
Generate basic features that can be extracted from any PSM list.
- class ms2rescore.feature_generators.basic.BasicFeatureGenerator(*args, **kwargs)
Bases:
FeatureGeneratorBase
Generate basic features that can be extracted from any PSM list, including search engine score, charge state, and MS1 error.
- Parameters:
*args – Positional arguments passed to the base class.
**kwargs – Keyword arguments passed to the base class.
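As an illustration of what an MS1 error feature can look like, the sketch below computes a precursor mass error in parts per million from an observed and a theoretical m/z. The function name and the choice of ppm as the unit are assumptions made for illustration; this reference does not specify how BasicFeatureGenerator computes or names its features.

```python
def precursor_error_ppm(observed_mz: float, theoretical_mz: float) -> float:
    """Relative precursor m/z error in parts per million (hypothetical helper)."""
    if theoretical_mz <= 0:
        raise ValueError("theoretical_mz must be positive")
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


# A deviation of 0.0025 at m/z 500.25 corresponds to roughly 5 ppm.
error = precursor_error_ppm(observed_mz=500.2525, theoretical_mz=500.2500)
```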
ms2rescore.feature_generators.deeplc
DeepLC retention time-based feature generator.
DeepLC is a fully modification-aware peptide retention time predictor. It uses a deep convolutional neural network to predict retention times based on the atomic composition of the (modified) amino acid residues in the peptide. See github.com/compomics/deeplc for more information.
If you use DeepLC through MS²Rescore, please cite:
Bouwmeester, R., Gabriels, R., Hulstaert, N. et al. DeepLC can predict retention times for peptides that carry unknown modifications. Nat Methods 18, 1363-1369 (2021). doi:10.1038/s41592-021-01301-5
- class ms2rescore.feature_generators.deeplc.DeepLCFeatureGenerator(*args, lower_score_is_better=False, calibration_set_size=None, processes=1, **kwargs)
Bases:
FeatureGeneratorBase
Generate DeepLC-based features for rescoring.
DeepLC retraining is on by default. Add deeplc_retrain: False as a keyword argument to disable retraining.
- Parameters:
lower_score_is_better (bool) – Whether a lower PSM score denotes a better matching PSM. Default: False
calibration_set_size (int or float) – Amount of best PSMs to use for DeepLC calibration. If this value is higher than the number of available PSMs, all PSMs will be used. (default: 0.15)
processes ({int, None}) – Number of processes to use in DeepLC. Defaults to 1.
kwargs (dict) – Additional keyword arguments are passed to DeepLC.
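One possible reading of calibration_set_size and lower_score_is_better is sketched below: rank the PSM scores, then keep either a fixed count (int) or a fraction (float) of the best ones. The helper select_calibration_scores and the interpretation of a float as a fraction of all PSMs are assumptions for illustration; this is not the actual MS²Rescore or DeepLC implementation.

```python
def select_calibration_scores(scores, calibration_set_size, lower_score_is_better=False):
    """Return the best-scoring subset of `scores` (hypothetical helper).

    Assumption: an int is an absolute count, a float is a fraction of all PSMs.
    If more PSMs are requested than are available, all of them are returned.
    """
    if isinstance(calibration_set_size, float):
        if not 0 < calibration_set_size <= 1:
            raise ValueError("A float calibration_set_size must be in (0, 1]")
        n_calibration = round(len(scores) * calibration_set_size)
    else:
        n_calibration = calibration_set_size
    n_calibration = min(n_calibration, len(scores))
    # Best scores first: descending unless a lower score is better.
    ranked = sorted(scores, reverse=not lower_score_is_better)
    return ranked[:n_calibration]


scores = [10.0, 55.2, 31.7, 80.1, 42.0, 12.5, 67.3, 23.9, 90.4, 5.6]
top_fraction = select_calibration_scores(scores, 0.3)
all_psms = select_calibration_scores(scores, 50)
lowest_two = select_calibration_scores(scores, 2, lower_score_is_better=True)
```

Sorting on the score alone is enough here; in practice the selected PSMs themselves, not only their scores, would be passed on for calibration.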
ms2rescore.feature_generators.ionmob
ionmob collisional cross section (CCS)-based feature generator.
ionmob is a predictor for peptide collisional cross sections (CCS), as measured in ion mobility devices, such as the Bruker timsTOF instruments. More info can be found on the ionmob GitHub page.
If you use ionmob in your work, please cite the following publication:
Teschner, D. et al. Ionmob: a Python package for prediction of peptide collisional cross-section values. Bioinformatics 39, btad486 (2023). doi:10.1093/bioinformatics/btad486
- class ms2rescore.feature_generators.ionmob.IonMobFeatureGenerator(*args, ionmob_model='GRUPredictor', reference_dataset=None, tokenizer=None, **kwargs)
Bases:
FeatureGeneratorBase
Ionmob collisional cross section (CCS)-based feature generator.
- Parameters:
*args – Additional arguments passed to the base class.
ionmob_model (str) – Path to a trained Ionmob model or one of the default models (DeepTwoMerModel, GRUPredictor, or SqrtModel). Default: GRUPredictor.
reference_dataset (str | None) – Path to a reference dataset for CCS shift calculation. Uses the default reference dataset if not specified.
tokenizer (str | None) – Path to a tokenizer or one of the default tokenizers. Uses the default tokenizer if not specified.
**kwargs – Additional keyword arguments passed to the base class.
- property allowed_modifications
Return a list of modifications that are allowed in ionmob.
- add_features(psm_list)
Add Ionmob-derived features to PSMs.
- Parameters:
psm_list (PSMList) – PSMs to add features to.
- Return type:
None
- static tokenize_peptidoform(peptidoform)
Tokenize proforma sequence and add modifications.
- Parameters:
peptidoform (Peptidoform)
- Return type:
- calculate_ccs_shift(psm_dataframe)
Apply CCS shift to CCS values.
- Parameters:
psm_dataframe (DataFrame) – Dataframe with PSMs as returned by psm_utils.PSMList.to_dataframe().
- Return type:
- exception ms2rescore.feature_generators.ionmob.IonmobException
Bases:
FeatureGeneratorException
Exception raised by Ionmob feature generator.
ms2rescore.feature_generators.maxquant
Feature generator for PSMs from the MaxQuant search engine.
MaxQuant msms.txt files contain various metrics from peptide-spectrum matching that can be used to generate rescoring features. These include features related to the mass errors of the seven fragment ions with the highest intensities, and features related to the ion current of the identified fragment ions.
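To make the idea of "mass errors of the seven fragment ions with the highest intensities" concrete, the sketch below selects the most intense fragments from parallel lists of intensities and mass errors and summarizes them with a mean absolute error. The helper names, the input layout, and the choice of summary statistic are assumptions for illustration; the features that MaxQuantFeatureGenerator actually derives are not listed in this reference.

```python
def top_fragment_mass_errors(intensities, mass_errors, n_top=7):
    """Mass errors of the `n_top` most intense fragment ions, most intense first."""
    if len(intensities) != len(mass_errors):
        raise ValueError("intensities and mass_errors must have equal length")
    ranked = sorted(zip(intensities, mass_errors), key=lambda pair: pair[0], reverse=True)
    return [error for _, error in ranked[:n_top]]


def mean_absolute_error(errors):
    """Mean of the absolute errors, or 0.0 for an empty list."""
    return sum(abs(error) for error in errors) / len(errors) if errors else 0.0


# Made-up example values, not taken from a real msms.txt file.
intensities = [120.0, 850.0, 40.0, 600.0, 310.0, 95.0, 700.0, 15.0, 450.0]
mass_errors = [0.004, -0.001, 0.010, 0.002, -0.003, 0.006, 0.001, -0.012, -0.002]

top_errors = top_fragment_mass_errors(intensities, mass_errors)
summary = mean_absolute_error(top_errors)
```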
- class ms2rescore.feature_generators.maxquant.MaxQuantFeatureGenerator(*args, **kwargs)
Bases:
FeatureGeneratorBase
Generate MaxQuant-derived features.
- Raises:
MissingMetadataError – If the required metadata entries are not present in the PSMs.
- exception ms2rescore.feature_generators.maxquant.MissingMetadataError
Bases:
MS2RescoreError
Exception raised when a required metadata entry is missing.
ms2rescore.feature_generators.ms2pip
MS²PIP fragmentation intensity-based feature generator.
MS²PIP is a machine learning tool that predicts the MS2 spectrum of a peptide given its sequence. It is trained on previously identified MS2 spectra and their corresponding peptide sequences. Because MS²PIP uses XGBoost, a highly performant but traditional machine learning approach, it can produce accurate predictions even if trained on smaller spectral libraries. This makes MS²PIP a very flexible platform to train new models on custom datasets. Nevertheless, MS²PIP comes with several pre-trained models. See github.com/compomics/ms2pip for more information.
Because traditional proteomics search engines do not fully consider MS2 peak intensities in their scoring functions, adding rescoring features derived from spectrum prediction tools has proved to be a very effective way to further improve the sensitivity of peptide-spectrum matching.
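A simple example of a feature derived from spectrum prediction is the Pearson correlation between observed and predicted fragment intensities. The sketch below is a plain-Python illustration under that assumption; the function name is hypothetical, and the set of features that MS²Rescore actually computes from MS²PIP predictions is not specified here.

```python
import math


def pearson_correlation(observed, predicted):
    """Pearson correlation between two equally long intensity lists."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("Need two lists of equal length with at least two values")
    mean_observed = sum(observed) / len(observed)
    mean_predicted = sum(predicted) / len(predicted)
    covariance = sum(
        (o - mean_observed) * (p - mean_predicted) for o, p in zip(observed, predicted)
    )
    variance_observed = sum((o - mean_observed) ** 2 for o in observed)
    variance_predicted = sum((p - mean_predicted) ** 2 for p in predicted)
    if variance_observed == 0 or variance_predicted == 0:
        # Correlation is undefined for constant input; report no correlation.
        return 0.0
    return covariance / math.sqrt(variance_observed * variance_predicted)


# Made-up intensities: the prediction is exactly twice the observation.
observed = [0.1, 0.4, 0.2, 0.9]
predicted = [0.2, 0.8, 0.4, 1.8]
similarity = pearson_correlation(observed, predicted)
```

A high correlation suggests that the observed spectrum agrees with what is expected for the candidate peptide, which is the kind of intensity information the paragraph above describes as missing from traditional search engine scores.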
If you use MS²PIP through MS²Rescore, please cite:
Declercq, A., Bouwmeester, R., Chiva, C., Sabidó, E., Hirschler, A., Carapito, C., Martens, L., Degroeve, S., Gabriels, R. Updated MS²PIP web server supports cutting-edge proteomics applications. Nucleic Acids Research (2023) doi:10.1093/nar/gkad335
- class ms2rescore.feature_generators.ms2pip.MS2PIPFeatureGenerator(*args, model='HCD', ms2_tolerance=0.02, spectrum_path=None, spectrum_id_pattern='(.*)', model_dir=None, processes=1, **kwargs)
Bases:
FeatureGeneratorBase
Generate MS²PIP-based features.
- Parameters:
model (str) – MS²PIP prediction model to use. Defaults to HCD.
ms2_tolerance (float) – MS2 mass tolerance in Da. Defaults to 0.02.
spectrum_path (str | None) – Path to spectrum file or directory with spectrum files. If None, inferred from run field in PSMs. Defaults to None.
spectrum_id_pattern (str, optional) – Regular expression pattern to extract spectrum ID from spectrum file. Defaults to (.*).
model_dir (str | None) – Directory containing MS²PIP models. Defaults to None (use MS²PIP default).
processes (int, optional) – Number of processes to use. Defaults to 1.
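The effect of spectrum_id_pattern can be illustrated with the standard re module. The sketch below assumes that the first capture group of the pattern yields the spectrum ID and that the pattern is searched within the spectrum title; both are assumptions for illustration, and extract_spectrum_id is a hypothetical helper, not part of MS²Rescore.

```python
import re


def extract_spectrum_id(spectrum_title, spectrum_id_pattern=r"(.*)"):
    """Return the first capture group of the pattern applied to a spectrum title."""
    pattern = re.compile(spectrum_id_pattern)
    if pattern.groups < 1:
        raise ValueError("spectrum_id_pattern must contain a capture group")
    match = pattern.search(spectrum_title)
    if match is None:
        raise ValueError(f"Pattern {spectrum_id_pattern!r} did not match {spectrum_title!r}")
    return match.group(1)


# Made-up spectrum title for illustration.
title = "controllerType=0 controllerNumber=1 scan=2345"
default_id = extract_spectrum_id(title)  # default pattern keeps the full title
scan_id = extract_spectrum_id(title, r"scan=(\d+)")  # custom pattern keeps the scan number
```

A custom pattern is useful when the identifiers in the PSM file are only a part of the titles found in the spectrum file.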