Using the Python API

This tutorial shows how to use the MS²Rescore Python API for each step of the rescoring process individually. This is useful if you want to customize rescoring for your own Python workflow or if you want to understand how MS²Rescore works.

Note that the full MS²Rescore workflow is also available from Python with the single function call ms2rescore.rescore().

[1]:
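import logging
import plotly.io

# Show INFO-level log messages in the notebook output
logging.basicConfig(level=logging.INFO)
# Render Plotly figures inline in the notebook
plotly.io.renderers.default = "plotly_mimetype+notebook"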

Reading and parsing peptide-spectrum matches

[2]:
from psm_utils.io import read_file

from ms2rescore.report.charts import score_histogram
INFO:numexpr.utils:Note: NumExpr detected 32 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.

Reading the PSM file

MS²Rescore is built around the psm_utils PSMList, a unified data representation of PSMs and their various attributes. Internally, it is simply a list of Pydantic data classes, each of which represents one PSM. With the submodule psm_utils.io, we can read PSMs from a variety of file formats. Here, we will read a PSM file in the MaxQuant msms.txt format.

Importantly, for rescoring, the PSM file must contain all target and decoy PSMs, including PSMs that did not pass the FDR threshold. Most search engines must be specifically configured to return all PSMs without FDR filtering.
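Before rescoring, it can help to verify that the input really contains both targets and decoys. The following is a minimal sketch in plain Python, not part of MS²Rescore: the helper names summarize_decoy_flags and check_rescoring_input are hypothetical, and the example flags are made up. It assumes the decoy flags are available as an iterable of booleans, for instance from psm_list["is_decoy"] once the PSM file has been read.

```python
from collections import Counter


def summarize_decoy_flags(is_decoy_flags):
    """Count target and decoy PSMs from an iterable of boolean decoy flags."""
    counts = Counter(bool(flag) for flag in is_decoy_flags)
    return {"targets": counts[False], "decoys": counts[True]}


def check_rescoring_input(is_decoy_flags):
    """Raise a ValueError if either targets or decoys are missing."""
    summary = summarize_decoy_flags(is_decoy_flags)
    if summary["targets"] == 0 or summary["decoys"] == 0:
        raise ValueError(
            f"Both target and decoy PSMs are required, got {summary}"
        )
    return summary


# Hypothetical flags for illustration; with real data they could be taken
# from psm_list["is_decoy"] (an assumption, see the text above).
example_flags = [False, False, True, False, True]
print(check_rescoring_input(example_flags))
```

A file that yields zero decoys here was most likely exported after FDR filtering or without decoy hits, and the search engine settings should be revisited before continuing.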

[3]:
# Parse the MaxQuant msms.txt file into a PSMList
psm_list = read_file("../../../examples/id/msms.txt", filetype="msms")
# Cast all spectrum IDs to strings
psm_list["spectrum_id"] = [str(spec_id) for spec_id in psm_list["spectrum_id"]]
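The cell above reads and writes a whole "column" of the PSM list by indexing it with an attribute name. To make that idea concrete, here is a simplified stand-in built only on the standard library. It is not the psm_utils implementation: SimplePSM and SimplePSMList are hypothetical names, the real classes are Pydantic models with many more fields and validation, and only the access pattern is meant to be comparable.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class SimplePSM:
    """Hypothetical, stripped-down stand-in for a single PSM record."""

    peptidoform: str
    spectrum_id: Union[int, str]
    score: Optional[float] = None
    is_decoy: Optional[bool] = None


class SimplePSMList:
    """Hypothetical list wrapper with row and column-name indexing."""

    def __init__(self, psms):
        self.psms = list(psms)

    def __len__(self):
        return len(self.psms)

    def __getitem__(self, key):
        # A string key collects that attribute across all PSMs.
        if isinstance(key, str):
            return [getattr(psm, key) for psm in self.psms]
        # Any other key (integer or slice) is passed to the underlying list.
        return self.psms[key]

    def __setitem__(self, key, values):
        # Only column assignment by attribute name is supported in this sketch.
        if not isinstance(key, str):
            raise TypeError("Only attribute names can be assigned")
        if len(values) != len(self.psms):
            raise ValueError("Expected one value per PSM")
        for psm, value in zip(self.psms, values):
            setattr(psm, key, value)


# Two records with values taken from the example output in this tutorial.
toy_list = SimplePSMList(
    [
        SimplePSM("AAAAAAALQAK/2", 4703, score=107.66, is_decoy=False),
        SimplePSM("AAAAAQGGGGGEPR/2", 505, score=22.641, is_decoy=False),
    ]
)

# Same pattern as in the cell above: read a column, convert it, write it back.
toy_list["spectrum_id"] = [str(spec_id) for spec_id in toy_list["spectrum_id"]]
print(toy_list["spectrum_id"])
```

Keeping the records in an ordinary list while offering attribute-name indexing lets the same object be used row by row and column by column.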

For a quick inspection, we can format the PSM list as a Pandas dataframe and display the first few rows:

[4]:
psm_list.to_dataframe().head()
[4]:
peptidoform spectrum_id run collection spectrum is_decoy score qvalue pep precursor_mz retention_time ion_mobility protein_list rank source provenance_data metadata rescoring_features
0 AAAAAAALQAK/2 4703 20161213_NGHF_DBJ_SA_Exp3A_HeLa_1ug_7min_15000_02 None None False 107.660 None 0.001517 478.77982 5.2007 None [P36578, H3BM89, H3BU31] None msms {'msms_filename': '..\..\..\examples\id\msms.t... {'Scan index': '3698', 'Sequence': 'AAAAAAALQA... {}
1 [ac]-AAAAAEQQQFYLLLGNLLSPDNVVR/3 13572 20161213_NGHF_DBJ_SA_Exp3A_HeLa_1ug_7min_15000_02 None None False 107.740 None 0.004931 915.15197 11.8470 None [O00410, E7ETV3, E7EQT5, C9JZD8] None msms {'msms_filename': '..\..\..\examples\id\msms.t... {'Scan index': '11885', 'Sequence': 'AAAAAEQQQ... {}
2 [ac]-AAAAAEQQQFYLLLGNLLSPDNVVRK/3 13366 20161213_NGHF_DBJ_SA_Exp3A_HeLa_1ug_7min_15000_02 None None False 137.890 None 0.000493 957.85029 11.6900 None [O00410, E7ETV3, E7EQT5, C9JZD8] None msms {'msms_filename': '..\..\..\examples\id\msms.t... {'Scan index': '11695', 'Sequence': 'AAAAAEQQQ... {}
3 AAAAAQGGGGGEPR/2 505 20161213_NGHF_DBJ_SA_Exp3A_HeLa_1ug_7min_15000_02 None None False 22.641 None 0.142020 585.28653 0.5178 None [E9PJF0, E9PQW4, P27361] None msms {'msms_filename': '..\..\..\examples\id\msms.t... {'Scan index': '419', 'Sequence': 'AAAAAQGGGGG... {}
4 AAAAAWEEPSSGN[de]GTAR/2 6589 20161213_NGHF_DBJ_SA_Exp3A_HeLa_1ug_7min_15000_02 None None False 89.403 None 0.046504 823.87389 6.6105 None [Q9P258] None msms {'msms_filename': '..\..\..\examples\id\msms.t... {'Scan index': '5439', 'Sequence': 'AAAAAWEEPS... {}

We can also directly plot the current PSM score distributions:

[5]:
score_histogram(psm_list)