Pyteomics documentation v5.0

The ProForma standard and implementation

Contents

The ProForma standard and implementation

ProForma is a standard for representing proteoforms and peptidoforms, developed by the PSI (Proteomics Standards Initiative). It provides a structured way to represent peptide sequences a wide variety of modifications and uncertainties.

Pyteomics supports ProForma v2.0. The core functions and classes related to ProForma support are located in the proforma - Proteoform and Peptidoform Notation, see there for more information.

Basic usage

The ProForma parser is object-oriented, with a primary class ProForma representing a parsed ProForma sequence. To instantiate a ProForma object, use the class method ProForma.parse():

.. code-block:: python

    >>> seq = ProForma.parse("EM[Oxidation]EVT[Phospho]SES[Phospho]PEK")
    >>> seq
    ProForma([('E', None), ('M', [GenericModification('Oxidation', None, None)]), ('E', None), ('V', None), ('T', [GenericModification('Phospho', None, None)]), ('S', None), ('E', None), ('S', [GenericModification('Phospho', None, None)]), ('P', None), ('E', None), ('K', None)], {'n_term': None, 'c_term': None, 'unlocalized_modifications': [], 'labile_modifications': [], 'fixed_modifications': [], 'intervals': [], 'isotopes': [], 'group_ids': [], 'charge_state': None})

    >>> seq.mass
    1440.47687500136

    >>> seq.composition()
    Composition({'H': 86, 'C': 51, 'O': 30, 'N': 12, 'S': 1, 'P': 2})

Chimeric spectra

Top-level + in ProForma is treated as a chimeric separator only when chimeric=True is passed. The return value is then a list of parsed components:

>>> forms = ProForma.parse("<[Carbamidomethyl]@C>AC+CC", chimeric=True)
>>> len(forms)
2
>>> [str(form) for form in forms]
['<[Carbamidomethyl]@C>AC', '<[Carbamidomethyl]@C>CC']

Fixed modification rules, isotope labels, and peptidoform names are shared across all chimeric components.

Other APIs such as mass calculation, fragment series generation, and spectrum annotation operate on one peptidoform at a time. Use the parsed components individually:

>>> from pyteomics import mass
>>> masses = [mass.calculate_mass(proforma=str(form)) for form in forms]
>>> fragments = [mass.fragment_series(str(form)) for form in forms]

Contents