Pyteomics documentation v5.0rc2

proforma - Proteoform and Peptidoform Notation

ProForma is a notation for defining modified amino acid sequences using a set of controlled vocabularies, as well as encoding uncertain or partial information about localization. See ProForma specification for more up-to-date information.

Strictly speaking, this implementation supports ProForma v2.

Data Access

parse() - The primary interface for parsing ProForma strings.

>>> parse("EM[Oxidation]EVT[#g1(0.01)]S[#g1(0.09)]ES[Phospho#g1(0.90)]PEK")
    ([('E', None),
      ('M', [GenericModification('Oxidation', None, None)]),
      ('E', None),
      ('V', None),
      ('T', [LocalizationMarker(0.01, None, '#g1')]),
      ('S', [LocalizationMarker(0.09, None, '#g1')]),
      ('E', None),
      ('S',
      [GenericModification('Phospho', [LocalizationMarker(0.9, None, '#g1')], '#g1')]),
      ('P', None),
      ('E', None),
      ('K', None)],
     {'n_term': None,
      'c_term': None,
      'unlocalized_modifications': [],
      'labile_modifications': [],
      'fixed_modifications': [],
      'intervals': [],
      'isotopes': [],
      'group_ids': ['#g1']})
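The group markers in the example above carry localisation scores in parentheses. As a rough illustration that does not use pyteomics at all, the sketch below pulls the scores for one group out of the raw string with a regular expression and sums them. The function name `group_scores` and the regular expression are assumptions made for this example, not part of the library or of the ProForma grammar.

```python
import re


def group_scores(proforma_text, group_id):
    """Collect the localisation scores attached to one group label.

    Regex-only sketch: it handles the simple '#g1(0.01)' form shown above
    and is not a substitute for a real ProForma parser.
    """
    pattern = re.compile(re.escape(group_id) + r"\((\d*\.?\d+)\)")
    return [float(score) for score in pattern.findall(proforma_text)]


text = "EM[Oxidation]EVT[#g1(0.01)]S[#g1(0.09)]ES[Phospho#g1(0.90)]PEK"
scores = group_scores(text, "#g1")
total = sum(scores)  # the three candidate sites should account for all of the probability
```

For real work, the `LocalizationMarker` tags returned by `parse()` already carry these scores as their values.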

to_proforma() - Format a sequence and set of properties as ProForma text.

Classes

ProForma - An object oriented version of the parsing and formatting code, coupled with minimal information about mass and position data.

>>> seq = ProForma.parse("EM[Oxidation]EVT[#g1(0.01)]S[#g1(0.09)]ES[Phospho#g1(0.90)]PEK")
>>> seq
ProForma([('E', None), ('M', [GenericModification('Oxidation', None, None)]), ('E', None),
          ('V', None), ('T', [LocalizationMarker(0.01, None, '#g1')]), ('S', [LocalizationMarker(0.09, None, '#g1')]),
          ('E', None), ('S', [GenericModification('Phospho', [LocalizationMarker(0.9, None, '#g1')], '#g1')]),
          ('P', None), ('E', None), ('K', None)],
          {'n_term': None, 'c_term': None, 'unlocalized_modifications': [],
           'labile_modifications': [], 'fixed_modifications': [], 'intervals': [],
           'isotopes': [], 'group_ids': ['#g1'], 'charge_state': None}
        )
>>> seq.mass
1360.51054400136
>>> seq.tags
[GenericModification('Oxidation', None, None),
 LocalizationMarker(0.01, None, '#g1'),
 LocalizationMarker(0.09, None, '#g1'),
 GenericModification('Phospho', [LocalizationMarker(0.9, None, '#g1')], '#g1')]
>>> str(seq)
'EM[Oxidation]EVT[#g1(0.01)]S[#g1(0.09)]ES[Phospho|#g1(0.9)]PEK'

Dependencies

To resolve PSI-MOD, XL-MOD, and GNO identifiers, psims is required. By default, psims retrieves the most recent version of each controlled vocabulary from the internet, but includes a fall-back version to use when the network is unavailable. It can also create an application cache on disk.

CV Disk Caching

ProForma uses several different controlled vocabularies (CVs) that are each versioned separately. Internally, the Unimod controlled vocabulary is accessed using Unimod and all other controlled vocabularies are accessed using psims. Unless otherwise stated, the machinery will download fresh copies of each CV when first queried.

To avoid this slow operation, you can keep cached copies of the CV source files on disk and tell pyteomics and psims where to find them:

from pyteomics import proforma

# set the path for Unimod loading via pyteomics
proforma.set_unimod_path("path/to/unimod.xml")

# set the cache directory for downloading and reloading OBOs via psims
proforma.obo_cache.cache_path = "obo/cache/dir/"
proforma.obo_cache.enabled = True
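The cache directory has to exist and be writable before it is useful. A small helper along these lines could prepare the locations first. The helper name `prepare_cv_cache` and the directory layout are assumptions made for this sketch, and the pyteomics calls are left as comments so the block runs without the library installed.

```python
from pathlib import Path


def prepare_cv_cache(base_dir):
    """Create the on-disk layout for cached CV files and return the paths.

    The layout ("unimod.xml" next to an "obo" directory) is an assumption
    made for this sketch; any writable locations would do.
    """
    base = Path(base_dir)
    obo_dir = base / "obo"
    obo_dir.mkdir(parents=True, exist_ok=True)
    return base / "unimod.xml", obo_dir


# Hypothetical usage, with pyteomics installed and unimod.xml already saved:
#   from pyteomics import proforma
#   unimod_path, obo_dir = prepare_cv_cache("cv_cache")
#   proforma.set_unimod_path(str(unimod_path))
#   proforma.obo_cache.cache_path = str(obo_dir)
#   proforma.obo_cache.enabled = True
```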

Compliance Levels

1. Base Level Support. This is the lowest level of compliance, and it involves providing support for:

  • [x] Amino acid sequences

  • [x] Protein modifications using two of the supported CVs/ontologies: Unimod and PSI-MOD.

  • [x] Protein modifications using delta masses (without prefixes)

  • [x] N-terminal, C-terminal and labile modifications.

  • [x] Ambiguity in the modification position, including support for localisation scores.

  • [x] INFO tag.

2. Additional Separate Support. These features are independent from each other:

  • [x] Unusual amino acids (O and U).

  • [x] Ambiguous amino acids (e.g. X, B, Z). This would include support for sequence tags of known mass (using the character X).

  • [x] Protein modifications using delta masses (using prefixes for the different CVs/ontologies).

  • [x] Use of prefixes for Unimod (U:) and PSI-MOD (M:) names.

  • [x] Support for the joint representation of experimental data and its interpretation.

  1. Top Down Extensions

    • [ ] Additional CV/ontologies for protein modifications: RESID (the prefix R MUST be used for RESID CV/ontology term names)

    • [x] Chemical formulas (this feature occurs in two places in this list).

  2. Cross-Linking Extensions

    • [ ] Cross-linked peptides (using the XL-MOD CV/ontology, the prefix X MUST be used for XL-MOD CV/ontology term names).

  3. Glycan Extensions

    • [x] Additional CV/ontologies for protein modifications: GNO (the prefix G MUST be used for GNO CV/ontology term names)

    • [x] Glycan composition.

    • [x] Chemical formulas (this feature occurs in two places in this list).

  4. Spectral Support

    • [x] Charge state and adducts

    • [ ] Chimeric spectra are special cases.

    • [x] Global modifications (e.g., every C is C13).

Functions

pyteomics.proforma.parse(sequence: str, **kwargs) → Tuple[List[Tuple[str, List[TagBase] | None]], Dict[str, Any]]

Tokenize a ProForma sequence into a sequence of amino acid+tag positions, and a mapping of sequence-spanning modifiers.

Note

This is a state machine parser, but with certain sub-state paths unrolled to avoid an explosion of formal intermediary states.

Parameters:
  • sequence (str) – The sequence to parse

  • **kwargs – Forwarded to Parser

Returns:

  • parsed_sequence (list[tuple[str, list[TagBase] or None]]) – The (amino acid: str, list of TagBase or None) pairs denoting the positions along the primary sequence

  • modifiers (dict) – A mapping listing the labile modifications, fixed modifications, stable isotopes, unlocalized modifications, tagged intervals, and group IDs
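To make the shape of the first return value concrete, the sketch below walks a hand-written stand-in for `parsed_sequence`. Plain strings are used in place of TagBase instances so the block runs without pyteomics, and the helper name `modified_positions` is hypothetical.

```python
# Stand-in for the first element returned by parse(): real output holds
# TagBase instances; plain strings keep this sketch standalone.
parsed_sequence = [
    ("E", None),
    ("M", ["Oxidation"]),
    ("E", None),
    ("S", ["Phospho"]),
]


def modified_positions(pairs):
    """Return (0-based index, amino acid, tags) for every tagged residue."""
    return [
        (index, residue, tags)
        for index, (residue, tags) in enumerate(pairs)
        if tags  # None and empty lists both mean "no tags here"
    ]


bare_sequence = "".join(residue for residue, _ in parsed_sequence)
hits = modified_positions(parsed_sequence)
```

Untagged residues carry `None` rather than an empty list in the documented output, so a truthiness check covers both cases.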

pyteomics.proforma.to_proforma(sequence, n_term: List[TagBase] | None = None, c_term: List[TagBase] | None = None, unlocalized_modifications: List[TagBase] | None = None, labile_modifications: List[TagBase] | None = None, fixed_modifications: List[TagBase] | None = None, intervals: List[TaggedInterval] | None = None, isotopes: List[StableIsotope] | None = None, charge_state: ChargeState | None = None, group_ids: Iterable[str] = None, names: Dict[int, str] | None = None)

Convert a sequence plus modifiers into formatted text following the ProForma specification.

Parameters:
  • sequence (list[tuple[str, TagBase]]) – The primary sequence of the peptidoform/proteoform to render

  • n_term (Optional[list[TagBase]]) – The N-terminal modifications, if any.

  • c_term (Optional[list[TagBase]]) – The C-terminal modifications, if any.

  • unlocalized_modifications (Optional[list[TagBase]]) – Any modifications which aren’t assigned to a specific location.

  • labile_modifications (Optional[list[TagBase]]) – Any labile modifications

  • fixed_modifications (Optional[list[ModificationRule]]) – Any fixed modifications

  • intervals (Optional[list[TaggedInterval]]) – A list of modified intervals, if any

  • isotopes (Optional[list[StableIsotope]]) – Any global stable isotope labels applied

  • charge_state (Optional[ChargeState]) – An optional charge state value

  • group_ids (Optional[list[str]]) – Any group identifiers. This parameter is currently not used.

Return type:

str
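As a simplified picture of what rendering involves, the sketch below turns (amino acid, tags) pairs back into bracketed text. It is not the library's implementation: tags are plain strings, the function name `format_pairs` is hypothetical, and sequence-spanning modifiers (isotopes, fixed modifications, intervals, charge state) are ignored.

```python
def format_pairs(pairs, n_term=None, c_term=None):
    """Render (amino acid, tags) pairs as bracketed text.

    Simplified illustration only: tags are plain strings, and global
    modifiers (isotopes, fixed modifications, intervals, charge) are ignored.
    """
    def brackets(tags):
        return "".join("[{}]".format(tag) for tag in (tags or []))

    body = "".join(residue + brackets(tags) for residue, tags in pairs)
    prefix = brackets(n_term) + "-" if n_term else ""
    suffix = "-" + brackets(c_term) if c_term else ""
    return prefix + body + suffix


text = format_pairs(
    [("E", None), ("M", ["Oxidation"]), ("K", None)],
    n_term=["Acetyl"],
)
```

Terminal tags are joined to the sequence with a hyphen, which is how ProForma writes N-terminal and C-terminal modifications.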

Helpers

pyteomics.proforma.set_unimod_path(path)

Set the path to load the Unimod database from for resolving ProForma Unimod modifications.

Note

This function ensures that the Unimod modification database loads quickly from a local database file instead of downloading a new copy from the internet.

Parameters:

path (str or file-like object) – A path to or file-like object for the “unimod.xml” file.

Return type:

Unimod

High Level Interface

class pyteomics.proforma.ProForma(sequence, properties)

Bases: object

Represent a parsed ProForma sequence.

The preferred way to instantiate this class is via the parse() method.

sequence

The list of (amino acid, tag collection) pairs making up the primary sequence of the peptide.

Type:

list[tuple[str, List[TagBase]]]

isotopes

A list of any stable isotope rules that apply to this peptide

Type:

list[StableIsotope]

charge_state

An optional charge state that may have been provided

Type:

int, optional

intervals

Any annotated intervals that contain either sequence ambiguity or a tag over that interval.

Type:

list[Interval]

labile_modifications

Any modifications that were parsed as labile, and may not appear at any location on the peptide primary sequence.

Type:

list[ModificationBase]

unlocalized_modifications

Any modifications that were not localized but may be attached to peptide sequence evidence.

Type:

list[ModificationBase]

n_term

Any modifications on the N-terminus of the peptide

Type:

list[ModificationBase]

c_term

Any modifications on the C-terminus of the peptide

Type:

list[ModificationBase]

group_ids

The collection of all group identifiers on this sequence.

Type:

set

mass

The computed mass for the fully modified peptide, including labile and unlocalized modifications. Does not include stable isotopes at this time.

Type:

float

__init__(sequence, properties)

Initialize a ProForma instance from a parse tree.

To construct an instance from a string directly, see ProForma.parse().

See also

ProForma.parse()

composition(include_charge: bool | ChargeState = False, aa_comp=None, ignore_missing=False) → Composition

Calculate the elemental composition of the ProForma sequence.

Parameters:
  • include_charge (bool or ChargeState, optional) – If True, then charge_state will be included in the composition. If a ChargeState instance is passed, this charge and adduction will be included instead. Otherwise, composition of the neutral molecule will be returned. Defaults to False.

  • aa_comp (dict, optional) – A dictionary mapping amino acid symbols to their respective compositions. If not provided, the standard amino acid composition will be used. X always has a mass of 0.0, regardless of this argument.

  • ignore_missing (bool, optional) –

    If True, tags with missing composition will be silently ignored. If False (default), a CompositionNotFoundError will be raised.

    Note

    Amino acids not found in aa_comp will result in errors even with ignore_missing=True.

Returns:

Composition object representing the composition of the ProForma sequence.

Return type:

Composition
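The idea behind an elemental composition can be sketched with `collections.Counter`: sum the residue compositions, add one water for the termini, then add each modification's composition shift. This is a standalone illustration, not the Composition class. The three-residue table and the name `peptide_composition` are assumptions made for this example.

```python
from collections import Counter

# Residue compositions (amino acid minus water) for three residues only;
# a real table would cover the whole alphabet.
RESIDUE_COMPOSITION = {
    "G": Counter({"C": 2, "H": 3, "N": 1, "O": 1}),
    "A": Counter({"C": 3, "H": 5, "N": 1, "O": 1}),
    "S": Counter({"C": 3, "H": 5, "N": 1, "O": 2}),
}
WATER = Counter({"H": 2, "O": 1})


def peptide_composition(sequence, modifications=()):
    """Sum residue compositions, add one water for the termini, then add
    any modification composition shifts (given as Counters)."""
    total = Counter(WATER)
    for residue in sequence:
        total.update(RESIDUE_COMPOSITION[residue])  # KeyError for unknown residues
    for shift in modifications:
        total.update(shift)
    return total


phospho = Counter({"H": 1, "O": 3, "P": 1})  # HPO3
comp = peptide_composition("GAS", modifications=[phospho])
```

An unknown residue raises `KeyError` here, which loosely mirrors the note above that missing amino acids are errors even when missing tag compositions are ignored.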

find_tags_by_id(tag_id, include_position=True)

Find all occurrences of a particular tag ID.

Parameters:
  • tag_id (str) – The tag ID to search for

  • include_position (bool) – Whether or not to return the locations for matched tag positions

Return type:

list[tuple[Any, TagBase]] or list[TagBase]

fragments(ion_shift, charge=1, reverse=None, include_labile=True, include_unlocalized=True)

The function generates all possible fragments of the requested series type.

Parameters:
  • ion_shift (float or str) – The mass shift of the ion series, or the name of the ion series

  • charge (int) – The charge state of the theoretical fragment masses to generate. Defaults to 1+. If 0 is passed, neutral masses will be returned.

  • reverse (bool, optional) – Whether to fragment from the N-terminus (False) or C-terminus (True). If ion_shift is a str, the terminal will be inferred from the series name. Otherwise, defaults to False.

  • include_labile (bool, optional) – Whether or not to include dissociated modification masses. Defaults to True.

  • include_unlocalized (bool, optional) – Whether or not to include unlocalized modification masses. Defaults to True.

Return type:

np.ndarray

Examples

>>> from pyteomics import proforma
>>> p = proforma.ProForma.parse("PEPTIDE")
>>> p.fragments('b', charge=1)
array([ 98.06004032, 227.1026334 , 324.15539725, 425.20307572,
        538.2871397 , 653.31408272])
>>> p.fragments('y', charge=1)
array([148.06043424, 263.08737726, 376.17144124, 477.21911971,
       574.27188356, 703.31447664])
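The numbers above can be reproduced by hand for an unmodified peptide, which may help when checking results. The sketch below is a standalone illustration, not the library's code: the residue masses are standard monoisotopic values typed in for this example, the function name `ion_series` is hypothetical, and only the b and y series with protons as charge carriers are handled.

```python
from itertools import accumulate

# Monoisotopic residue masses (Da) for the residues in "PEPTIDE".
RESIDUE_MASS = {
    "P": 97.052763847,
    "E": 129.042593085,
    "T": 101.047678466,
    "I": 113.084063974,
    "D": 115.026943025,
}
WATER_MASS = 18.010564684
PROTON_MASS = 1.007276467


def ion_series(sequence, series="b", charge=1):
    """Return m/z values for a b or y ladder (all but the full-length ion).

    charge=0 returns neutral masses, mirroring the documented behaviour.
    """
    if series == "b":
        residues, shift = sequence[:-1], 0.0           # N-terminal prefixes
    elif series == "y":
        residues, shift = sequence[:0:-1], WATER_MASS  # C-terminal suffixes
    else:
        raise ValueError("this sketch only knows 'b' and 'y'")
    neutral = [m + shift for m in accumulate(RESIDUE_MASS[r] for r in residues)]
    if charge == 0:
        return neutral
    return [(m + charge * PROTON_MASS) / abs(charge) for m in neutral]


b_ions = ion_series("PEPTIDE", "b")
y_ions = ion_series("PEPTIDE", "y")
```

Each b ion is a running sum of residue masses from the N-terminus plus the charge carrier, and each y ion is the same from the C-terminus plus one water.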

property mass: float

Compute the total monoisotopic neutral mass of the peptidoform.

This does not include the adduct.

mz(charge: int | ChargeState | None = None, **kwargs) → float

Compute the total m/z of the peptidoform in the specified charge state, or fall back to the peptidoform ion’s defined charge state and adduction.

This method first tries to get the composition of the peptidoform ion with composition() and then forwards kwargs to Composition.mass() to compute m/z with full flexibility, but if that fails due to missing modification compositions, this method falls back to directly computing monoisotopic mass and uses the charge state to get the m/z.

Warning

If no charge state of any kind is available, this will raise a MissingChargeStateError.

Parameters:
  • charge (int or ChargeState, optional) – The charge state either as an integer number of protons gained/lost, or a ChargeState instance. If not provided, charge_state will be used.

  • **kwargs – Forwarded to Composition.mass()

Return type:

float
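The fallback path described above, going from a monoisotopic neutral mass and a charge to m/z, amounts to simple arithmetic. The sketch below assumes protons are the only charge carrier. The function name `mz_from_mass` is hypothetical, and `ValueError` stands in for MissingChargeStateError so that the block runs without pyteomics.

```python
PROTON_MASS = 1.007276467  # Da


def mz_from_mass(neutral_mass, charge):
    """Convert a neutral monoisotopic mass to m/z for protonated ions.

    Assumes protons are the only charge carrier; other adducts are ignored.
    A missing or zero charge raises ValueError, standing in for the
    MissingChargeStateError mentioned above.
    """
    if not charge:
        raise ValueError("a non-zero charge state is required")
    return (neutral_mass + charge * PROTON_MASS) / abs(charge)


# Neutral mass taken from the seq.mass example earlier on this page.
mz_2plus = mz_from_mass(1360.51054400136, 2)
```

For the doubly protonated ion this gives roughly 681.2625.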

classmethod parse(string, **kwargs)

Parse a ProForma string.

Parameters:
  • string (str) – The string to parse

  • **kwargs – Forwarded to Parser

Return type:

ProForma

peptidoforms(include_unmodified: bool = False, include_labile: bool = False, strip: bool = False, deepcopy: bool = False) → Iterator[ProForma]

Generate combinatorial localizations of modifications defined on this ProForma sequence.

Parameters:
  • include_unmodified (bool) – For all non-fixed modifications, include the case where the modification is not included anywhere. This is equivalent to how variable modification rules are applied in search engines. It still respects the number of copies of modifications included in the input. See expand_rules.

  • include_labile (bool) – For all labile modifications, include the case where the modification is localized at every possible location or as a remaining labile modification.

  • strip (bool) – If True, the generated peptidoforms will have all modification tags stripped of any extra information, leaving only the bare modification definition.

  • deepcopy (bool) – If True, the generated peptidoforms will have all tags and modifications deep-copied. This is necessary if the generated peptidoforms will be modified in-place after generation, but adds overhead if they will be treated as immutable. Defaults to False.

Yields:

ProForma

proteoforms(include_unmodified: bool = False, include_labile: bool = False, strip: bool = False, deepcopy: bool = False) → Iterator[ProForma]

Generate combinatorial localizations of modifications defined on this ProForma sequence.

Parameters:
  • include_unmodified (bool) – For all non-fixed modifications, include the case where the modification is not included anywhere. This is equivalent to how variable modification rules are applied in search engines. It still respects the number of copies of modifications included in the input. See expand_rules.

  • include_labile (bool) – For all labile modifications, include the case where the modification is localized at every possible location or as a remaining labile modification.

  • strip (bool) – If True, the generated peptidoforms will have all modification tags stripped of any extra information, leaving only the bare modification definition.

  • deepcopy (bool) – If True, the generated peptidoforms will have all tags and modifications deep-copied. This is necessary if the generated peptidoforms will be modified in-place after generation, but adds overhead if they will be treated as immutable. Defaults to False.

Yields:

ProForma
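The combinatorial expansion can be pictured with `itertools.combinations`. The sketch below is a deliberately narrow illustration rather than the library's algorithm: it handles a single modification type, allows at most one copy per residue, and emits plain bracketed strings instead of ProForma objects. The function name `localizations` and its parameters are assumptions made for this example.

```python
from itertools import combinations


def localizations(sequence, tag, targets, copies=1, include_unmodified=False):
    """Yield every way to place `copies` of `tag` on residues in `targets`.

    Illustrative only: one modification type, at most one copy per residue,
    and output as plain bracketed text rather than ProForma objects.
    """
    sites = [i for i, residue in enumerate(sequence) if residue in targets]
    # With include_unmodified, also emit forms carrying fewer copies, down to none.
    counts = range(0 if include_unmodified else copies, copies + 1)
    for n in counts:
        for chosen in combinations(sites, n):
            yield "".join(
                residue + ("[{}]".format(tag) if i in chosen else "")
                for i, residue in enumerate(sequence)
            )


forms = list(localizations("EMEVTSESPEK", "Phospho", targets="ST"))
```

With three candidate sites and one copy this yields three forms. Passing `include_unmodified=True` adds the bare sequence as a fourth.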

Tag Types

class pyteomics.proforma.TagBase(type, value, extra=None, group_id=None)

Bases: object

A base class for all tag types.

type

An element of TagTypeEnum saying what kind of tag this is.

Type:

Enum

value

The data stored in this tag, usually an externally controlled name

Type:

object

extra

Any extra tags that were nested within this tag. Usually limited to INFO tags but may be other synonymous controlled vocabulary terms.

Type:

list

group_id

A short label denoting which group, if any, this tag belongs to

Type:

str or None

__init__(type, value, extra=None, group_id=None)

find_tag_type(tag_type: TagTypeEnum) → List[TagBase]

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:

tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.

Returns:

matches – The list of all tags in this object which match the requested tag type.

Return type:

list

has_mass() → bool

Check if this tag carries a mass value.

Return type:

bool
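One way to picture a search over "this tag or tag collection" is a recursive walk over a tag and the tags nested in its extra list. The sketch below uses stand-in classes: `TagKind` and its member names are hypothetical and may not match TagTypeEnum, and `Tag` copies only the constructor fields documented for TagBase.

```python
from enum import Enum


class TagKind(Enum):
    """Hypothetical stand-in for TagTypeEnum; the real member names may differ."""
    modification = 1
    info = 2


class Tag:
    """Tiny stand-in for TagBase with the same constructor fields."""

    def __init__(self, type, value, extra=None, group_id=None):
        self.type = type
        self.value = value
        self.extra = extra or []
        self.group_id = group_id

    def find_tag_type(self, tag_type):
        """Return this tag and any nested extras whose type matches."""
        matches = [self] if self.type == tag_type else []
        for nested in self.extra:
            matches.extend(nested.find_tag_type(tag_type))
        return matches


note = Tag(TagKind.info, "ambiguous site")
phospho = Tag(TagKind.modification, "Phospho", extra=[note])
found = phospho.find_tag_type(TagKind.info)
```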

class pyteomics.proforma.TagTypeEnum(*values)

Bases: Enum

Modification Tags

class pyteomics.proforma.MassModification(value, extra=None, group_id=None)

Bases: TagBase

A modification defined purely by a signed mass shift in Daltons.

The value of a MassModification is always a float.

__init__(value, extra=None, group_id=None)

find_tag_type(tag_type: TagTypeEnum) → List[TagBase]

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:

tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.

Returns:

matches – The list of all tags in this object which match the requested tag type.

Return type:

list

has_mass() → bool

Check if this tag carries a mass value.

Return type:

bool

property key: ModificationToken

Get a safe-to-hash-and-compare ModificationToken representing this modification without tag-like properties.

Return type:

ModificationToken

class pyteomics.proforma.ModificationBase(value, extra=None, group_id=None, style=None)

Bases: TagBase

A base class for all modification tags with marked prefixes.

While ModificationBase is hashable, its equality testing brings in additional tag-related information. For pure modification identity comparison, use key to get a ModificationToken free of these concerns.

__init__(value, extra=None, group_id=None, style=None)

property composition: Composition | None

The chemical composition shift this modification applies

property definition: Dict[str, Any]

A dict of properties describing this modification, given by the providing controlled vocabulary. This value is cached, and should not be modified.

Return type:

dict

find_tag_type(tag_type: TagTypeEnum) → List[TagBase]

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:

tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.

Returns:

matches – The list of all tags in this object which match the requested tag type.

Return type:

list

has_mass()

Check if this tag carries a mass value.

Return type:

bool

property id: int | None

The unique identifier given to this modification by its provider

Return type:

str or int

property key: ModificationToken

Get a safe-to-hash-and-compare ModificationToken representing this modification without tag-like properties.

Return type:

ModificationToken

property mass

The monoisotopic mass shift this modification applies

Return type:

float

property name

The primary name of this modification from its provider.

Return type:

str

property provider

The name of the controlled vocabulary that provided this modification.

Return type:

str

resolve()

Find the term and return its properties.

class pyteomics.proforma.GenericModification(value, extra=None, group_id=None, style=None)

Bases: ModificationBase

__init__(value, extra=None, group_id=None, style=None)

property composition: Composition | None

The chemical composition shift this modification applies

property definition: Dict[str, Any]

A dict of properties describing this modification, given by the providing controlled vocabulary. This value is cached, and should not be modified.

Return type:

dict

find_tag_type(tag_type: TagTypeEnum) → List[TagBase]

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:

tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.

Returns:

matches – The list of all tags in this object which match the requested tag type.

Return type:

list

has_mass()

Check if this tag carries a mass value.

Return type:

bool

property id: int | None

The unique identifier given to this modification by its provider

Return type:

str or int

property key: ModificationToken

Get a safe-to-hash-and-compare ModificationToken representing this modification without tag-like properties.

Return type:

ModificationToken

property mass

The monoisotopic mass shift this modification applies

Return type:

float

property name

The primary name of this modification from its provider.

Return type:

str

property provider

The name of the controlled vocabulary that provided this modification.

Return type:

str

resolve()

Find the term, searching through all available vocabularies, and return the first match’s properties.

class pyteomics.proforma.FormulaModification(value, extra=None, group_id=None, style=None)

Bases: ModificationBase

__init__(value, extra=None, group_id=None, style=None)

property composition: Composition | None

The chemical composition shift this modification applies

property definition: Dict[str, Any]

A dict of properties describing this modification, given by the providing controlled vocabulary. This value is cached, and should not be modified.

Return type:

dict

find_tag_type(tag_type: TagTypeEnum) → List[TagBase]

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:

tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.

Returns:

matches – The list of all tags in this object which match the requested tag type.

Return type:

list

has_mass()

Check if this tag carries a mass value.

Return type:

bool

property id: int | None

The unique identifier given to this modification by its provider

Return type:

str or int

property key: ModificationToken

Get a safe-to-hash-and-compare ModificationToken representing this modification without tag-like properties.

Return type:

ModificationToken

property mass

The monoisotopic mass shift this modification applies

Return type:

float

property name

The primary name of this modification from its provider.

Return type:

str

property provider

The name of the controlled vocabulary that provided this modification.

Return type:

str

resolve()

Find the term and return its properties.

class pyteomics.proforma.UnimodModification(value, extra=None, group_id=None, style=None)

Bases: ModificationBase

__init__(value, extra=None, group_id=None, style=None)

property composition: Composition | None

The chemical composition shift this modification applies

property definition: Dict[str, Any]

A dict of properties describing this modification, given by the providing controlled vocabulary. This value is cached, and should not be modified.

Return type:

dict

find_tag_type(tag_type: TagTypeEnum) → List[TagBase]

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:

tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.

Returns:

matches – The list of all tags in this object which match the requested tag type.

Return type:

list

has_mass()

Check if this tag carries a mass value.

Return type:

bool

property id: int | None

The unique identifier given to this modification by its provider

Return type:

str or int

property key: ModificationToken

Get a safe-to-hash-and-compare ModificationToken representing this modification without tag-like properties.

Return type:

ModificationToken

property mass

The monoisotopic mass shift this modification applies

Return type:

float

property name

The primary name of this modification from its provider.

Return type:

str

property provider

The name of the controlled vocabulary that provided this modification.

Return type:

str

resolve()

Find the term and return its properties.

class pyteomics.proforma.PSIModModification(value, extra=None, group_id=None, style=None)

Bases: ModificationBase

__init__(value, extra=None, group_id=None, style=None)

property composition: Composition | None

The chemical composition shift this modification applies

property definition: Dict[str, Any]

A dict of properties describing this modification, given by the providing controlled vocabulary. This value is cached, and should not be modified.

Return type:

dict

find_tag_type(tag_type: TagTypeEnum) → List[TagBase]

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:

tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.

Returns:

matches – The list of all tags in this object which match the requested tag type.

Return type:

list

has_mass()

Check if this tag carries a mass value.

Return type:

bool

property id: int | None

The unique identifier given to this modification by its provider

Return type:

str or int

property key: ModificationToken

Get a safe-to-hash-and-compare ModificationToken representing this modification without tag-like properties.

Return type:

ModificationToken

property mass

The monoisotopic mass shift this modification applies

Return type:

float

property name

The primary name of this modification from its provider.

Return type:

str

property provider

The name of the controlled vocabulary that provided this modification.

Return type:

str

resolve()

Find the term and return its properties.

class pyteomics.proforma.XLMODModification(value, extra=None, group_id=None, style=None)

Bases: ModificationBase

__init__(value, extra=None, group_id=None, style=None)

property composition: Composition | None

The chemical composition shift this modification applies

property definition: Dict[str, Any]

A dict of properties describing this modification, given by the providing controlled vocabulary. This value is cached, and should not be modified.

Return type:

dict

find_tag_type(tag_type: TagTypeEnum) → List[TagBase]

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:

tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.

Returns:

matches – The list of all tags in this object which match the requested tag type.

Return type:

list

has_mass()

Check if this tag carries a mass value.

Return type:

bool

property id: int | None

The unique identifier given to this modification by its provider

Return type:

str or int

property key: ModificationToken

Get a safe-to-hash-and-compare ModificationToken representing this modification without tag-like properties.

Return type:

ModificationToken

property mass

The monoisotopic mass shift this modification applies

Return type:

float

property name

The primary name of this modification from its provider.

Return type:

str

property provider

The name of the controlled vocabulary that provided this modification.

Return type:

str

resolve()

Find the term and return its properties.

class pyteomics.proforma.GNOmeModification(value, extra=None, group_id=None, style=None)

Bases: ModificationBase

__init__(value, extra=None, group_id=None, style=None)

property composition: Composition | None

The chemical composition shift this modification applies

property definition: Dict[str, Any]

A dict of properties describing this modification, given by the providing controlled vocabulary. This value is cached, and should not be modified.

Return type:

dict

find_tag_type(tag_type: TagTypeEnum) → List[TagBase]

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:

tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.

Returns:

matches – The list of all tags in this object which match the requested tag type.

Return type:

list

has_mass()

Check if this tag carries a mass value.

Return type:

bool

property id: int | None

The unique identifier given to this modification by its provider

Return type:

str or int

property key: ModificationToken

Get a safe-to-hash-and-compare ModificationToken representing this modification without tag-like properties.

Return type:

ModificationToken

property mass

The monoisotopic mass shift this modification applies

Return type:

float

property name

The primary name of this modification from its provider.

Return type:

str

property provider

The name of the controlled vocabulary that provided this modification.

Return type:

str

resolve()

Find the term and return its properties.

class pyteomics.proforma.GlycanModification(value, extra=None, group_id=None, style=None)

Bases: ModificationBase

__init__(value, extra=None, group_id=None, style=None)

property composition: Composition | None

The chemical composition shift this modification applies

property definition: Dict[str, Any]

A dict of properties describing this modification, given by the providing controlled vocabulary. This value is cached, and should not be modified.

Return type:

dict

find_tag_type(tag_type: TagTypeEnum) → List[TagBase]

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:

tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.

Returns:

matches – The list of all tags in this object which match the requested tag type.

Return type:

list

has_mass()

Check if this tag carries a mass value.

Return type:

bool

property id: int | None

The unique identifier given to this modification by its provider

Return type:

str or int

property key: ModificationToken

Get a safe-to-hash-and-compare ModificationToken representing this modification without tag-like properties.

Return type:

ModificationToken

property mass

The monoisotopic mass shift this modification applies

Return type:

float

property name

The primary name of this modification from its provider.

Return type:

str

property provider

The name of the controlled vocabulary that provided this modification.

Return type:

str

resolve()

Find the term and return its properties.

class pyteomics.proforma.ModificationToken(name: str, id: int, provider: Callable, source_cls: Type)

Bases: object

Describes a particular modification from a particular provider, independent of a TagBase’s state.

This class is meant to be used in place of a ModificationBase object when equality testing and hashing are desired without extra properties being involved.

ModificationToken is comparable and hashable, and can be compared with ModificationBase subclass instances safely. It can be called to create a new instance of the ModificationBase it is equal to.
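The split between a tag's full equality and its identity key can be sketched with a frozen dataclass. The classes below are stand-ins written for this example, not the pyteomics classes: `Token` plays the role of the hashable key, and `Modification` plays the role of a tag whose equality also looks at tag-level state such as the group ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Identity of a modification: name, id and provider only."""
    name: str
    id: int
    provider: str


class Modification:
    """Stand-in tag whose equality also looks at tag-level state."""

    def __init__(self, name, id, provider, group_id=None):
        self.name, self.id, self.provider = name, id, provider
        self.group_id = group_id

    @property
    def key(self):
        return Token(self.name, self.id, self.provider)

    def __eq__(self, other):
        if isinstance(other, Token):
            return self.key == other  # compare identity only
        if not isinstance(other, Modification):
            return NotImplemented
        return (self.key, self.group_id) == (other.key, other.group_id)

    def __hash__(self):
        return hash(self.key)


a = Modification("Phospho", 21, "unimod", group_id="#g1")
b = Modification("Phospho", 21, "unimod")
unique_keys = {a.key, b.key}
```

Here `a` and `b` differ as tags because of the group ID, yet their keys are equal and collapse to a single entry in a set, which is the kind of comparison the token is meant for.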

name

The name of the modification being represented, as the user specified it.

Type:

str

id

Whatever unique identifier the providing controlled vocabulary gave to this modification

Type:

int or str

provider

The name of the providing controlled vocabulary.

Type:

str

source_cls

A sub-class of ModificationBase that will be used to fulfill this token if requested, providing it a resolver.

Type:

type

__init__(name: str, id: int, provider: Callable, source_cls: Type)

Label Tags

class pyteomics.proforma.InformationTag(value, extra=None, group_id=None)

Bases: TagBase

A tag carrying free text describing the location

__init__(value, extra=None, group_id=None)[source]

find_tag_type(tag_type: TagTypeEnum) List[TagBase]

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:

tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.

Returns:

matches – The list of all tags in this object which match the requested tag type.

Return type:

list
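As a rough illustration of searching by tag type, the sketch below uses hypothetical stand-ins: TagType for TagTypeEnum (the member names are invented) and Tag for a TagBase that may carry nested tags in extra. Recursing into nested tags is an assumption made for the example, not a statement about the library's behavior.

```python
from enum import Enum


class TagType(Enum):
    """Hypothetical stand-in for TagTypeEnum; member names are illustrative."""
    info = 1
    localization_marker = 2


class Tag:
    """Hypothetical stand-in for a TagBase with optional nested tags."""

    def __init__(self, tag_type, value, extra=None):
        self.type = tag_type
        self.value = value
        self.extra = extra or []

    def find_tag_type(self, tag_type):
        # Check this tag first, then each nested tag in order.
        matches = [self] if self.type == tag_type else []
        for tag in self.extra:
            matches.extend(tag.find_tag_type(tag_type))
        return matches


note = Tag(TagType.info, "likely site",
           extra=[Tag(TagType.localization_marker, 0.9)])
found = note.find_tag_type(TagType.localization_marker)
```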

has_mass() bool

Check if this tag carries a mass value.

Return type:

bool

class pyteomics.proforma.PositionLabelTag(value=None, extra=None, group_id=None)[source]

Bases: GroupLabelBase

A tag to mark that a position is involved in a group in some way, but does not imply any specific semantics.

__init__(value=None, extra=None, group_id=None)[source]

find_tag_type(tag_type: TagTypeEnum) List[TagBase]

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:

tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.

Returns:

matches – The list of all tags in this object which match the requested tag type.

Return type:

list

has_mass() bool

Check if this tag carries a mass value.

Return type:

bool

class pyteomics.proforma.LocalizationMarker(value, extra=None, group_id=None)[source]

Bases: GroupLabelBase

A tag to mark a particular localization site

__init__(value, extra=None, group_id=None)[source]

find_tag_type(tag_type: TagTypeEnum) List[TagBase]

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:

tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.

Returns:

matches – The list of all tags in this object which match the requested tag type.

Return type:

list

has_mass() bool

Check if this tag carries a mass value.

Return type:

bool

Supporting Types

class pyteomics.proforma.ModificationRule(modification_tag: TagBase, targets: ModificationTarget | List[ModificationTarget] | List[str] | None = None)[source]

Bases: object

Define a fixed modification rule which dictates a modification tag is always applied at one or more amino acid residues.

modification_tag

The modification to apply

Type:

TagBase

targets

The list of amino acids this applies to

Type:

list

__init__(modification_tag: TagBase, targets: ModificationTarget | List[ModificationTarget] | List[str] | None = None)[source]

is_not_specific() bool[source]

If there are no explicit targets, this rule might apply everywhere
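The idea of a fixed modification rule can be sketched in a few lines. FixedRule and its applies_to method are hypothetical and are not part of pyteomics; the sketch only shows how a rule with explicit targets differs from one with none.

```python
class FixedRule:
    """Hypothetical stand-in for ModificationRule."""

    def __init__(self, modification, targets=None):
        self.modification = modification
        # Normalize None or a single target into a list.
        if targets is None:
            targets = []
        elif isinstance(targets, str):
            targets = [targets]
        self.targets = list(targets)

    def is_not_specific(self):
        # No explicit targets: the rule may apply at every residue.
        return not self.targets

    def applies_to(self, residue):
        return self.is_not_specific() or residue in self.targets


rule = FixedRule("Carbamidomethyl", targets=["C"])
sequence = "PEPTCIDEC"
modified_positions = [i for i, aa in enumerate(sequence) if rule.applies_to(aa)]
```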

class pyteomics.proforma.StableIsotope(isotope)[source]

Bases: object

Define a fixed isotope that is applied globally to all amino acids.

isotope

The stable isotope string, of the form [<isotope-number>]<element> or a special isotopoform’s name.

Type:

str
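A short sketch of how such an isotope string could be split into its parts. It reads the square brackets in the form above as marking the isotope number as optional, which is an assumption, and split_isotope is a hypothetical helper rather than a library function.

```python
import re

# Optional isotope number followed by an element symbol or isotopoform name.
ISOTOPE_PATTERN = re.compile(r"^(\d+)?([A-Za-z]+)$")


def split_isotope(isotope):
    """Return (isotope_number or None, symbol) for a string such as '13C'."""
    match = ISOTOPE_PATTERN.match(isotope)
    if match is None:
        raise ValueError("Not a recognizable isotope string: %r" % (isotope,))
    number, symbol = match.groups()
    return (int(number) if number else None, symbol)


carbon = split_isotope("13C")
deuterium = split_isotope("D")
```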

__init__(isotope)[source]

class pyteomics.proforma.TaggedInterval(start, end=None, tags=None, ambiguous=False)[source]

Bases: object

Define a fixed interval over the associated sequence which contains the localization of the associated tag or denotes a region of general sequence order ambiguity.

start

The starting position (inclusive) of the interval along the primary sequence

Type:

int

end

The ending position (exclusive) of the interval along the primary sequence

Type:

int

tags

The tags being localized

Type:

list[TagBase]

ambiguous

Whether the interval is ambiguous or not

Type:

bool
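An inclusive start and exclusive end match Python's slicing convention. The sketch below assumes zero-based positions, which the text above does not state, and uses plain variables rather than a TaggedInterval instance.

```python
sequence = "PROTEOSFORMSISK"

# Interval covering the residues E O S F O R M S.
start, end = 4, 12

covered = sequence[start:end]        # end is not included
positions = list(range(start, end))  # indices of the covered residues
```

With this convention the interval length is simply end - start.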

__init__(start, end=None, tags=None, ambiguous=False)[source]

class pyteomics.proforma.ChargeState(charge, adducts=None)[source]

Bases: object

Describes the charge and adduct types of the structure.

charge

The total charge state as a signed number.

Type:

int

adducts

Each charge carrier associated with the molecule.

Type:

list[Adduct]

__init__(charge, adducts=None)[source]

for_mz_calculation() Tuple[float, int][source]

Get the total mass of the charge carrier(s) and their collective charge, to plug into the formula for the mass-to-charge ratio: (mass of molecule + mass of charge carrier(s)) / charge.

Returns:

  • charge_carrier_mass (float) – The total mass of the charge carrier(s) in the adducting group(s)

  • charge (int) – The total charge contributed by all the charge carriers in the adducting group(s)
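The formula above can be written out as a small function. This is a sketch with hypothetical names (mass_to_charge, PROTON_MASS) that takes the two returned values as plain arguments. It assumes protons as the only charge carriers, and dividing by the absolute charge is a choice made for the example.

```python
PROTON_MASS = 1.00727646688  # mass of a proton in Da


def mass_to_charge(neutral_mass, charge_carrier_mass, charge):
    """(mass of molecule + mass of charge carrier(s)) / charge."""
    if charge == 0:
        raise ValueError("m/z is undefined for a neutral molecule")
    # Assumption: report m/z as a positive number for either polarity.
    return (neutral_mass + charge_carrier_mass) / abs(charge)


# A doubly protonated molecule with a neutral mass of 1000 Da.
charge = 2
carrier_mass = charge * PROTON_MASS
mz = mass_to_charge(1000.0, carrier_mass, charge)
```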

Modification Resolvers

class pyteomics.proforma.ModificationResolver(name, **kwargs)[source]

Bases: object

__init__(name, **kwargs)[source]

clear_cache()[source]

Clear the modification definition cache

enable_caching(flag: bool = True)[source]

Enable or disable caching of modification definitions.

If flag is False, this will also dispose of any existing cached values.

Parameters:

flag (bool) – Whether or not to enable the cache
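The interaction between the two cache methods can be sketched as follows. CachingLookup, its loader argument, and its resolve method are hypothetical and only mirror the behavior described above: turning caching off also discards anything already stored.

```python
class CachingLookup:
    """Hypothetical stand-in for a resolver with a definition cache."""

    def __init__(self, loader):
        self.loader = loader  # callable that fetches a definition by name
        self.use_cache = True
        self._cache = {}

    def clear_cache(self):
        self._cache.clear()

    def enable_caching(self, flag=True):
        self.use_cache = flag
        if not flag:
            # Disabling the cache also discards what was stored.
            self.clear_cache()

    def resolve(self, name):
        if self.use_cache and name in self._cache:
            return self._cache[name]
        definition = self.loader(name)
        if self.use_cache:
            self._cache[name] = definition
        return definition


calls = []


def loader(name):
    calls.append(name)
    return {"name": name}


lookup = CachingLookup(loader)
lookup.resolve("Oxidation")
lookup.resolve("Oxidation")   # served from the cache, loader not called
lookup.enable_caching(False)
lookup.resolve("Oxidation")   # loaded again because caching is off
```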

parse_identifier(identifier: str) Tuple[str | None, int | None][source]

Parse a string that is either a CV prefixed identifier or name.

Parameters:

identifier (str) – The identifier string to parse, removing CV prefix as needed.

Returns:

  • name (str, optional) – A textual identifier embedded in the qualified identifier, if any, otherwise None.

  • id (int, optional) – An integer ID embedded in the qualified identifier, if any, otherwise None.
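A minimal sketch of this kind of parsing, under stated assumptions: split_identifier is a hypothetical function, the accepted prefixes are passed in explicitly, and an all-digit remainder is treated as the integer ID. It is not the library's implementation.

```python
def split_identifier(identifier, prefixes=("unimod", "u")):
    """Return (name, id); exactly one of the two is not None."""
    head, sep, tail = identifier.partition(":")
    # Strip the CV prefix only if it is one this resolver recognizes.
    if sep and head.lower() in prefixes:
        identifier = tail
    if identifier.isdigit():
        return None, int(identifier)
    return identifier, None


by_id = split_identifier("UNIMOD:35")
by_name = split_identifier("U:Oxidation")
bare = split_identifier("Oxidation")
```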

class pyteomics.proforma.GenericResolver(resolvers, **kwargs)[source]

Bases: ModificationResolver

__init__(resolvers, **kwargs)[source]

clear_cache()

Clear the modification definition cache

enable_caching(flag: bool = True)

Enable or disable caching of modification definitions.

If flag is False, this will also dispose of any existing cached values.

Parameters:

flag (bool) – Whether or not to enable the cache

parse_identifier(identifier)[source]

Parse a string that is either a CV prefixed identifier or name.

No parsing is performed, because a GenericModification is never qualified.

Parameters:

identifier (str) – The identifier string to parse, removing CV prefix as needed.

Returns:

  • name (str, optional) – A textual identifier embedded in the qualified identifier, if any, otherwise None.

  • id (int, optional) – An integer ID embedded in the qualified identifier, if any, otherwise None.

class pyteomics.proforma.UnimodResolver(**kwargs)[source]

Bases: ModificationResolver

__init__(**kwargs)[source]

clear_cache()

Clear the modification definition cache

enable_caching(flag: bool = True)

Enable or disable caching of modification definitions.

If flag is False, this will also dispose of any existing cached values.

Parameters:

flag (bool) – Whether or not to enable the cache

parse_identifier(identifier: str) Tuple[str | None, int | None]

Parse a string that is either a CV prefixed identifier or name.

Parameters:

identifier (str) – The identifier string to parse, removing CV prefix as needed.

Returns:

  • name (str, optional) – A textual identifier embedded in the qualified identifier, if any, otherwise None.

  • id (int, optional) – An integer ID embedded in the qualified identifier, if any, otherwise None.

class pyteomics.proforma.PSIModResolver(**kwargs)[source]

Bases: ModificationResolver

__init__(**kwargs)[source]

clear_cache()

Clear the modification definition cache

enable_caching(flag: bool = True)

Enable or disable caching of modification definitions.

If flag is False, this will also dispose of any existing cached values.

Parameters:

flag (bool) – Whether or not to enable the cache

parse_identifier(identifier: str) Tuple[str | None, int | None]

Parse a string that is either a CV prefixed identifier or name.

Parameters:

identifier (str) – The identifier string to parse, removing CV prefix as needed.

Returns:

  • name (str, optional) – A textual identifier embedded in the qualified identifier, if any, otherwise None.

  • id (int, optional) – An integer ID embedded in the qualified identifier, if any, otherwise None.

class pyteomics.proforma.XLMODResolver(**kwargs)[source]

Bases: ModificationResolver

__init__(**kwargs)[source]

clear_cache()

Clear the modification definition cache

enable_caching(flag: bool = True)

Enable or disable caching of modification definitions.

If flag is False, this will also dispose of any existing cached values.

Parameters:

flag (bool) – Whether or not to enable the cache

parse_identifier(identifier: str) Tuple[str | None, int | None]

Parse a string that is either a CV prefixed identifier or name.

Parameters:

identifier (str) – The identifier string to parse, removing CV prefix as needed.

Returns:

  • name (str, optional) – A textual identifier embedded in the qualified identifier, if any, otherwise None.

  • id (int, optional) – An integer ID embedded in the qualified identifier, if any, otherwise None.

class pyteomics.proforma.GNOResolver(**kwargs)[source]

Bases: ModificationResolver

__init__(**kwargs)[source]

clear_cache()

Clear the modification definition cache

enable_caching(flag: bool = True)

Enable or disable caching of modification definitions.

If flag is False, this will also dispose of any existing cached values.

Parameters:

flag (bool) – Whether or not to enable the cache

get_mass_from_glycan_composition(term)[source]

Parse the Byonic-style glycan composition from property GNO:00000202 to get the counts of each monosaccharide and use that to calculate mass.

The mass computed here is exact and dehydrated, distinct from the rounded-off mass that get_mass_from_term() will produce by walking up the CV term hierarchy. However, not all glycan compositions are representable in GNO:00000202 format, so this may silently be absent or incomplete, hence the double-check in get_mass_from_term().

Parameters:

term (psims.controlled_vocabulary.Entity) – The CV entity being parsed.

Returns:

mass – If a glycan composition is found on the term, the computed mass will be returned. Otherwise, None is returned.

Return type:

float or None
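To make the calculation concrete, here is a sketch that parses a composition string of the form Name(count)Name(count) and sums dehydrated (residue) monosaccharide masses. The RESIDUE_MASSES table, the composition_mass function, and the choice to return None for an unknown symbol are assumptions for illustration. The library's own monosaccharide table and error handling may differ.

```python
import re

# Monoisotopic masses of dehydrated monosaccharide residues, in Da.
RESIDUE_MASSES = {
    "Hex": 162.052823,
    "HexNAc": 203.079373,
    "dHex": 146.057909,
    "Fuc": 146.057909,
    "NeuAc": 291.095417,
    "NeuGc": 307.090331,
}

COMPONENT = re.compile(r"([A-Za-z0-9]+)\((\d+)\)")


def composition_mass(text):
    """Sum residue masses for a string such as 'HexNAc(2)Hex(5)'."""
    components = COMPONENT.findall(text)
    if not components:
        return None
    total = 0.0
    for symbol, count in components:
        if symbol not in RESIDUE_MASSES:
            # Assumption: give up rather than report a partial mass.
            return None
        total += RESIDUE_MASSES[symbol] * int(count)
    return total


mass = composition_mass("HexNAc(2)Hex(5)")
```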

get_mass_from_term(term, raw_mass)[source]

Walk up the term hierarchy and find the mass group term near the root of the tree, and return the most accurate mass available for the provided term.

The mass group term’s mass is rounded to two decimal places, leading to relatively large errors.

Parameters:

term (psims.controlled_vocabulary.Entity) – The CV entity being parsed.

Returns:

mass – If a root node is found along the term’s lineage, the computed mass will be returned. Otherwise, None is returned. The mass may be

Return type:

float or None
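Walking up a term hierarchy until a mass-bearing ancestor is found can be sketched with a plain dictionary. All term identifiers below are invented, and mass_from_lineage is a hypothetical function. Preferring an exact mass when one is supplied is one reading of "most accurate mass available", not a confirmed detail of the implementation.

```python
# Hypothetical child -> parent links; None marks the top of the lineage.
PARENTS = {
    "term:leaf": "term:composition",
    "term:composition": "term:mass-group",
    "term:mass-group": None,
}

# Hypothetical mass group terms, rounded to two decimal places.
ROUNDED_MASSES = {"term:mass-group": 1216.42}


def mass_from_lineage(term_id, exact_mass=None):
    """Return exact_mass if given, else the nearest ancestor's rounded mass."""
    if exact_mass is not None:
        return exact_mass
    current = term_id
    while current is not None:
        if current in ROUNDED_MASSES:
            return ROUNDED_MASSES[current]
        current = PARENTS.get(current)
    return None


rounded = mass_from_lineage("term:leaf")
exact = mass_from_lineage("term:leaf", exact_mass=1216.422861)
```

The difference between the two results (about 0.003 Da here) shows the size of error that two-decimal rounding can introduce.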

parse_identifier(identifier: str) Tuple[str | None, int | None]

Parse a string that is either a CV prefixed identifier or name.

Parameters:

identifier (str) – The identifier string to parse, removing CV prefix as needed.

Returns:

  • name (str, optional) – A textual identifier embedded in the qualified identifier, if any, otherwise None.

  • id (int, optional) – An integer ID embedded in the qualified identifier, if any, otherwise None.
