Pyteomics documentation v4.5

proforma - Proteoform and Peptidoform Notation

ProForma is a notation for defining modified amino acid sequences using a set of controlled vocabularies, and for encoding uncertain or partial information about localization. See the ProForma specification for more up-to-date information.

Strictly speaking, this implementation supports ProForma v2.

Data Access

parse() - The primary interface for parsing ProForma strings.

>>> parse("EM[Oxidation]EVT[#g1(0.01)]S[#g1(0.09)]ES[Phospho#g1(0.90)]PEK")
    ([('E', None),
      ('M', [GenericModification('Oxidation', None, None)]),
      ('E', None),
      ('V', None),
      ('T', [LocalizationMarker(0.01, None, '#g1')]),
      ('S', [LocalizationMarker(0.09, None, '#g1')]),
      ('E', None),
      ('S',
      [GenericModification('Phospho', [LocalizationMarker(0.9, None, '#g1')], '#g1')]),
      ('P', None),
      ('E', None),
      ('K', None)],
     {'n_term': None,
      'c_term': None,
      'unlocalized_modifications': [],
      'labile_modifications': [],
      'fixed_modifications': [],
      'intervals': [],
      'isotopes': [],
      'group_ids': ['#g1']})
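
The first element of the returned tuple is a list of (residue, tags) pairs, so ordinary list operations are enough to inspect it. The sketch below works on a hand-written stand-in for that list, with plain strings in place of the tag objects (an assumption made so the snippet runs without pyteomics installed), and recovers the bare sequence and the modified positions.

```python
# Stand-in for the first element returned by parse(); real results hold
# tag objects such as GenericModification rather than plain strings.
parsed_sequence = [
    ("E", None),
    ("M", ["Oxidation"]),
    ("E", None),
    ("V", None),
    ("K", None),
]

# Residues are the first item of each pair.
bare_sequence = "".join(residue for residue, _ in parsed_sequence)

# 0-based positions whose tag collection is neither None nor empty.
modified_positions = [
    index for index, (_, tags) in enumerate(parsed_sequence) if tags
]

print(bare_sequence)       # EMEVK
print(modified_positions)  # [1]
```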

to_proforma() - Format a sequence and set of properties as ProForma text.

Classes

ProForma - An object oriented version of the parsing and formatting code, coupled with minimal information about mass and position data.

Dependencies

To resolve PSI-MOD, XL-MOD, and GNO identifiers, psims is required. By default, psims retrieves the most recent version of each ontology from the internet, but includes a fall-back version to use when the network is unavailable. It can also create an application cache on disk.
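
Because psims is an optional dependency, it can be useful to check for it before attempting to resolve ontology terms. The following is a minimal sketch using only the standard library; the helper name has_module is hypothetical and not part of pyteomics.

```python
import importlib.util


def has_module(name):
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(name) is not None


# psims is only needed for PSI-MOD, XL-MOD and GNO lookups.
psims_available = has_module("psims")
if not psims_available:
    print("psims is not installed; PSI-MOD, XL-MOD and GNO terms cannot be resolved")
```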

Compliance Levels

1. Base Level Support. This is the lowest level of compliance, which involves providing support for:

  • [x] Amino acid sequences
  • [x] Protein modifications using two of the supported CVs/ontologies: Unimod and PSI-MOD.
  • [x] Protein modifications using delta masses (without prefixes)
  • [x] N-terminal, C-terminal and labile modifications.
  • [x] Ambiguity in the modification position, including support for localisation scores.
  • [x] INFO tag.

2. Additional Separate Support. These features are independent of each other:

  • [x] Unusual amino acids (O and U).
  • [x] Ambiguous amino acids (e.g. X, B, Z). This would include support for sequence tags of known mass (using the character X).
  • [x] Protein modifications using delta masses (using prefixes for the different CVs/ontologies).
  • [x] Use of prefixes for Unimod (U:) and PSI-MOD (M:) names.
  • [x] Support for the joint representation of experimental data and its interpretation.
  1. Top Down Extensions

    • [ ] Additional CV/ontologies for protein modifications: RESID (the prefix R MUST be used for RESID CV/ontology term names)
    • [x] Chemical formulas (this feature occurs in two places in this list).
  2. Cross-Linking Extensions

    • [ ] Cross-linked peptides (using the XL-MOD CV/ontology, the prefix X MUST be used for XL-MOD CV/ontology term names).
  3. Glycan Extensions

    • [x] Additional CV/ontologies for protein modifications: GNO (the prefix G MUST be used for GNO CV/ontology term names)
    • [x] Glycan composition.
    • [x] Chemical formulas (this feature occurs in two places in this list).
  4. Spectral Support

    • [x] Charge state and adducts
    • [ ] Chimeric spectra are special cases.
    • [x] Global modifications (e.g., every C is C13).
class pyteomics.proforma.ChargeState(charge, adducts=None)[source]

Bases: object

Describes the charge and adduct types of the structure.

charge

The total charge state as a signed number.

Type:int
adducts

Each charge carrier associated with the molecule.

Type:list[str]
__init__(charge, adducts=None)[source]

Initialize self. See help(type(self)) for accurate signature.
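
In ProForma text, the charge state follows the sequence after a slash and may be followed by a bracketed, comma-separated adduct list, as in EMEVEESPEK/2[+2Na+,+H+]. The sketch below shows how such a suffix maps onto the charge and adducts attributes. It is a simplified illustration, not the parser used by this module, and split_charge is a hypothetical helper name.

```python
import re

# Sequence body, a slash, a signed integer charge, then an optional
# bracketed adduct list.
CHARGE_SUFFIX = re.compile(
    r"^(?P<body>.*)/(?P<charge>[+-]?\d+)(?:\[(?P<adducts>[^\]]*)\])?$"
)


def split_charge(text):
    """Return (body, charge or None, adducts) for a ProForma-like string."""
    match = CHARGE_SUFFIX.match(text)
    if match is None:
        # No charge suffix present.
        return text, None, []
    adducts = match.group("adducts")
    adduct_list = adducts.split(",") if adducts else []
    return match.group("body"), int(match.group("charge")), adduct_list


print(split_charge("EMEVEESPEK/2"))
# ('EMEVEESPEK', 2, [])
print(split_charge("EMEVEESPEK/2[+2Na+,+H+]"))
# ('EMEVEESPEK', 2, ['+2Na+', '+H+'])
```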

class pyteomics.proforma.FormulaModification(value, extra=None, group_id=None)[source]

Bases: pyteomics.proforma.ModificationBase

__init__(value, extra=None, group_id=None)

Initialize self. See help(type(self)) for accurate signature.

find_tag_type(tag_type)

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.
Returns:matches – The list of all tags in this object which match the requested tag type.
Return type:list
resolve()[source]

Find the term and return its properties.

class pyteomics.proforma.GNOmeModification(value, extra=None, group_id=None)[source]

Bases: pyteomics.proforma.ModificationBase

__init__(value, extra=None, group_id=None)

Initialize self. See help(type(self)) for accurate signature.

find_tag_type(tag_type)

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.
Returns:matches – The list of all tags in this object which match the requested tag type.
Return type:list
resolve()

Find the term and return its properties.

class pyteomics.proforma.GenericModification(value, extra=None, group_id=None)[source]

Bases: pyteomics.proforma.ModificationBase

__init__(value, extra=None, group_id=None)[source]

Initialize self. See help(type(self)) for accurate signature.

find_tag_type(tag_type)

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.
Returns:matches – The list of all tags in this object which match the requested tag type.
Return type:list
resolve()[source]

Find the term, searching through all available vocabularies, and return the first match’s properties.

class pyteomics.proforma.GlycanModification(value, extra=None, group_id=None)[source]

Bases: pyteomics.proforma.ModificationBase

__init__(value, extra=None, group_id=None)

Initialize self. See help(type(self)) for accurate signature.

find_tag_type(tag_type)

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.
Returns:matches – The list of all tags in this object which match the requested tag type.
Return type:list
resolve()[source]

Find the term and return its properties.

class pyteomics.proforma.GroupLabelBase(type, value, extra=None, group_id=None)[source]

Bases: pyteomics.proforma.TagBase

__init__(type, value, extra=None, group_id=None)

Initialize self. See help(type(self)) for accurate signature.

find_tag_type(tag_type)

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.
Returns:matches – The list of all tags in this object which match the requested tag type.
Return type:list
class pyteomics.proforma.InformationTag(value, extra=None, group_id=None)[source]

Bases: pyteomics.proforma.TagBase

A tag carrying free text describing the location

__init__(value, extra=None, group_id=None)[source]

Initialize self. See help(type(self)) for accurate signature.

find_tag_type(tag_type)

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.
Returns:matches – The list of all tags in this object which match the requested tag type.
Return type:list
class pyteomics.proforma.IntersectionEnum[source]

Bases: enum.Enum

An enumeration.

class pyteomics.proforma.LocalizationMarker(value, extra=None, group_id=None)[source]

Bases: pyteomics.proforma.GroupLabelBase

A tag to mark a particular localization site

__init__(value, extra=None, group_id=None)[source]

Initialize self. See help(type(self)) for accurate signature.

find_tag_type(tag_type)

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.
Returns:matches – The list of all tags in this object which match the requested tag type.
Return type:list
class pyteomics.proforma.MassModification(value, extra=None, group_id=None)[source]

Bases: pyteomics.proforma.TagBase

A modification defined purely by a signed mass shift in Daltons.

The value of a MassModification is always a float

__init__(value, extra=None, group_id=None)[source]

Initialize self. See help(type(self)) for accurate signature.

find_tag_type(tag_type)

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.
Returns:matches – The list of all tags in this object which match the requested tag type.
Return type:list
class pyteomics.proforma.ModificationBase(value, extra=None, group_id=None)[source]

Bases: pyteomics.proforma.TagBase

A base class for all modification tags with marked prefixes.

__init__(value, extra=None, group_id=None)[source]

Initialize self. See help(type(self)) for accurate signature.

find_tag_type(tag_type)

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.
Returns:matches – The list of all tags in this object which match the requested tag type.
Return type:list
resolve()[source]

Find the term and return its properties.

class pyteomics.proforma.ModificationRule(modification_tag, targets=None)[source]

Bases: object

Define a fixed modification rule which dictates a modification tag is always applied at one or more amino acid residues.

modification_tag

The modification to apply

Type:TagBase
targets

The list of amino acids this applies to

Type:list
__init__(modification_tag, targets=None)[source]

Initialize self. See help(type(self)) for accurate signature.
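
Conceptually, a fixed modification rule selects every position whose residue is one of its targets. The sketch below illustrates that selection on a plain string. It assumes single-letter residue targets, and rule_positions is a hypothetical helper rather than a pyteomics function.

```python
def rule_positions(residues, targets):
    """Return the 0-based positions whose amino acid is one of the targets."""
    target_set = set(targets)
    return [index for index, aa in enumerate(residues) if aa in target_set]


# A rule targeting cysteine applies at both C residues.
print(rule_positions("ACDCK", ["C"]))  # [1, 3]
```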

class pyteomics.proforma.NumberParser(initial=None)[source]

Bases: pyteomics.proforma.TokenBuffer

A buffer which accumulates tokens until it is asked to parse them into int instances.

__init__(initial=None)

Initialize self. See help(type(self)) for accurate signature.

append(c)

Append a new character to the buffer.

Parameters:c (str) – The character appended
reset()

Discard the content of the current buffer.

class pyteomics.proforma.PSIModModification(value, extra=None, group_id=None)[source]

Bases: pyteomics.proforma.ModificationBase

__init__(value, extra=None, group_id=None)

Initialize self. See help(type(self)) for accurate signature.

find_tag_type(tag_type)

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.
Returns:matches – The list of all tags in this object which match the requested tag type.
Return type:list
resolve()

Find the term and return its properties.

class pyteomics.proforma.ParserStateEnum[source]

Bases: enum.Enum

An enumeration.

class pyteomics.proforma.PositionLabelTag(value=None, extra=None, group_id=None)[source]

Bases: pyteomics.proforma.GroupLabelBase

A tag to mark that a position is involved in a group in some way, but does not imply any specific semantics.

__init__(value=None, extra=None, group_id=None)[source]

Initialize self. See help(type(self)) for accurate signature.

find_tag_type(tag_type)

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.
Returns:matches – The list of all tags in this object which match the requested tag type.
Return type:list
class pyteomics.proforma.PrefixSavingMeta[source]

Bases: type

A subclass-registering-metaclass that provides easy lookup of subclasses by prefix attributes.

__init__

Initialize self. See help(type(self)) for accurate signature.

mro()

Return a type’s method resolution order.

class pyteomics.proforma.ProForma(sequence, properties)[source]

Bases: object

Represent a parsed ProForma sequence.

sequence

The list of (amino acid, tag collection) pairs making up the primary sequence of the peptide.

Type:list[tuple[]]
isotopes

A list of any stable isotope rules that apply to this peptide

Type:list[StableIsotope]
charge_state

An optional charge state that may have been provided

Type:int, optional
intervals

Any annotated intervals that contain either sequence ambiguity or a tag over that interval.

Type:list[Interval]
labile_modifications

Any modifications that were parsed as labile, and may not appear at any location on the peptide primary sequence.

Type:list[ModificationBase]
unlocalized_modifications

Any modifications that were not localized but may be attached to peptide sequence evidence.

Type:list[ModificationBase]
n_term

Any modifications on the N-terminus of the peptide

Type:list[ModificationBase]
c_term

Any modifications on the C-terminus of the peptide

Type:list[ModificationBase]
group_ids

The collection of all group identifiers on this sequence.

Type:set
mass

The computed mass for the fully modified peptide, including labile and unlocalized modifications. Does not include stable isotopes at this time.

Type:float
__init__(sequence, properties)[source]

Initialize self. See help(type(self)) for accurate signature.

find_tags_by_id(tag_id, include_position=True)[source]

Find all occurrences of a particular tag ID.

exception pyteomics.proforma.ProFormaError(message, index=None, parser_state=None, **kwargs)[source]

Bases: pyteomics.auxiliary.structures.PyteomicsError

__init__(message, index=None, parser_state=None, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class pyteomics.proforma.StableIsotope(isotope)[source]

Bases: object

Define a fixed isotope that is applied globally to all amino acids.

isotope

The stable isotope string, of the form [<isotope-number>]<element> or a special isotopoform’s name.

Type:str
__init__(isotope)[source]

Initialize self. See help(type(self)) for accurate signature.

class pyteomics.proforma.StringParser(initial=None)[source]

Bases: pyteomics.proforma.TokenBuffer

A buffer which accumulates tokens until it is asked to parse them into str instances.

__init__(initial=None)

Initialize self. See help(type(self)) for accurate signature.

append(c)

Append a new character to the buffer.

Parameters:c (str) – The character appended
reset()

Discard the content of the current buffer.

class pyteomics.proforma.TagBase(type, value, extra=None, group_id=None)[source]

Bases: object

A base class for all tag types.

type

An element of TagTypeEnum saying what kind of tag this is.

Type:Enum
value

The data stored in this tag, usually an externally controlled name

Type:object
extra

Any extra tags that were nested within this tag. Usually limited to INFO tags but may be other synonymous controlled vocabulary terms.

Type:list
group_id

A short label denoting which group, if any, this tag belongs to

Type:str or None
__init__(type, value, extra=None, group_id=None)[source]

Initialize self. See help(type(self)) for accurate signature.

find_tag_type(tag_type)[source]

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.
Returns:matches – The list of all tags in this object which match the requested tag type.
Return type:list
class pyteomics.proforma.TagParser(initial=None, group_ids=None)[source]

Bases: pyteomics.proforma.TokenBuffer

A buffer which accumulates tokens until it is asked to parse them into TagBase instances.

Implements a subset of the Sequence protocol.

buffer

The list of tokens accumulated since the last parsing.

Type:list
group_ids

The set of all group IDs that have been produced so far.

Type:set
__init__(initial=None, group_ids=None)[source]

Initialize self. See help(type(self)) for accurate signature.

append(c)

Append a new character to the buffer.

Parameters:c (str) – The character appended
reset()

Discard the content of the current buffer.

class pyteomics.proforma.TagTypeEnum[source]

Bases: enum.Enum

An enumeration.

class pyteomics.proforma.TaggedInterval(start, end=None, tags=None, ambiguous=False)[source]

Bases: object

Define a fixed interval over the associated sequence which contains the localization of the associated tag or denotes a region of general sequence order ambiguity.

start

The starting position (inclusive) of the interval along the primary sequence

Type:int
end

The ending position (exclusive) of the interval along the primary sequence

Type:int
tags

The tags being localized

Type:list[TagBase]
ambiguous

Whether the interval is ambiguous or not

Type:bool
__init__(start, end=None, tags=None, ambiguous=False)[source]

Initialize self. See help(type(self)) for accurate signature.
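
Since start is inclusive and end is exclusive, an interval lines up with Python slice semantics. A small illustration on a plain list of residues (the sequence and indices are made up for the example):

```python
residues = list("PROTEOFORM")
start, end = 2, 5  # start is inclusive, end is exclusive

# The covered residues are exactly the slice residues[start:end].
covered = residues[start:end]

print(covered)       # ['O', 'T', 'E']
print(len(covered))  # 3
```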

class pyteomics.proforma.TokenBuffer(initial=None)[source]

Bases: object

A token buffer that wraps the accumulation and reset logic of a list of str objects.

Implements a subset of the Sequence protocol.

buffer

The list of tokens accumulated since the last parsing.

Type:list
__init__(initial=None)[source]

Initialize self. See help(type(self)) for accurate signature.

append(c)[source]

Append a new character to the buffer.

Parameters:c (str) – The character appended
reset()[source]

Discard the content of the current buffer.
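
The accumulate-then-parse pattern shared by TokenBuffer and its subclasses can be sketched as follows. This is an independent illustration rather than the module's source: the class names and the process method are hypothetical, chosen only to show how append and reset interact with a final conversion step.

```python
class SketchTokenBuffer:
    """Accumulate single-character tokens until they are consumed."""

    def __init__(self, initial=None):
        self.buffer = list(initial) if initial else []

    def append(self, c):
        """Append a new character to the buffer."""
        self.buffer.append(c)

    def reset(self):
        """Discard the content of the current buffer."""
        self.buffer = []

    def __len__(self):
        return len(self.buffer)

    def __iter__(self):
        return iter(self.buffer)

    def process(self):
        """Join the tokens into a str, clear the buffer, return the text."""
        value = "".join(self.buffer)
        self.reset()
        return value


class SketchNumberParser(SketchTokenBuffer):
    """Same accumulation logic, but the result is converted to int."""

    def process(self):
        return int(super().process())


digits = SketchNumberParser()
for ch in "42":
    digits.append(ch)
print(digits.process())  # 42
print(len(digits))       # 0
```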

class pyteomics.proforma.UnimodModification(value, extra=None, group_id=None)[source]

Bases: pyteomics.proforma.ModificationBase

__init__(value, extra=None, group_id=None)

Initialize self. See help(type(self)) for accurate signature.

find_tag_type(tag_type)

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.
Returns:matches – The list of all tags in this object which match the requested tag type.
Return type:list
resolve()

Find the term and return its properties.

class pyteomics.proforma.XLMODModification(value, extra=None, group_id=None)[source]

Bases: pyteomics.proforma.ModificationBase

__init__(value, extra=None, group_id=None)

Initialize self. See help(type(self)) for accurate signature.

find_tag_type(tag_type)

Search this tag or tag collection for elements with a particular tag type and return them.

Parameters:tag_type (TagTypeEnum) – A label from TagTypeEnum, or an equivalent type.
Returns:matches – The list of all tags in this object which match the requested tag type.
Return type:list
resolve()

Find the term and return its properties.

pyteomics.proforma.find_prefix(tokens)[source]

Find the prefix, if any, of the tag defined by tokens, delimited by “:”.

Parameters:tokens (list) – The tag tokens to search
Returns:
  • prefix (str or None) – The prefix string, if found
  • rest (str) – The rest of the tokens, merged as a string
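
The behaviour described above can be sketched in a few lines. This version assumes the split happens at the first “:” and that tokens is a list of single characters; it is an illustration, not the module's source.

```python
def find_prefix_sketch(tokens):
    """Split tag tokens at the first ':' into (prefix or None, rest)."""
    for index, token in enumerate(tokens):
        if token == ":":
            return "".join(tokens[:index]), "".join(tokens[index + 1:])
    # No delimiter found, so there is no prefix.
    return None, "".join(tokens)


print(find_prefix_sketch(list("U:Oxidation")))  # ('U', 'Oxidation')
print(find_prefix_sketch(list("Oxidation")))    # (None, 'Oxidation')
```
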
pyteomics.proforma.parse(sequence)[source]

Tokenize a ProForma sequence into a sequence of amino acid+tag positions, and a mapping of sequence-spanning modifiers.

Note

This is a state machine parser, but with certain sub-state paths unrolled to avoid an explosion of formal intermediary states.

Parameters:sequence (str) – The sequence to parse
Returns:
  • parsed_sequence (list[tuple[str, list[TagBase]]]) – The (amino acid: str, list of TagBase or None) pairs denoting the positions along the primary sequence
  • modifiers (dict) – A mapping listing the labile modifications, fixed modifications, stable isotopes, unlocalized modifications, tagged intervals, and group IDs
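
To give a feel for the state-machine approach, here is a toy tokenizer with just two states (reading residues, and reading a tag). It handles only residues and bracketed tags, keeps tags as plain strings, and ignores every other ProForma feature, so it is a sketch of the technique under those assumptions and not a substitute for parse(). The name tokenize_sketch is hypothetical.

```python
def tokenize_sketch(text):
    """Toy tokenizer: returns [(residue, [tag, ...] or None), ...]."""
    positions = []
    tag_chars = []
    depth = 0  # 0: reading residues; > 0: inside a (possibly nested) tag
    for ch in text:
        if depth == 0:
            if ch == "[":
                if not positions:
                    raise ValueError("a tag before any residue is not handled here")
                depth = 1
            elif ch.isalpha() and ch.isupper():
                positions.append((ch, None))
            else:
                raise ValueError("unexpected character %r" % ch)
        else:
            if ch == "[":
                depth += 1
            elif ch == "]":
                depth -= 1
            if depth == 0:
                # The tag just closed: attach it to the preceding residue.
                residue, tags = positions[-1]
                positions[-1] = (residue, (tags or []) + ["".join(tag_chars)])
                tag_chars = []
            else:
                tag_chars.append(ch)
    if depth != 0:
        raise ValueError("unclosed tag")
    return positions


print(tokenize_sketch("EM[Oxidation]EVK"))
# [('E', None), ('M', ['Oxidation']), ('E', None), ('V', None), ('K', None)]
```
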
pyteomics.proforma.process_marker(tokens)[source]

Process a marker, which is a tag whose value starts with #.

Parameters:tokens (list) – The tag tokens to parse
Returns:
Return type:PositionLabelTag or LocalizationMarker
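
Judging from the parse() example on this page, a marker such as #g1(0.01) carries a group identifier and an optional localization score. The sketch below separates the two with a regular expression. The pattern and the helper name split_marker are assumptions for illustration and do not reproduce the module's own logic.

```python
import re

# '#' plus an alphanumeric label, optionally followed by '(score)'.
MARKER = re.compile(r"^(?P<group>#[A-Za-z0-9]+)(?:\((?P<score>[^)]+)\))?$")


def split_marker(text):
    """Return (group_id, score or None) for a marker such as '#g1(0.01)'."""
    match = MARKER.match(text)
    if match is None:
        raise ValueError("not a marker: %r" % text)
    score = match.group("score")
    return match.group("group"), (float(score) if score is not None else None)


print(split_marker("#g1(0.01)"))  # ('#g1', 0.01)
print(split_marker("#g1"))        # ('#g1', None)
```
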
pyteomics.proforma.process_tag_tokens(tokens)[source]

Convert a tag token buffer into a parsed TagBase instance of the appropriate sub-type with zero or more sub-tags.

Parameters:tokens (list) – The tokens to parse
Returns:The parsed tag
Return type:TagBase
pyteomics.proforma.split_tags(tokens)[source]

Split a token array into discrete sets of tag tokens.

Parameters:tokens (list) – The characters of the tag token buffer
Returns:The tokens for each contained tag
Return type:list of list
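
As an illustration of the splitting step, the sketch below breaks a character list into one token list per tag. It assumes that “|” is the separator between tags inside a single bracket, as in ProForma text such as [Oxidation|INFO:likely], and it does not handle nested brackets.

```python
def split_tags_sketch(tokens):
    """Split a list of characters on '|' into one token list per tag."""
    groups = [[]]
    for token in tokens:
        if token == "|":
            groups.append([])
        else:
            groups[-1].append(token)
    # Drop empty groups produced by leading, trailing or doubled separators.
    return [group for group in groups if group]


parts = split_tags_sketch(list("Oxidation|INFO:likely"))
print(["".join(part) for part in parts])  # ['Oxidation', 'INFO:likely']
```
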
pyteomics.proforma.to_proforma(sequence, n_term=None, c_term=None, unlocalized_modifications=None, labile_modifications=None, fixed_modifications=None, intervals=None, isotopes=None, charge_state=None, group_ids=None)[source]

Convert a sequence plus modifiers into formatted text following the ProForma specification.

Parameters:
  • sequence (list[tuple[str, TagBase]]) – The primary sequence of the peptidoform/proteoform to render
  • n_term (Optional[TagBase]) – The N-terminal modification, if any.
  • c_term (Optional[TagBase]) – The C-terminal modification, if any.
  • unlocalized_modifications (Optional[list[TagBase]]) – Any modifications which aren’t assigned to a specific location.
  • labile_modifications (Optional[list[TagBase]]) – Any labile modifications
  • fixed_modifications (Optional[list[ModificationRule]]) – Any fixed modifications
  • intervals (Optional[list[TaggedInterval]]) – A list of modified intervals, if any
  • isotopes (Optional[list[StableIsotope]]) – Any global stable isotope labels applied
  • charge_state (Optional[ChargeState]) – An optional charge state value
  • group_ids (Optional[list[str]]) – Any group identifiers. This parameter is currently not used.
Returns:
Return type:str
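
The core of the formatting step, rendering each residue followed by its bracketed tags with terminal modifications joined by a hyphen, can be sketched as below. Tags are plain strings here and only sequence, n_term and c_term are handled. The function format_sketch is a hypothetical stand-in and covers far less than to_proforma().

```python
def format_sketch(sequence, n_term=None, c_term=None):
    """Render (residue, tags) pairs as ProForma-like text."""
    parts = []
    if n_term:
        parts.append("[%s]-" % n_term)
    for residue, tags in sequence:
        parts.append(residue)
        # tags may be None for unmodified positions.
        for tag in tags or []:
            parts.append("[%s]" % tag)
    if c_term:
        parts.append("-[%s]" % c_term)
    return "".join(parts)


pairs = [("E", None), ("M", ["Oxidation"]), ("K", None)]
print(format_sketch(pairs, n_term="Acetyl"))  # [Acetyl]-EM[Oxidation]K
```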
