Pyteomics documentation v4.7.1

mass - molecular masses and isotope distributions

Summary

This module defines general functions for mass and isotope abundance calculations. For most functions, the substance can be specified in several formats, all of which are reduced to a Composition object describing its chemical composition.

Classes

Composition - a class storing chemical composition of a substance.

Unimod - a class representing a Python interface to the Unimod database (see pyteomics.mass.unimod for a much more powerful alternative).

Mass calculations

calculate_mass() - a general routine for mass and m/z calculation. It can calculate the mass of a polypeptide sequence, chemical formula or elemental composition. Given an ion type and charge, the function calculates m/z.

fast_mass() - a less powerful but much faster function for polypeptide mass calculation.

fast_mass2() - a version of fast_mass that supports modX notation.

Isotopic abundances

isotopic_composition_abundance() - calculate the relative abundance of a given isotopic composition.

most_probable_isotopic_composition() - finds the most abundant isotopic composition for a molecule defined by a polypeptide sequence, chemical formula or elemental composition.

isotopologues() - iterate over the possible isotopic compositions of a molecule, optionally filtered by abundance.

Data

nist_mass - a dict with exact masses of the most abundant isotopes.

std_aa_comp - a dict with the elemental compositions of the standard twenty amino acid residues, selenocysteine and pyrrolysine.

std_ion_comp - a dict with the relative elemental compositions of the standard peptide fragment ions.

std_aa_mass - a dict with the monoisotopic masses of the standard twenty amino acid residues, selenocysteine and pyrrolysine.


Composition.__init__(*args, **kwargs)[source]

A Composition object stores a chemical composition of a substance. Basically it is a dict object, in which keys are the names of chemical elements and values contain integer numbers of corresponding atoms in a substance.

The main improvement over dict is that Composition objects allow addition and subtraction.
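The arithmetic can be pictured with a minimal sketch. The class below, SimpleComposition, is a hypothetical stand-in written for illustration and is not the Pyteomics implementation; it only shows what element-wise addition and subtraction of compositions mean.

```python
class SimpleComposition(dict):
    """Hypothetical stand-in: maps element symbols to atom counts."""

    def _combine(self, other, sign):
        result = dict(self)
        for element, count in other.items():
            result[element] = result.get(element, 0) + sign * count
        # Drop elements whose counts cancel out to zero.
        return SimpleComposition({e: n for e, n in result.items() if n != 0})

    def __add__(self, other):
        return self._combine(other, +1)

    def __sub__(self, other):
        return self._combine(other, -1)


glycine = SimpleComposition({'C': 2, 'H': 5, 'N': 1, 'O': 2})
water = SimpleComposition({'H': 2, 'O': 1})

# Removing water from the free amino acid gives the residue composition.
glycine_residue = glycine - water
print(glycine_residue)  # {'C': 2, 'H': 3, 'N': 1, 'O': 1}
```

Adding water back to the residue recovers the composition of the free amino acid.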

A Composition object can be initialized with one of the following arguments: formula, sequence, parsed_sequence or split_sequence.

If none of these are specified, the constructor will look at the first positional argument and try to build the object from it. Without positional arguments, a Composition will be constructed directly from keyword arguments.

If there’s an ambiguity, i.e. the argument is both a valid sequence and a formula (such as ‘HCN’), it will be treated as a sequence. You need to provide the ‘formula’ keyword to override this.

Warning

Be careful when supplying a list with a parsed sequence or a split sequence as a keyword argument. It must be obtained with the show_unmodified_termini option enabled. When supplying it as a positional argument, the option doesn’t matter, because the positional argument is always converted to a sequence prior to any processing.

Parameters:
  • formula (str, optional) – A string with a chemical formula. All elements must be present in mass_data.

  • sequence (str, optional) – A polypeptide sequence string in modX notation.

  • parsed_sequence (list of str, optional) – A polypeptide sequence parsed into a list of amino acids.

  • split_sequence (list of tuples of str, optional) – A polypeptide sequence parsed into a list of tuples (as returned by pyteomics.parser.parse() with split=True).

  • aa_comp (dict, optional) – A dict with the elemental composition of the amino acids (the default value is std_aa_comp).

  • ion_comp (dict, optional) – A dict with the relative elemental compositions of peptide ion fragments (default is std_ion_comp).

  • ion_type (str, optional) – If specified, then the polypeptide is considered to be in the form of the corresponding ion.

Composition.mass(**kwargs)[source]

Calculate the mass or m/z of a Composition.

Parameters:
  • average (bool, optional) – If True then the average mass is calculated. Note that mass is not averaged for elements with specified isotopes. Default is False.

  • charge (int, optional) – If not 0 then m/z is calculated. See also: charge_carrier.

  • charge_carrier (str or dict, optional) –

    Chemical group carrying the charge. Defaults to a proton, “H+”. If string, must be a chemical formula, as supported by the Composition formula argument, except it must end with a charge formatted as “[+-][N]”. If N is omitted, single charge is assumed. Examples of charge_carrier: “H+”, “NH3+” (here, 3 is part of the composition, and + is a single charge), “Fe+2” (“Fe” is the formula and “+2” is the charge).

    Note

    charge must be a multiple of charge_carrier charge.

    If dict, it is the atomic composition of the group. In this case, the charge can be passed separately as carrier_charge or it will be deduced from the number of protons in charge_carrier.

  • carrier_charge (int, optional) –

    Charge of the charge carrier group (if charge_carrier is specified as a composition dict).

    Note

    charge must be a multiple of carrier_charge.

  • mass_data (dict, optional) – A dict with the masses of the chemical elements (the default value is nist_mass).

  • ion_comp (dict, optional) – A dict with the relative elemental compositions of peptide ion fragments (default is std_ion_comp).

  • ion_type (str, optional) – If specified, then the polypeptide is considered to be in the form of the corresponding ion. Do not forget to specify the charge state!

  • absolute (bool, optional) –

    If True (default), the m/z value returned will always be positive, even for negatively charged ions.

    Note

    absolute only applies when charge is negative. The mass can still be negative for negative compositions.

Returns:

mass

Return type:

float
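The interplay of charge, the charge carrier and absolute can be sketched as plain arithmetic. This is an illustration under stated assumptions, not the library's code: mz_sketch and the rounded PROTON_MASS constant are hypothetical names, and the carrier is represented only by its mass and charge.

```python
PROTON_MASS = 1.007276466  # Da, approximate mass of H+ (assumed constant)


def mz_sketch(neutral_mass, charge, carrier_mass=PROTON_MASS,
              carrier_charge=1, absolute=True):
    """Illustrative m/z arithmetic for a neutral mass and a charge state."""
    if charge == 0:
        return neutral_mass  # no charge: the plain mass is returned
    if charge % carrier_charge != 0:
        raise ValueError('charge must be a multiple of carrier_charge')
    # Number of carrier groups added (negative means carriers are removed).
    n_carriers = charge // carrier_charge
    value = (neutral_mass + n_carriers * carrier_mass) / charge
    return abs(value) if absolute else value


print(mz_sketch(1000.0, 2))                   # doubly protonated
print(mz_sketch(1000.0, -1))                  # deprotonated, reported as positive
print(mz_sketch(1000.0, -1, absolute=False))  # same ion, sign kept
```

With a negative charge the raw quotient is negative, which is why the absolute flag matters only in that case.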

class pyteomics.mass.mass.Unimod(source='http://www.unimod.org/xml/unimod.xml')[source]

Bases: object

A class for the Unimod database of modifications. The list of all modifications can be retrieved via the mods attribute. The by_title and by_name methods provide convenient searching. For more elaborate filtering, iterate manually over the list.

Note

See pyteomics.mass.unimod for a new alternative class with more features.

__init__(source='http://www.unimod.org/xml/unimod.xml')[source]

Create a database and fill it from the XML file retrieved from source.

Parameters:

source (str or file, optional) – A file-like object or a URL to read from. Don’t forget the 'file://' prefix when pointing to local files.
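One way to build such a URL from a local path is the standard library's pathlib. The file name unimod.xml below is a hypothetical local copy; the sketch only constructs the string and does not open the file or create a Unimod object.

```python
from pathlib import Path

# 'unimod.xml' is a hypothetical local copy of the Unimod XML file.
local_copy = Path('unimod.xml').resolve()

# as_uri() requires an absolute path and adds the 'file://' scheme.
source_url = local_copy.as_uri()
print(source_url)
```

The resulting string is the kind of value the source parameter describes for local files.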

by_id(i)[source]

Search modifications by record ID. If a modification is found, it is returned. Otherwise, KeyError is raised.

Parameters:

i (int or str) – The Unimod record ID.

Returns:

out – A single modification dict.

Return type:

dict

by_name(name, strict=True)[source]

Search modifications by name. If a single modification is found, it is returned. Otherwise, a list will be returned.

Parameters:
  • name (str) – The full name of the modification(s).

  • strict (bool, optional) – If False, the search will return all modifications whose full name contains name, otherwise equality is required. True by default.

Returns:

out – A single modification or a list of modifications.

Return type:

dict or list

by_title(title, strict=True)[source]

Search modifications by title. If a single modification is found, it is returned. Otherwise, a list will be returned.

Parameters:
  • title (str) – The modification title.

  • strict (bool, optional) – If False, the search will return all modifications whose title contains title, otherwise equality is required. True by default.

Returns:

out – A single modification or a list of modifications.

Return type:

dict or list
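The strict and non-strict search semantics, and the "single item or list" return convention, can be sketched on toy data. The records and the search_by_title helper below are hypothetical and only mimic the behaviour described above; real modification dicts carry many more fields.

```python
import operator

# Toy records for illustration only.
mods = [
    {'title': 'Acetyl'},
    {'title': 'Acetyl:2H(3)'},
    {'title': 'Phospho'},
]


def search_by_title(records, title, strict=True):
    """Return one record if exactly one matches, else a list of matches."""
    # strict: equality; non-strict: substring containment.
    matches = operator.eq if strict else operator.contains
    found = [m for m in records if matches(m['title'], title)]
    return found[0] if len(found) == 1 else found


print(search_by_title(mods, 'Acetyl'))                # a single dict
print(search_by_title(mods, 'Acetyl', strict=False))  # a list of two dicts
print(search_by_title(mods, 'Oxidation'))             # an empty list
```

Because the return type depends on the number of matches, calling code may want to check whether it received a dict or a list before using the result.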

property mass_data

Get element mass data extracted from the database

property mods

Get the list of Unimod modifications

pyteomics.mass.mass.calculate_mass(*args, **kwargs)[source]

Calculates the monoisotopic mass of a polypeptide defined by a sequence string, parsed sequence, chemical formula or Composition object.

At most one of the following keyword arguments should be given: formula, sequence, parsed_sequence, split_sequence or composition. All arguments given are used to create a Composition object, unless an existing one is passed as a keyword argument.

Note that if a sequence string is supplied and terminal groups are not explicitly shown, then the mass is calculated for a polypeptide with standard terminal groups (NH2- and -OH).

Warning

Be careful when supplying a list with a parsed sequence. It must be obtained with the show_unmodified_termini option enabled.

Parameters:
  • formula (str, optional) – A string with a chemical formula.

  • sequence (str, optional) – A polypeptide sequence string in modX notation.

  • proforma (str, optional) – A polypeptide sequence string in ProForma notation, or a pyteomics.proforma.ProForma object.

  • parsed_sequence (list of str, optional) – A polypeptide sequence parsed into a list of amino acids.

  • composition (Composition, optional) – A Composition object with the elemental composition of a substance.

  • aa_comp (dict, optional) – A dict with the elemental composition of the amino acids (the default value is std_aa_comp).

  • average (bool, optional) – If True then the average mass is calculated. Note that mass is not averaged for elements with specified isotopes. Default is False.

  • charge (int, optional) – If not 0 then m/z is calculated: the mass is increased by the corresponding number of proton masses and divided by charge.

  • charge_carrier (str or dict, optional) –

    Chemical group carrying the charge. Defaults to a proton, “H+”. If string, must be a chemical formula, as supported by the Composition formula argument, except it must end with a charge formatted as “[+-][N]”. If N is omitted, single charge is assumed. Examples of charge_carrier: “H+”, “NH3+” (here, 3 is part of the composition, and + is a single charge), “Fe+2” (“Fe” is the formula and “+2” is the charge).

    Note

    charge must be a multiple of charge_carrier charge.

    If dict, it is the atomic composition of the group. In this case, the charge can be passed separately as carrier_charge or it will be deduced from the number of protons in charge_carrier.

  • carrier_charge (int, optional) –

    Charge of the charge carrier group (if charge_carrier is specified as a composition dict).

    Note

    charge must be a multiple of carrier_charge.

  • mass_data (dict, optional) – A dict with the masses of the chemical elements (the default value is nist_mass).

  • ion_comp (dict, optional) – A dict with the relative elemental compositions of peptide ion fragments (default is std_ion_comp).

  • ion_type (str, optional) – If specified, then the polypeptide is considered to be in the form of the corresponding ion. Do not forget to specify the charge state!

  • absolute (bool, optional) –

    If True (default), the m/z value returned will always be positive, even for negatively charged ions.

    Note

    absolute only applies when charge is negative. The mass can still be negative for negative compositions.

Returns:

mass

Return type:

float
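The charge_carrier string format described above (a formula followed by “[+-][N]”) can be illustrated with a small parser. split_charge_carrier is a hypothetical helper written for this sketch; it is not part of Pyteomics and does not validate the formula itself.

```python
import re


def split_charge_carrier(carrier):
    """Split a string such as 'Fe+2' into its formula and signed charge."""
    match = re.fullmatch(r'(.+?)([+-])(\d*)', carrier)
    if match is None:
        raise ValueError('not a valid charge carrier string: %r' % carrier)
    formula, sign, digits = match.groups()
    magnitude = int(digits) if digits else 1  # omitted N means a single charge
    charge = magnitude if sign == '+' else -magnitude
    return formula, charge


for example in ('H+', 'NH3+', 'Fe+2'):
    print(example, split_charge_carrier(example))
```

Note how the 3 in “NH3+” stays with the formula, while the 2 in “Fe+2” follows the sign and is read as the charge.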

pyteomics.mass.mass.fast_mass(sequence, ion_type=None, charge=None, **kwargs)[source]

Calculate monoisotopic mass of an ion using the fast algorithm. May be used only if amino acid residues are presented in one-letter code.

Parameters:
  • sequence (str) – A polypeptide sequence string.

  • ion_type (str, optional) – If specified, then the polypeptide is considered to be in the form of the corresponding ion. Do not forget to specify the charge state!

  • charge (int, optional) – If not 0 then m/z is calculated: the mass is increased by the corresponding number of proton masses and divided by z.

  • mass_data (dict, optional) – A dict with the masses of chemical elements (the default value is nist_mass).

  • aa_mass (dict, optional) – A dict with the monoisotopic mass of amino acid residues (default is std_aa_mass).

  • ion_comp (dict, optional) – A dict with the relative elemental compositions of peptide ion fragments (default is std_ion_comp).

Returns:

mass – Monoisotopic mass or m/z of a peptide molecule/ion.

Return type:

float
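The idea behind a fast, lookup-based mass calculation can be sketched as a sum of per-residue masses plus the terminal groups. This is a simplified illustration, not the library's algorithm: fast_mass_sketch is a hypothetical name, the residue table is a rounded subset, and ion types are not handled.

```python
# Approximate monoisotopic residue masses in Da (illustrative subset,
# rounded to five decimal places).
RESIDUE_MASS = {
    'G': 57.02146, 'A': 71.03711, 'S': 87.03203, 'P': 97.05276,
    'V': 99.06841, 'T': 101.04768, 'D': 115.02694, 'E': 129.04259,
    'I': 113.08406, 'L': 113.08406,
}
WATER_MASS = 18.01056   # H- and -OH terminal groups together (approximate)
PROTON_MASS = 1.00728   # approximate


def fast_mass_sketch(sequence, charge=0):
    """Sum residue masses for a one-letter sequence; optionally return m/z."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS
    if charge:
        mass = (mass + charge * PROTON_MASS) / charge
    return mass


print(round(fast_mass_sketch('PEPTIDE'), 2))  # 799.36
```

A plain dictionary lookup per character is what makes this kind of approach fast, and also why it works only when every residue is a single letter found in the table.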

pyteomics.mass.mass.fast_mass2(sequence, ion_type=None, charge=None, **kwargs)[source]

Calculate monoisotopic mass of an ion using the fast algorithm. modX notation is fully supported.

Parameters:
  • sequence (str) – A polypeptide sequence string.

  • ion_type (str, optional) – If specified, then the polypeptide is considered to be in the form of the corresponding ion. Do not forget to specify the charge state!

  • charge (int, optional) – If not 0 then m/z is calculated: the mass is increased by the corresponding number of proton masses and divided by z.

  • mass_data (dict, optional) – A dict with the masses of chemical elements (the default value is nist_mass).

  • aa_mass (dict, optional) – A dict with the monoisotopic mass of amino acid residues (default is std_aa_mass).

  • ion_comp (dict, optional) – A dict with the relative elemental compositions of peptide ion fragments (default is std_ion_comp).

Returns:

mass – Monoisotopic mass or m/z of a peptide molecule/ion.

Return type:

float

pyteomics.mass.mass.isotopic_composition_abundance(*args, **kwargs)[source]

Calculate the relative abundance of a given isotopic composition of a molecule.

Parameters:
  • formula (str, optional) – A string with a chemical formula.

  • composition (Composition, optional) – A Composition object with the isotopic composition of a substance.

  • mass_data (dict, optional) – A dict with the masses of chemical elements (the default value is nist_mass).

Returns:

relative_abundance – The relative abundance of a given isotopic composition.

Return type:

float
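The relative abundance of one isotopic composition follows from a multinomial probability per element. The sketch below shows that calculation under stated assumptions: abundance_sketch, the nested-dict input format and the rounded abundance table are all hypothetical and chosen for illustration.

```python
from math import factorial

# Approximate natural isotope abundances (illustrative subset).
ABUNDANCE = {
    'H': {1: 0.999885, 2: 0.000115},
    'O': {16: 0.99757, 17: 0.00038, 18: 0.00205},
}


def abundance_sketch(isotopic_composition):
    """isotopic_composition maps element -> {mass number: atom count}."""
    total = 1.0
    for element, counts in isotopic_composition.items():
        # Multinomial coefficient: ways to place the isotopes among the atoms.
        ways = factorial(sum(counts.values()))
        for mass_number, n in counts.items():
            ways //= factorial(n)
            total *= ABUNDANCE[element][mass_number] ** n
        total *= ways
    return total


light_water = {'H': {1: 2}, 'O': {16: 1}}
semi_heavy_water = {'H': {1: 1, 2: 1}, 'O': {16: 1}}
print(abundance_sketch(light_water))
print(abundance_sketch(semi_heavy_water))
```

The factor of two for the semi-heavy molecule comes from the multinomial coefficient: either of the two hydrogen positions can hold the deuterium.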

pyteomics.mass.mass.isotopologues(*args, **kwargs)[source]

Iterate over possible isotopic states of a molecule. The molecule can be defined by formula, sequence, parsed sequence, or composition. The space of possible isotopic compositions is restrained by parameters elements_with_isotopes, isotope_threshold, overall_threshold.

Parameters:
  • formula (str, optional) – A string with a chemical formula.

  • sequence (str, optional) – A polypeptide sequence string in modX notation.

  • parsed_sequence (list of str, optional) – A polypeptide sequence parsed into a list of amino acids.

  • composition (Composition, optional) – A Composition object with the elemental composition of a substance.

  • report_abundance (bool, optional) – If True, the output will contain 2-tuples: (composition, abundance). Otherwise, only compositions are yielded. Default is False.

  • elements_with_isotopes (container of str, optional) – A set of elements to be considered in isotopic distribution (by default, every element has an isotopic distribution).

  • isotope_threshold (float, optional) – The threshold abundance of a specific isotope to be considered. Default is 5e-4.

  • overall_threshold (float, optional) – The threshold abundance of the calculated isotopic composition. Default is 0.

  • aa_comp (dict, optional) – A dict with the elemental composition of the amino acids (the default value is std_aa_comp).

  • mass_data (dict, optional) – A dict with the masses of chemical elements (the default value is nist_mass).

Returns:

out – Iterator over possible isotopic compositions.

Return type:

iterator
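How the two thresholds narrow the space of isotopic states can be sketched with a brute-force enumeration. This is an illustration only: isotopologues_sketch, its output format and the rounded abundance table are hypothetical, and treating an isotope as kept when its abundance is at or above isotope_threshold is an assumption of the sketch.

```python
from collections import Counter
from itertools import combinations_with_replacement, product
from math import factorial

# Approximate natural isotope abundances (illustrative subset).
ABUNDANCE = {
    'H': {1: 0.999885, 2: 0.000115},
    'O': {16: 0.99757, 17: 0.00038, 18: 0.00205},
}


def element_states(element, n, isotope_threshold):
    """Yield every way to assign n atoms to the isotopes that pass the filter."""
    kept = [iso for iso, p in ABUNDANCE[element].items()
            if p >= isotope_threshold]
    for combo in combinations_with_replacement(kept, n):
        yield element, Counter(combo)


def state_abundance(element, counts):
    """Multinomial probability of one isotope assignment for one element."""
    value = float(factorial(sum(counts.values())))
    for iso, k in counts.items():
        value = value / factorial(k) * ABUNDANCE[element][iso] ** k
    return value


def isotopologues_sketch(composition, isotope_threshold=5e-4,
                         overall_threshold=0.0):
    """Yield (state, abundance) pairs for an element -> count mapping."""
    per_element = [list(element_states(el, n, isotope_threshold))
                   for el, n in composition.items()]
    for state in product(*per_element):
        p = 1.0
        for el, counts in state:
            p *= state_abundance(el, counts)
        if p >= overall_threshold:
            yield {el: dict(counts) for el, counts in state}, p


for state, p in isotopologues_sketch({'H': 2, 'O': 1}):
    print(state, p)
```

With the default isotope_threshold of 5e-4 and these abundances, deuterium and oxygen-17 are filtered out, so water has only two states; setting the threshold to 0 enumerates all nine.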

pyteomics.mass.mass.most_probable_isotopic_composition(*args, **kwargs)[source]

Calculate the most probable isotopic composition of a peptide molecule/ion defined by a sequence string, parsed sequence, chemical formula or Composition object.

Note that if a sequence string without terminal groups is supplied then the isotopic composition is calculated for a polypeptide with standard terminal groups (H- and -OH).

For each element, only the two most abundant isotopes are considered.

Parameters:
  • formula (str, optional) – A string with a chemical formula.

  • sequence (str, optional) – A polypeptide sequence string in modX notation.

  • parsed_sequence (list of str, optional) – A polypeptide sequence parsed into a list of amino acids.

  • composition (Composition, optional) – A Composition object with the elemental composition of a substance.

  • elements_with_isotopes (list of str) – A list of elements to be considered in isotopic distribution (by default, every element has an isotopic distribution).

  • aa_comp (dict, optional) – A dict with the elemental composition of the amino acids (the default value is std_aa_comp).

  • mass_data (dict, optional) – A dict with the masses of chemical elements (the default value is nist_mass).

  • ion_comp (dict, optional) – A dict with the relative elemental compositions of peptide ion fragments (default is std_ion_comp).

Returns:

out – A tuple with the most probable isotopic composition and its relative abundance.

Return type:

tuple (Composition, float)
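With only two isotopes per element, the number of atoms carrying the second isotope follows a binomial distribution, so the most probable count can be estimated from the binomial mode. The sketch below uses that rule as an assumption; the library may choose the count differently, and most_probable_sketch and the rounded abundance table are hypothetical.

```python
from math import floor

# Approximate abundances of the two most abundant isotopes (illustrative subset).
TWO_ISOTOPES = {
    'C': ((12, 0.9893), (13, 0.0107)),
    'H': ((1, 0.999885), (2, 0.000115)),
}


def most_probable_sketch(composition):
    """Map element -> {mass number: count} for the likeliest isotopic state."""
    result = {}
    for element, n in composition.items():
        (first, _), (second, p_second) = TWO_ISOTOPES[element]
        # Mode of a binomial distribution with n trials and probability p_second.
        n_second = floor((n + 1) * p_second)
        result[element] = {first: n - n_second, second: n_second}
    return result


print(most_probable_sketch({'C': 100, 'H': 100}))
```

For 100 carbon atoms the likeliest state already contains one carbon-13, which is why the monoisotopic peak is not always the most intense one for larger molecules.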

pyteomics.mass.mass.nist_mass

A dict with the exact element masses downloaded from the NIST website: http://www.nist.gov/pml/data/comp.cfm . Each element has entries with the masses and relative abundances of several abundant isotopes, plus a separate entry under key 0 for an unspecified isotope, which holds the mass of the most abundant isotope and an abundance of 1.0.
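The layout can be pictured with a small hand-written excerpt. The table below is hypothetical and uses rounded values; storing each isotope as a (mass, abundance) tuple is an assumption of this sketch, as are the two helper functions.

```python
# Hypothetical excerpt mirroring the layout described above:
# element -> {mass number: (mass in Da, relative abundance)}.
mass_table = {
    'H': {
        0: (1.00782503, 1.0),        # unspecified isotope: most abundant mass
        1: (1.00782503, 0.999885),
        2: (2.01410178, 0.000115),
    },
}


def monoisotopic_mass(element):
    """Mass stored under the zero key."""
    return mass_table[element][0][0]


def average_mass(element):
    """Abundance-weighted mean over the explicitly listed isotopes."""
    return sum(mass * abundance
               for number, (mass, abundance) in mass_table[element].items()
               if number != 0)


print(monoisotopic_mass('H'))
print(round(average_mass('H'), 5))  # 1.00794
```

The zero-key entry lets a lookup for an element without a specified isotope return the monoisotopic mass directly, while the numbered entries supply what an average-mass calculation needs.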

pyteomics.mass.mass.std_aa_comp

A dictionary with elemental compositions of the twenty standard amino acid residues, selenocysteine, pyrrolysine, and standard H- and -OH terminal groups.

pyteomics.mass.mass.std_aa_mass

A dictionary with monoisotopic masses of the twenty standard amino acid residues, selenocysteine and pyrrolysine.

pyteomics.mass.mass.std_ion_comp

A dict with relative elemental compositions of the standard peptide fragment ions. An elemental composition of a fragment ion is calculated as a difference between the total elemental composition of an ion and the sum of elemental compositions of its constituting amino acid residues.
