mass - molecular masses and isotope distributions
Summary
This module defines general functions for mass and isotope abundance calculations. For most of the functions, the user can define a given substance in various formats, but all of them are reduced to the Composition object describing its chemical composition.
Classes
Composition
- a class storing chemical composition of a substance.
Unimod
- a class representing a Python interface to the Unimod database (see pyteomics.mass.unimod for a much more powerful alternative).
Mass calculations
calculate_mass()
- a general routine for mass / m/z calculation. Can calculate mass for a polypeptide sequence, chemical formula or elemental composition. Given an ion type and charge, the function calculates m/z.
fast_mass()
- a less powerful but much faster function for polypeptide mass calculation.
fast_mass2()
- a version of fast_mass that supports modX notation.
Isotopic abundances
isotopic_composition_abundance()
- calculate the relative abundance of a given isotopic composition.
most_probable_isotopic_composition()
- finds the most abundant isotopic composition for a molecule defined by a polypeptide sequence, chemical formula or elemental composition.
isotopologues()
- iterate over possible isotopic compositions of a molecule, possibly filtered by abundance.
Data
nist_mass
- a dict with exact masses of the most abundant isotopes.
std_aa_comp
- a dict with the elemental compositions of the standard twenty amino acid residues, selenocysteine and pyrrolysine.
std_ion_comp
- a dict with the relative elemental compositions of the standard peptide fragment ions.
std_aa_mass
- a dict with the monoisotopic masses of the standard twenty amino acid residues, selenocysteine and pyrrolysine.
- Composition.__init__(*args, **kwargs)
A Composition object stores the chemical composition of a substance. Essentially, it is a dict whose keys are the names of chemical elements and whose values are the integer numbers of the corresponding atoms in the substance.
The main improvement over dict is that Composition objects allow addition and subtraction.
A Composition object can be initialized with one of the following arguments: formula, sequence, parsed_sequence or split_sequence.
If none of these are specified, the constructor will look at the first positional argument and try to build the object from it. Without positional arguments, a Composition will be constructed directly from keyword arguments.
If there’s an ambiguity, i.e. the argument is both a valid sequence and a formula (such as ‘HCN’), it will be treated as a sequence. You need to provide the ‘formula’ keyword to override this.
Warning
Be careful when supplying a list with a parsed sequence or a split sequence as a keyword argument. It must be obtained with the show_unmodified_termini option enabled. When supplying it as a positional argument, the option doesn’t matter, because the positional argument is always converted to a sequence prior to any processing.
- Parameters:
formula (str, optional) – A string with a chemical formula.
sequence (str, optional) – A polypeptide sequence string in modX notation.
parsed_sequence (list of str, optional) – A polypeptide sequence parsed into a list of amino acids.
split_sequence (list of tuples of str, optional) – A polypeptide sequence parsed into a list of tuples (as returned by pyteomics.parser.parse() with split=True).
aa_comp (dict, optional) – A dict with the elemental composition of the amino acids (the default value is std_aa_comp).
ion_comp (dict, optional) – A dict with the relative elemental compositions of peptide ion fragments (default is std_ion_comp).
ion_type (str, optional) – If specified, then the polypeptide is considered to be in the form of the corresponding ion.
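The dict-like behaviour described above (element counts that can be added and subtracted, built from a formula string) can be illustrated with a minimal stand-in. SimpleComposition and parse_formula are hypothetical names for this sketch, not pyteomics API. The real Composition also accepts sequences, isotope labels and other inputs that this parser deliberately ignores.

```python
import re

# One element symbol (capital letter plus optional lowercase letter) followed
# by an optional signed count. Isotope labels such as "C[13]" are NOT handled.
_TOKEN = re.compile(r"([A-Z][a-z]?)(-?\d+)?")


class SimpleComposition(dict):
    """Hypothetical dict of element -> atom count supporting + and -."""

    def __add__(self, other):
        result = dict(self)
        for element, count in other.items():
            result[element] = result.get(element, 0) + count
        # Drop elements whose counts cancel out to zero.
        return SimpleComposition({e: n for e, n in result.items() if n != 0})

    def __sub__(self, other):
        return self + {e: -n for e, n in other.items()}


def parse_formula(formula):
    """Parse a plain formula such as "C2H5OH" or "H-2O-1"."""
    comp = SimpleComposition()
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:
            break  # an unparsable character precedes this token
        element, count = match.groups()
        comp[element] = comp.get(element, 0) + (int(count) if count else 1)
        pos = match.end()
    if pos != len(formula):
        raise ValueError(f"cannot parse {formula!r} at position {pos}")
    return comp


ethanol = parse_formula("C2H5OH")
water = parse_formula("H2O")
print(ethanol - water)  # {'C': 2, 'H': 4}
```

Subtraction is implemented as addition of negated counts. Unlike collections.Counter arithmetic, this keeps negative counts, which formulas like "H-2O-1" need.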
- Composition.mass(**kwargs)
Calculate the mass or m/z of a Composition.
- Parameters:
average (bool, optional) – If True then the average mass is calculated. Note that mass is not averaged for elements with specified isotopes. Default is False.
charge (int, optional) – If not 0 then m/z is calculated. See also: charge_carrier.
charge_carrier (str or dict, optional) –
Chemical group carrying the charge. Defaults to a proton, “H+”. If string, must be a chemical formula, as supported by the Composition formula argument, except it must end with a charge formatted as “[+-][N]”. If N is omitted, single charge is assumed. Examples of charge_carrier: “H+”, “NH3+” (here, 3 is part of the composition, and + is a single charge), “Fe+2” (“Fe” is the formula and “+2” is the charge).
Note
charge must be a multiple of the charge_carrier charge.
If dict, it is the atomic composition of the group. In this case, the charge can be passed separately as carrier_charge or it will be deduced from the number of protons in charge_carrier.
carrier_charge (int, optional) –
Charge of the charge carrier group (if charge_carrier is specified as a composition dict).
Note
charge must be a multiple of carrier_charge.
mass_data (dict, optional) – A dict with the masses of the chemical elements (the default value is nist_mass).
ion_comp (dict, optional) – A dict with the relative elemental compositions of peptide ion fragments (default is std_ion_comp).
ion_type (str, optional) – If specified, then the polypeptide is considered to be in the form of the corresponding ion. Do not forget to specify the charge state!
absolute (bool, optional) –
If True (default), the m/z value returned will always be positive, even for negatively charged ions.
Note
absolute only applies when charge is negative. The mass can still be negative for negative compositions.
- Returns:
mass
- Return type:
float
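The m/z arithmetic implied by the charge and absolute parameters can be sketched as follows, assuming the default proton charge carrier. composition_mass, MONO_MASS and PROTON are hypothetical names, and the masses are rounded NIST-style constants, not the contents of nist_mass. One assumption here is that absolute=True divides by the magnitude of the charge. That reading matches the note that a negative composition can still give a negative value.

```python
# Approximate monoisotopic masses (Da) of the most abundant isotopes.
MONO_MASS = {"H": 1.00782503207, "C": 12.0, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.00727646677  # approximate mass of H+, the default charge carrier


def composition_mass(comp, charge=0, absolute=True):
    """Hypothetical helper: monoisotopic mass, or m/z if charge != 0."""
    mass = sum(MONO_MASS[element] * count for element, count in comp.items())
    if charge == 0:
        return mass
    # Add one proton per positive charge (remove one per negative charge).
    charged = mass + charge * PROTON
    # Dividing by |charge| keeps m/z positive for negative ions, while a
    # negative composition mass (e.g. a neutral loss) stays negative.
    return charged / (abs(charge) if absolute else charge)


water = {"H": 2, "O": 1}
print(composition_mass(water))                             # about 18.010565
print(composition_mass(water, charge=1))                   # about 19.017841
print(composition_mass(water, charge=-1))                  # about 17.003288
print(composition_mass(water, charge=-1, absolute=False))  # about -17.003288
```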
- class pyteomics.mass.mass.Unimod(source='http://www.unimod.org/xml/unimod.xml')
Bases: object
A class for the Unimod database of modifications. The list of all modifications can be retrieved via the mods attribute. The by_title and by_name methods provide convenient searching. For more elaborate filtering, iterate over the list manually.
Note
See pyteomics.mass.unimod for a new alternative class with more features.
- __init__(source='http://www.unimod.org/xml/unimod.xml')
Create a database and fill it from the XML file retrieved from source.
- Parameters:
source (str or file, optional) – A file-like object or a URL to read from. Don’t forget the 'file://' prefix when pointing to local files.
- by_id(i)
Search modifications by record ID. If a modification is found, it is returned. Otherwise, KeyError is raised.
- by_name(name, strict=True)
Search modifications by name. If a single modification is found, it is returned. Otherwise, a list will be returned.
- Parameters:
- Returns:
out – A single modification or a list of modifications.
- Return type:
dict or list
- by_title(title, strict=True)
Search modifications by title. If a single modification is found, it is returned. Otherwise, a list will be returned.
- property mass_data
Get element mass data extracted from the database
- property mods
Get the list of Unimod modifications
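Manual filtering over the mods list, combined with the "single result or list" convention used by by_name and by_title, might look like the sketch below. The list is a hypothetical stand-in for Unimod.mods. The dict keys "title" and "mono_mass" are assumptions for illustration, and search is not a pyteomics function.

```python
# Hypothetical stand-in for Unimod.mods; the keys "title" and "mono_mass"
# are assumptions made for this sketch.
mods = [
    {"title": "Oxidation", "mono_mass": 15.994915},
    {"title": "Phospho", "mono_mass": 79.966331},
    {"title": "Carbamidomethyl", "mono_mass": 57.021464},
]


def search(mods, predicate):
    """Return the single match itself, otherwise a list of matches (possibly empty)."""
    found = [mod for mod in mods if predicate(mod)]
    return found[0] if len(found) == 1 else found


# Example: look for a modification within 0.05 Da of an 80 Da mass shift.
hit = search(mods, lambda mod: abs(mod["mono_mass"] - 80.0) < 0.05)
print(hit["title"])  # Phospho
```

Callers of such a search must check whether they received one record or a list. Always returning a list would be simpler to consume, but the sketch follows the convention documented for by_name and by_title.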
- pyteomics.mass.mass.calculate_mass(*args, **kwargs)
Calculates the monoisotopic mass of a polypeptide defined by a sequence string, parsed sequence, chemical formula or Composition object.
At most one of the following keyword arguments should be supplied: formula, sequence, parsed_sequence, split_sequence or composition. All arguments given are used to create a Composition object, unless an existing one is passed as a keyword argument.
Note that if a sequence string is supplied and terminal groups are not explicitly shown, then the mass is calculated for a polypeptide with standard terminal groups (NH2- and -OH).
Warning
Be careful when supplying a list with a parsed sequence. It must be obtained with the show_unmodified_termini option enabled.
- Parameters:
formula (str, optional) – A string with a chemical formula.
sequence (str, optional) – A polypeptide sequence string in modX notation.
proforma (str, optional) – A polypeptide sequence string in ProForma notation, or a pyteomics.proforma.ProForma object.
parsed_sequence (list of str, optional) – A polypeptide sequence parsed into a list of amino acids.
composition (Composition, optional) – A Composition object with the elemental composition of a substance.
aa_comp (dict, optional) – A dict with the elemental composition of the amino acids (the default value is std_aa_comp).
average (bool, optional) – If True then the average mass is calculated. Note that mass is not averaged for elements with specified isotopes. Default is False.
charge (int, optional) – If not 0 then m/z is calculated: the mass is increased by the corresponding number of charge carrier masses (protons by default) and divided by charge.
charge_carrier (str or dict, optional) –
Chemical group carrying the charge. Defaults to a proton, “H+”. If string, must be a chemical formula, as supported by the Composition formula argument, except it must end with a charge formatted as “[+-][N]”. If N is omitted, single charge is assumed. Examples of charge_carrier: “H+”, “NH3+” (here, 3 is part of the composition, and + is a single charge), “Fe+2” (“Fe” is the formula and “+2” is the charge).
Note
charge must be a multiple of the charge_carrier charge.
If dict, it is the atomic composition of the group. In this case, the charge can be passed separately as carrier_charge or it will be deduced from the number of protons in charge_carrier.
carrier_charge (int, optional) –
Charge of the charge carrier group (if charge_carrier is specified as a composition dict).
Note
charge must be a multiple of carrier_charge.
mass_data (dict, optional) – A dict with the masses of the chemical elements (the default value is nist_mass).
ion_comp (dict, optional) – A dict with the relative elemental compositions of peptide ion fragments (default is std_ion_comp).
ion_type (str, optional) – If specified, then the polypeptide is considered to be in the form of the corresponding ion. Do not forget to specify the charge state!
absolute (bool, optional) –
If True (default), the m/z value returned will always be positive, even for negatively charged ions.
Note
absolute only applies when charge is negative. The mass can still be negative for negative compositions.
- Returns:
mass
- Return type:
float
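The charge_carrier string format (“[+-][N]” at the end, N defaulting to 1) and the rule that charge must be a multiple of the carrier charge can be sketched with two hypothetical helpers, parse_carrier and carrier_mz. Neither is part of pyteomics, whose own parser may accept inputs that this regex rejects. The carrier mass must be supplied by the caller, for example from a mass table.

```python
import re

# Formula part, then a sign, then optional digits at the very end ("[+-][N]").
_CARRIER = re.compile(r"^(.+?)([+-])(\d*)$")


def parse_carrier(carrier):
    """Hypothetical helper: split e.g. "Fe+2" into ("Fe", 2)."""
    match = _CARRIER.match(carrier)
    if match is None:
        raise ValueError(f"charge carrier must end with [+-][N]: {carrier!r}")
    formula, sign, digits = match.groups()
    magnitude = int(digits) if digits else 1  # omitted N means a single charge
    return formula, magnitude if sign == "+" else -magnitude


def carrier_mz(neutral_mass, charge, carrier_mass, carrier_charge=1):
    """Hypothetical helper: m/z after attaching charge // carrier_charge carriers."""
    if charge % carrier_charge != 0:
        raise ValueError("charge must be a multiple of the carrier charge")
    n_carriers = charge // carrier_charge  # negative means carriers are removed
    return (neutral_mass + n_carriers * carrier_mass) / abs(charge)


print(parse_carrier("NH3+"))  # ('NH3', 1)
print(parse_carrier("Fe+2"))  # ('Fe', 2)
```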
- pyteomics.mass.mass.fast_mass(sequence, ion_type=None, charge=None, **kwargs)
Calculate the monoisotopic mass of an ion using the fast algorithm. Can be used only if amino acid residues are given in one-letter code.
- Parameters:
sequence (str) – A polypeptide sequence string.
ion_type (str, optional) – If specified, then the polypeptide is considered to be in the form of the corresponding ion. Do not forget to specify the charge state!
charge (int, optional) – If not 0 then m/z is calculated: the mass is increased by the corresponding number of proton masses and divided by charge.
mass_data (dict, optional) – A dict with the masses of chemical elements (the default value is nist_mass).
aa_mass (dict, optional) – A dict with the monoisotopic masses of amino acid residues (default is std_aa_mass).
ion_comp (dict, optional) – A dict with the relative elemental compositions of peptide ion fragments (default is std_ion_comp).
- Returns:
mass – Monoisotopic mass or m/z of a peptide molecule/ion.
- Return type:
float
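The idea behind a fast one-letter-code calculation can be sketched as a table lookup: sum the residue masses, add water for the standard H- and -OH termini, and, for a non-zero charge, add protons and divide by the charge. fast_peptide_mass and the constants are hypothetical. The residue table is a small excerpt of approximate monoisotopic values standing in for std_aa_mass, and ion_type handling is omitted.

```python
# Approximate monoisotopic residue masses (Da); a partial stand-in for std_aa_mass.
AA_MASS = {"G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
           "V": 99.068414, "T": 101.047679, "I": 113.084064, "L": 113.084064,
           "D": 115.026943, "E": 129.042593, "K": 128.094963, "R": 156.101111}
WATER = 18.010565   # standard H- and -OH termini together
PROTON = 1.007276   # approximate H+ mass


def fast_peptide_mass(sequence, charge=0):
    """Hypothetical helper: residue masses plus water; m/z if charge != 0."""
    mass = sum(AA_MASS[aa] for aa in sequence) + WATER
    if charge:
        return (mass + charge * PROTON) / charge
    return mass


print(fast_peptide_mass("PEPTIDE"))            # about 799.359965
print(fast_peptide_mass("PEPTIDE", charge=2))  # about 400.687259
```

Because only floating-point additions over a lookup table are involved, no Composition objects are built. This is why this style of calculation is fast, but it is limited to plain one-letter residues.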
- pyteomics.mass.mass.fast_mass2(sequence, ion_type=None, charge=None, **kwargs)
Calculate the monoisotopic mass of an ion using the fast algorithm. modX notation is fully supported.
- Parameters:
sequence (str) – A polypeptide sequence string.
ion_type (str, optional) – If specified, then the polypeptide is considered to be in the form of the corresponding ion. Do not forget to specify the charge state!
charge (int, optional) – If not 0 then m/z is calculated: the mass is increased by the corresponding number of proton masses and divided by charge.
mass_data (dict, optional) – A dict with the masses of chemical elements (the default value is nist_mass).
aa_mass (dict, optional) – A dict with the monoisotopic masses of amino acid residues (default is std_aa_mass).
ion_comp (dict, optional) – A dict with the relative elemental compositions of peptide ion fragments (default is std_ion_comp).
- Returns:
mass – Monoisotopic mass or m/z of a peptide molecule/ion.
- Return type:
float
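Supporting modX means each residue may carry a lowercase modification label, as in "oxM". A minimal sketch, assuming a simplified token rule (lowercase letters/digits followed by one uppercase residue) and no explicit terminal groups, could tokenize and sum masses as below. modx_mass, the regex and the "ox" entry in MOD_MASS are hypothetical and do not reflect the exact modX grammar used by pyteomics.parser.

```python
import re

# Simplified modX token (an assumption for this sketch): an optional label of
# lowercase letters/digits followed by one uppercase residue letter, e.g. "oxM".
_MODX_TOKEN = re.compile(r"([a-z0-9]*)([A-Z])")

AA_MASS = {"P": 97.052764, "E": 129.042593, "T": 101.047679,
           "I": 113.084064, "D": 115.026943, "M": 131.040485}
MOD_MASS = {"ox": 15.994915}  # hypothetical label -> monoisotopic mass shift
WATER = 18.010565             # default H- and -OH termini


def modx_mass(sequence):
    """Neutral monoisotopic mass of a modX sequence without explicit termini."""
    mass = WATER
    pos = 0
    for match in _MODX_TOKEN.finditer(sequence):
        if match.start() != pos:
            break  # unparsable text before this token
        label, residue = match.groups()
        mass += AA_MASS[residue] + (MOD_MASS[label] if label else 0.0)
        pos = match.end()
    if pos != len(sequence):
        raise ValueError(f"cannot parse {sequence!r} at position {pos}")
    return mass


print(modx_mass("PEPoxMTIDE"))  # about 946.395365
```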
- pyteomics.mass.mass.isotopic_composition_abundance(*args, **kwargs)
Calculate the relative abundance of a given isotopic composition of a molecule.
- Parameters:
- Returns:
relative_abundance – The relative abundance of a given isotopic composition.
- Return type:
float
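Under the usual assumption that isotopes are distributed independently among atoms, the relative abundance of an isotopic composition is a product of multinomial probabilities. For each element with n atoms split into isotope counts k_i with natural abundances p_i, the factor is n! / (k_1! k_2! ...) times the product of p_i^k_i. The sketch below implements that model with a hypothetical isotopic_abundance function and approximate abundances. It is an illustration of the model, not the pyteomics implementation.

```python
from math import factorial, prod

# Approximate natural isotope abundances, keyed by mass number.
ABUNDANCE = {
    "H": {1: 0.999885, 2: 0.000115},
    "C": {12: 0.9893, 13: 0.0107},
    "O": {16: 0.99757, 17: 0.00038, 18: 0.00205},
}


def isotopic_abundance(iso_comp):
    """iso_comp maps element -> {mass number: atom count}, e.g. {"C": {12: 1, 13: 1}}."""
    result = 1.0
    for element, counts in iso_comp.items():
        n = sum(counts.values())
        # Multinomial coefficient: number of ways to assign the isotopes to n atoms.
        ways = factorial(n) // prod(factorial(k) for k in counts.values())
        result *= ways * prod(ABUNDANCE[element][iso] ** k for iso, k in counts.items())
    return result


# CO2 containing exactly one 13C atom and two 16O atoms:
print(isotopic_abundance({"C": {13: 1}, "O": {16: 2}}))  # about 0.01065
```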
- pyteomics.mass.mass.isotopologues(*args, **kwargs)
Iterate over possible isotopic states of a molecule. The molecule can be defined by formula, sequence, parsed sequence, or composition. The space of possible isotopic compositions is restricted by the parameters elements_with_isotopes, isotope_threshold and overall_threshold.
- Parameters:
formula (str, optional) – A string with a chemical formula.
sequence (str, optional) – A polypeptide sequence string in modX notation.
parsed_sequence (list of str, optional) – A polypeptide sequence parsed into a list of amino acids.
composition (Composition, optional) – A Composition object with the elemental composition of a substance.
report_abundance (bool, optional) – If True, the output will contain 2-tuples: (composition, abundance). Otherwise, only compositions are yielded. Default is False.
elements_with_isotopes (container of str, optional) – A set of elements to be considered in isotopic distribution (by default, every element has an isotopic distribution).
isotope_threshold (float, optional) – The threshold abundance of a specific isotope to be considered. Default is 5e-4.
overall_threshold (float, optional) – The threshold abundance of the calculated isotopic composition. Default is 0.
aa_comp (dict, optional) – A dict with the elemental composition of the amino acids (the default value is std_aa_comp).
mass_data (dict, optional) – A dict with the masses of chemical elements (the default value is nist_mass).
- Returns:
out – Iterator over possible isotopic compositions.
- Return type:
iterator
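How isotope_threshold and overall_threshold shape the search space can be shown with a brute-force sketch. Isotopes rarer than isotope_threshold are excluded up front. Every remaining per-element distribution is enumerated, and combined states below overall_threshold are dropped. The names isotopologues and element_states here are hypothetical re-implementations, elements_with_isotopes is not modelled, and the enumeration grows quickly with atom count, so this is not the pyteomics algorithm.

```python
from collections import Counter
from itertools import combinations_with_replacement, product
from math import factorial, prod

# Approximate natural isotope abundances, keyed by mass number.
ABUNDANCE = {
    "C": {12: 0.9893, 13: 0.0107},
    "H": {1: 0.999885, 2: 0.000115},
    "O": {16: 0.99757, 17: 0.00038, 18: 0.00205},
}


def element_states(element, n, isotope_threshold):
    """Yield ({mass number: count}, probability) for n atoms of one element."""
    isotopes = [iso for iso, p in ABUNDANCE[element].items() if p >= isotope_threshold]
    for combo in combinations_with_replacement(isotopes, n):
        counts = Counter(combo)
        ways = factorial(n) // prod(factorial(k) for k in counts.values())
        yield dict(counts), ways * prod(ABUNDANCE[element][iso] ** k for iso, k in counts.items())


def isotopologues(comp, isotope_threshold=5e-4, overall_threshold=0.0):
    """Yield (isotopic composition, abundance) pairs, filtered by both thresholds."""
    per_element = [list(element_states(e, n, isotope_threshold)) for e, n in comp.items()]
    for choice in product(*per_element):
        abundance = prod(p for _, p in choice)
        if abundance >= overall_threshold:
            yield {e: counts for e, (counts, _) in zip(comp, choice)}, abundance


for state, abundance in isotopologues({"H": 2, "O": 1}):
    print(state, abundance)
```

With the default isotope_threshold of 5e-4, deuterium and 17O are excluded for water, so only two states remain (H2 16O and H2 18O).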
- pyteomics.mass.mass.most_probable_isotopic_composition(*args, **kwargs)
Calculate the most probable isotopic composition of a peptide molecule/ion defined by a sequence string, parsed sequence, chemical formula or Composition object.
Note that if a sequence string without terminal groups is supplied then the isotopic composition is calculated for a polypeptide with standard terminal groups (H- and -OH).
For each element, only the two most abundant isotopes are considered.
- Parameters:
formula (str, optional) – A string with a chemical formula.
sequence (str, optional) – A polypeptide sequence string in modX notation.
parsed_sequence (list of str, optional) – A polypeptide sequence parsed into a list of amino acids.
composition (Composition, optional) – A Composition object with the elemental composition of a substance.
elements_with_isotopes (list of str) – A list of elements to be considered in isotopic distribution (by default, every element has an isotopic distribution).
aa_comp (dict, optional) – A dict with the elemental composition of the amino acids (the default value is std_aa_comp).
mass_data (dict, optional) – A dict with the masses of chemical elements (the default value is nist_mass).
ion_comp (dict, optional) – A dict with the relative elemental compositions of peptide ion fragments (default is std_ion_comp).
- Returns:
out – A tuple with the most probable isotopic composition and its relative abundance.
- Return type:
tuple (Composition, float)
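Why two isotopes per element make this tractable: when only two isotopes are kept, the number k of heavy atoms among n atoms of an element follows a binomial distribution. Its most likely value (the mode) is floor((n + 1) * p). The abundance also factorizes over elements, so taking each element's mode gives the overall most probable composition. The sketch below applies that reasoning with approximate abundances. most_probable_composition is a hypothetical name, and the code is an illustration of the idea rather than the library's code.

```python
from math import floor

# For each element: (light isotope, heavy isotope, approximate abundance of the
# heavy one), i.e. the two most abundant isotopes.
TWO_ISOTOPES = {
    "H": (1, 2, 0.000115),
    "C": (12, 13, 0.0107),
    "N": (14, 15, 0.00364),
    "O": (16, 18, 0.00205),
}


def most_probable_composition(comp):
    """comp maps element -> atom count; returns element -> {mass number: count}."""
    result = {}
    for element, n in comp.items():
        light, heavy, p = TWO_ISOTOPES[element]
        # Mode of Binomial(n, p). If (n + 1) * p is an integer, k - 1 is equally likely.
        k = floor((n + 1) * p)
        result[element] = {light: n - k, heavy: k} if k else {light: n}
    return result


print(most_probable_composition({"C": 60}))   # {'C': {12: 60}}
print(most_probable_composition({"C": 100}))  # {'C': {12: 99, 13: 1}}
```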
- pyteomics.mass.mass.nist_mass
A dict with the exact element masses downloaded from the NIST website: http://www.nist.gov/pml/data/comp.cfm. Each element has an entry containing the masses and relative abundances of several abundant isotopes, plus a separate entry with key 0 for an unspecified isotope, which holds the mass of the most abundant isotope and an abundance of 1.0.
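The layout described above can be shown with a small table shaped like it: element mapped to {mass number: (mass, abundance)}, with key 0 for the unspecified isotope. The excerpt below uses rounded NIST-style values and is a stand-in, not a copy of nist_mass. monoisotopic_mass and average_mass are hypothetical helpers. The sketch also ignores the rule that mass is not averaged for elements with specified isotopes.

```python
# Excerpt shaped like the structure described above (approximate values):
# element -> {mass number: (mass, relative abundance)}; key 0 holds the mass
# of the most abundant isotope with abundance 1.0.
MASS_TABLE = {
    "H": {0: (1.00782503207, 1.0), 1: (1.00782503207, 0.999885), 2: (2.0141017778, 0.000115)},
    "C": {0: (12.0, 1.0), 12: (12.0, 0.9893), 13: (13.0033548378, 0.0107)},
    "O": {0: (15.99491461956, 1.0), 16: (15.99491461956, 0.99757),
          17: (16.9991317, 0.00038), 18: (17.999161, 0.00205)},
}


def monoisotopic_mass(comp):
    """Use the key-0 entry, i.e. the most abundant isotope of each element."""
    return sum(MASS_TABLE[e][0][0] * n for e, n in comp.items())


def average_mass(comp):
    """Abundance-weighted mean over the explicitly listed isotopes (key 0 skipped)."""
    total = 0.0
    for e, n in comp.items():
        total += n * sum(m * a for k, (m, a) in MASS_TABLE[e].items() if k != 0)
    return total


water = {"H": 2, "O": 1}
print(monoisotopic_mass(water))  # about 18.010565
print(average_mass(water))       # about 18.0153
```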
- pyteomics.mass.mass.std_aa_comp
A dictionary with elemental compositions of the twenty standard amino acid residues, selenocysteine, pyrrolysine, and standard H- and -OH terminal groups.
- pyteomics.mass.mass.std_aa_mass
A dictionary with monoisotopic masses of the twenty standard amino acid residues, selenocysteine and pyrrolysine.
- pyteomics.mass.mass.std_ion_comp
A dict with relative elemental compositions of the standard peptide fragment ions. The relative elemental composition of a fragment ion is defined as the difference between the total elemental composition of the ion and the sum of the elemental compositions of its constituent amino acid residues.
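To illustrate what a relative composition means, the sketch below uses a hypothetical, simplified table for neutral b and y fragments. A b fragment adds nothing to its residues, while a y fragment adds H2O. The sketch then derives singly protonated m/z values. ION_COMP is not copied from std_ion_comp, whose actual entries may be defined differently (for example with respect to charge). fragment_mz and the mass constants are likewise assumptions.

```python
# Hypothetical relative compositions of *neutral* fragments: what must be added
# to the summed residue compositions to obtain the fragment.
ION_COMP = {"b": {}, "y": {"H": 2, "O": 1}}
ELEMENT_MASS = {"H": 1.00782503207, "O": 15.99491461956}
AA_MASS = {"P": 97.052764, "E": 129.042593, "T": 101.047679,
           "I": 113.084064, "D": 115.026943}
PROTON = 1.00727646677  # approximate H+ mass


def fragment_mz(residues, ion_type, charge=1):
    """m/z of a fragment built from residues plus its relative composition."""
    neutral = sum(AA_MASS[aa] for aa in residues)
    neutral += sum(ELEMENT_MASS[e] * n for e, n in ION_COMP[ion_type].items())
    return (neutral + charge * PROTON) / charge


print(fragment_mz("PE", "b"))  # b2 of PEPTIDE, about 227.102633
print(fragment_mz("E", "y"))   # y1 of PEPTIDE, about 148.060434
```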