Pyteomics documentation v4.7.4

Retention time prediction

Pyteomics has two modules for prediction of retention times (RTs) of peptides and proteins in liquid chromatography.

BioLCCC

The first module is pyteomics.biolccc, which implements the BioLCCC model of liquid chromatography of polypeptides. It is not distributed with the main package and has to be installed separately. It can be downloaded from http://pypi.python.org/pypi/pyteomics.biolccc, and the project documentation is hosted at http://theorchromo.ru/docs.

Additive model of peptide chromatography

Another option for retention time prediction is the pyteomics.achrom module, which is distributed with Pyteomics. It implements the additive model of polypeptide chromatography. Briefly, in the additive model each amino acid residue changes the retention time by a fixed value that depends only on the residue type (e.g. an alanine residue adds 2.0 min to the RT, while an arginine residue decreases it by 1.1 min). The module documentation contains the complete description of this model and the references. In this tutorial we will focus on the basic usage.
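The core idea of the additive model can be sketched in a few lines of plain Python. This is only an illustration, not the pyteomics.achrom implementation: the function name `additive_rt` and the dictionary `EXAMPLE_RCS` are hypothetical, and only the values for alanine and arginine are taken from the example above; the other coefficients are made up.

```python
# Hypothetical retention coefficients in minutes. Only 'A' (alanine) and
# 'R' (arginine) use the values quoted in the text; 'G' and 'L' are made up.
EXAMPLE_RCS = {'A': 2.0, 'R': -1.1, 'G': 0.2, 'L': 9.0}


def additive_rt(sequence, rcs):
    """Sum the fixed per-residue contributions of a one-letter sequence."""
    return sum(rcs[aa] for aa in sequence)


# Each residue contributes independently of its position and neighbours.
rt = additive_rt('GALAR', EXAMPLE_RCS)
print(round(rt, 2))
```

Because every residue contributes a fixed amount, any two peptides with the same amino acid composition get the same predicted RT in this simplified form.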

Retention time prediction

Retention time prediction with pyteomics.achrom is done by the pyteomics.achrom.calculate_RT() function:

>>> from pyteomics import achrom
>>> achrom.calculate_RT('PEPTIDE', achrom.RCs_guo_ph7_0)
7.8000000000000025

The first argument of the function is the sequence of a peptide in modX notation.

The second argument is a set of parameters called ‘retention coefficients’, which describe the chromatographic properties of individual amino acid residues in a polypeptide chain. pyteomics.achrom has a number of predefined sets of retention coefficients obtained from publications. The list, detailed descriptions and references related to these sets can be found in the module documentation.

Calibration

The main advantage of the additive model is that it gives more accurate predictions when adjusted to a specific chromatographic setup and conditions. This adjustment, or ‘calibration’, requires a set of known peptide sequences and the corresponding retention times (a ‘training set’) and returns a new set of retention coefficients. The following code illustrates the calibration procedure in Pyteomics.

>>> from pyteomics import achrom
>>> RCs = achrom.get_RCs(sequences, RTs)
>>> achrom.calculate_RT('PEPTIDE', RCs)

The first argument of pyteomics.achrom.get_RCs() should be a list of modX sequences, and the second a list of floating-point retention times.
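To make the idea behind calibration concrete, the sketch below fits retention coefficients to a small synthetic training set by ordinary least squares. It is an illustration under stated assumptions, not the code of pyteomics.achrom.get_RCs(): the function `fit_rcs` is hypothetical, it handles only one-letter labels, it ignores length correction and terminal residues, and the training data are generated from made-up coefficients.

```python
def fit_rcs(sequences, rts):
    """Least-squares fit of per-residue coefficients (illustrative only)."""
    labels = sorted(set(''.join(sequences)))
    n = len(labels)
    # Design matrix: one row per peptide, one column of counts per residue.
    rows = [[seq.count(aa) for aa in labels] for seq in sequences]
    # Normal equations (A^T A) x = A^T b.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * t for r, t in zip(rows, rts)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(ata[k][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        atb[col], atb[pivot] = atb[pivot], atb[col]
        for row in range(col + 1, n):
            factor = ata[row][col] / ata[col][col]
            for k in range(col, n):
                ata[row][k] -= factor * ata[col][k]
            atb[row] -= factor * atb[col]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(ata[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (atb[i] - tail) / ata[i][i]
    return dict(zip(labels, x))


# Synthetic training set generated from made-up coefficients
# A = 2.0, G = 0.5, L = 9.0 (minutes); not real chromatographic data.
sequences = ['AG', 'AL', 'GL', 'AAG', 'GLL']
RTs = [2.5, 11.0, 9.5, 4.5, 18.5]
fitted = fit_rcs(sequences, RTs)
print({aa: round(rc, 3) for aa, rc in fitted.items()})
```

The fit is only well defined when the training set contains enough distinct peptides to determine every coefficient, which is why a real training set should be much larger than the number of residue labels.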

As in pyteomics.parser.parse_sequence(), all non-standard modX amino acid labels used in the training set should be supplied to the labels keyword argument of pyteomics.achrom.get_RCs() along with the standard ones:

>>> RCs = achrom.get_RCs(sequences, RTs, labels=achrom.std_labels + ['pS', 'pT'])

Advanced calibration

The standard additive model allows a couple of improvements. Firstly, an explicit dependency on the length of a peptide may be introduced by multiplying the retention time by \((1 + m \log L)\), where \(L\) is the number of amino acid residues in the peptide and \(m\) is the length correction parameter, typically about -0.2.
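The effect of the length correction factor can be illustrated with a short sketch. The function `length_corrected_rt` is hypothetical and not part of pyteomics.achrom, and the sketch assumes that the logarithm in the formula is the natural logarithm.

```python
import math


def length_corrected_rt(additive_rt, length, m=-0.2):
    """Scale an additive RT by (1 + m * log(L)); natural log is assumed."""
    return additive_rt * (1.0 + m * math.log(length))


# A 7-residue peptide whose uncorrected additive RT is 10.0 min.
corrected = length_corrected_rt(10.0, 7)
# With m = 0 the factor equals 1, so the value is left unchanged.
uncorrected = length_corrected_rt(10.0, 7, m=0)
print(round(corrected, 2), uncorrected)
```

With a negative parameter the factor shrinks as the peptide grows, so each additional residue contributes slightly less to the predicted RT of a long peptide than to that of a short one.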

The value of the length correction parameter is set during calibration and stored along with the retention coefficients. By default, length correction is enabled in pyteomics.achrom.get_RCs() and the parameter equals -0.21. You can change its value by supplying the lcp keyword argument, or disable length correction completely by setting lcp=0:

>>> RCs = achrom.get_RCs(sequences, RTs, lcp=-0.18) # A new value of the length correction parameter

>>> RCs = achrom.get_RCs(sequences, RTs, lcp=0) # Disable length correction.

Another considerable improvement over the standard additive model is to treat terminal amino acid residues as separate chemical entities. This behavior is disabled by default, but can be enabled by setting term_aa=True:

>>> RCs = achrom.get_RCs(sequences, RTs, term_aa=True)

This correction is implemented by adding the ‘nterm’ and ‘cterm’ prefixes to the labels of the terminal amino acid residues of the training peptides. For this correction to work, the training peptides should represent all possible variations of terminal amino acid residues.
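The relabelling of terminal residues, and the reason the training set has to cover every terminal residue, can be sketched as follows. The helper `label_termini` is hypothetical and is a simplification: it treats every character as one residue, so it does not handle multi-character modX labels such as pS.

```python
def label_termini(sequence):
    """Prefix the first and last residue labels with 'nterm' and 'cterm'."""
    labels = list(sequence)
    if len(labels) >= 2:
        labels[0] = 'nterm' + labels[0]
        labels[-1] = 'cterm' + labels[-1]
    return labels


# Collect the terminal labels that a (made-up) training set covers.
training = ['PEPTIDE', 'AGEK', 'PLAK']
seen = set()
for seq in training:
    labels = label_termini(seq)
    seen.update((labels[0], labels[-1]))

# A peptide whose terminal labels were never seen has no coefficient for them.
query = label_termini('GLAK')
missing = [lab for lab in (query[0], query[-1]) if lab not in seen]
print(missing)
```

Here the query peptide starts with glycine, which never occurs at the N-terminus of the training peptides, so a calibration on this training set would have no coefficient for that terminal label.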
