Pyteomics documentation v4.7.1

achrom - additive model of polypeptide chromatography

Summary

The additive model of polypeptide chromatography, or achrom, is the most basic model for peptide retention time prediction. The main equation behind achrom has the following form:

\[RT = (1 + m\,\ln N) \sum_{i}{RC_i n_i} + RT_0\]

Here, the sum runs over the types of amino acid residues present in the peptide, \(RC_i\) is the retention coefficient of the amino acid residues of the i-th type, \(n_i\) is the number of amino acid residues of type \(i\) in the peptide sequence, \(N = \sum_i n_i\) is the peptide length (the ln(L) term in the description of get_RCs()), \(m\) is the length correction parameter, and \(RT_0\) is a constant retention time shift.
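
As a minimal sketch, the equation can be evaluated directly in plain Python. The function name additive_rt and the coefficient values are hypothetical, and the logarithm is taken of the peptide length, which is an assumption based on the ln(L) wording in the description of get_RCs(). This is not the code used by calculate_RT().

```python
import math
from collections import Counter

def additive_rt(sequence, rcs, m=0.0, rt0=0.0):
    """Evaluate RT = (1 + m ln N) * sum(RC_i * n_i) + RT_0 for a plain sequence."""
    counts = Counter(sequence)      # n_i: number of residues of each type
    length = len(sequence)          # N: assumed here to be the peptide length
    weighted = sum(rcs[aa] * n for aa, n in counts.items())
    return (1.0 + m * math.log(length)) * weighted + rt0

# Hypothetical coefficients, chosen only for illustration.
toy_rcs = {'A': 1.0, 'G': 0.5, 'L': 3.0}
rt_plain = additive_rt('GAL', toy_rcs, m=0.0, rt0=0.1)       # no length correction
rt_corrected = additive_rt('GAL', toy_rcs, m=-0.21, rt0=0.1)  # negative m shortens RT
```

With m = 0 the length factor equals one and the result is the plain sum of coefficients plus the shift. A negative m reduces the predicted retention time more strongly for longer peptides.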

In order to use achrom, one needs to find the retention coefficients using experimentally determined retention times for a training set of peptides, i.e. to calibrate the model.

Calibration

get_RCs() - find a set of retention coefficients using a given set of peptides with known retention times and a fixed value of length correction parameter.

get_RCs_vary_lcp() - find the best length correction parameter and a set of retention coefficients for a given peptide sample.

Retention time calculation

calculate_RT() - calculate the retention time of a peptide using a given set of retention coefficients.

Data

RCs_guo_ph2_0 - a set of retention coefficients (RCs) from [2]. Conditions: Synchropak RP-P C18 column (250 x 4.1 mm I.D.), gradient (A = 0.1% aq. TFA, pH 2.0; B = 0.1% TFA in acetonitrile) at 1% B/min, flow rate 1 ml/min, 26 °C.

RCs_guo_ph7_0 - a set of retention coefficients (RCs) from [2]. Conditions: Synchropak RP-P C18 column (250 x 4.1 mm I.D.), gradient (A = aq. 10 mM (NH4)2HPO4 - 0.1 M NaClO4, pH 7.0; B = 0.1 M NaClO4 in 60% aq. acetonitrile) at 1.67% B/min, flow rate 1 ml/min, 26 °C.

RCs_meek_ph2_1 - a set of RCs from [1]. Conditions: Bio-Rad “ODS” column, gradient (A = 0.1 M NaClO4, 0.1% phosphoric acid in water; B = 0.1 M NaClO4, 0.1% phosphoric acid in 60% aq. acetonitrile) at 1.25% B/min, room temperature.

RCs_meek_ph7_4 - a set of RCs from [1]. Conditions: Bio-Rad “ODS” column, gradient (A = 0.1 M NaClO4, 5 mM phosphate buffer in water; B = 0.1 M NaClO4, 5 mM phosphate buffer in 60% aq. acetonitrile) at 1.25% B/min, room temperature.

RCs_browne_tfa - a set of RCs found in [7]. Conditions: Waters µBondapak C18 column, gradient (A = 0.1% aq. TFA, B = 0.1% TFA in acetonitrile) at 0.33% B/min, flow rate 1.5 ml/min.

RCs_browne_hfba - a set of RCs found in [7]. Conditions: Waters µBondapak C18 column, gradient (A = 0.13% aq. HFBA, B = 0.13% HFBA in acetonitrile) at 0.33% B/min, flow rate 1.5 ml/min.

RCs_palmblad - a set of RCs from [8]. Conditions: a fused silica column (80-100 x 0.200 mm I.D.) packed in-house with C18 ODS-AQ; solvent A = 0.5% aq. HAc, B = 0.5% HAc in acetonitrile.

RCs_yoshida - a set of RCs for normal phase chromatography from [9]. Conditions: TSK gel Amide-80 column (250 x 4.6 mm I.D.), gradient (A = 0.1% TFA in ACN-water (90:10); B = 0.1% TFA in ACN-water (55:45)) at 0.6% water/min, flow rate 1.0 ml/min, 40 °C.

RCs_yoshida_lc - a set of length-corrected RCs for normal phase chromatography. The set was calculated in [10] for the data from [9]. Conditions: TSK gel Amide-80 column (250 x 4.6 mm I.D.), gradient (A = 0.1% TFA in ACN-water (90:10); B = 0.1% TFA in ACN-water (55:45)) at 0.6% water/min, flow rate 1.0 ml/min, 40 °C.

RCs_zubarev - a set of length-corrected RCs calculated on a dataset used in [11]. Conditions: Reprosil-Pur C18-AQ column (150 x 0.075 mm I.D.), gradient (A = 0.5% AA in water; B = 0.5% AA in ACN-water (90:10)) at 0.5% water/min, flow rate 200.0 nl/min, room temperature.

RCs_gilar_atlantis_ph3_0 - a set of retention coefficients obtained in [12]. Conditions: Atlantis HILIC silica column, (150 x 2.1 mm I.D.), 3 um, 100 A, gradient (A = water, B = ACN, C = 200 mM ammonium formate): 0 min, 5% A, 90% B, 5% C; 62.5 min, 55% A, 40% B, 5% C at 0.2 ml/min, temperature 40 C, pH 3.0

RCs_gilar_atlantis_ph4_5 - a set of retention coefficients obtained in [12]. Conditions: Atlantis HILIC silica column, (150 x 2.1 mm I.D.), 3 um, 100 A, gradient (A = water, B = ACN, C = 200 mM ammonium formate): 0 min, 5% A, 90% B, 5% C; 62.5 min, 55% A, 40% B, 5% C at 0.2 ml/min, temperature 40 C, pH 4.5

RCs_gilar_atlantis_ph10_0 - a set of retention coefficients obtained in [12]. Conditions: Atlantis HILIC silica column, (150 x 2.1 mm I.D.), 3 um, 100 A, gradient (A = water, B = ACN, C = 200 mM ammonium formate): 0 min, 5% A, 90% B, 5% C; 62.5 min, 55% A, 40% B, 5% C at 0.2 ml/min, temperature 40 C, pH 10.0

RCs_gilar_beh - a set of retention coefficients obtained in [12]. Conditions: ACQUITY UPLC BEH HILIC column (150 x 2.1 mm I.D.), 1.7 um, 130 A, Mobile phase A: 10 mM ammonium formate buffer, pH 4.5 prepared by titrating 10 mM solution of FA with ammonium hydroxide. Mobile phase B: 90% ACN, 10% mobile phase A (v:v). Gradient: 90-60% B in 50 min.

RCs_gilar_beh_amide - a set of retention coefficients obtained in [12]. Conditions: ACQUITY UPLC BEH glycan column (150 x 2.1 mm I.D.), 1.7 um, 130 A, Mobile phase A: 10 mM ammonium formate buffer, pH 4.5 prepared by titrating 10 mM solution of FA with ammonium hydroxide. Mobile phase B: 90% ACN, 10% mobile phase A (v:v). Gradient: 90-60% B in 50 min.

RCs_gilar_rp - a set of retention coefficients obtained in [12]. Conditions: ACQUITY UPLC BEH C18 column (100 mm x 2.1 mm I.D.), 1.7 um, 130 A. Mobile phase A: 0.02% TFA in water, mobile phase B: 0.018% TFA in ACN. Gradient: 0 to 50% B in 50 min, flow rate 0.2 ml/min, temperature 40 C., pH 2.6.

RCs_krokhin_100A_fa - a set of retention coefficients obtained in [13]. Conditions: 300 um x 150mm PepMap100 (Dionex, 0.1% FA), packed with 5-um Luna C18(2) (Phenomenex, Torrance, CA), pH=2.0. Both eluents A (2% ACN in water) and B (98% ACN) contained 0.1% FA as ion-pairing modifier. 0.33% ACN/min linear gradient (0-30% B).

RCs_krokhin_100A_tfa - a set of retention coefficients obtained in [13]. Conditions: 300 um x 150mm PepMap100 (Dionex, 0.1% TFA), packed with 5-um Luna C18(2) (Phenomenex, Torrance, CA), pH=2.0. Both eluents A (2% ACN in water) and B (98% ACN) contained 0.1% TFA as ion-pairing modifier. 0.33% ACN/min linear gradient (0-30% B).

Theory

The additive model of polypeptide chromatography, or the model of retention coefficients, was the earliest attempt to describe the dependence of the retention time of a polypeptide in liquid chromatography on its sequence [1], [2]. In this model, each amino acid is assigned a number, the retention coefficient (RC), describing its retention properties. The retention time (RT) during a gradient elution is then calculated as:

\[RT = \sum_{i}{RC_i \cdot n_i} + RT_0,\]

which is the sum of retention coefficients of all amino acid residues in a polypeptide. This equation can also be expressed in terms of linear algebra:

\[RT = \bar{aa} \cdot \bar{RC} + RT_0,\]

where \(\bar{aa}\) is a vector of amino acid composition, i.e. \(\bar{aa}_i\) is the number of amino acid residues of i-th type in a polypeptide; \(\bar{RC}\) is a vector of respective retention coefficients.

In this formulation, it is clear that the additive model gives the same result for any two peptides with different sequences but the same amino acid composition. In other words, the additive model is not sequence-specific.
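
The vector form and its insensitivity to residue order can be illustrated with a short sketch. The residue ordering, the helper names and the coefficient values below are hypothetical and serve only to show the dot product at work.

```python
from collections import Counter

AMINO_ACIDS = 'ACDEFGHIKLMNPQRSTVWY'   # 20 standard residues in a fixed order

def composition_vector(sequence):
    """Return the amino acid composition as a list of counts (the vector aa)."""
    counts = Counter(sequence)
    return [counts.get(aa, 0) for aa in AMINO_ACIDS]

def rt_from_vector(aa_vec, rc_vec, rt0):
    """RT = aa . RC + RT_0, written as an explicit dot product."""
    return sum(n * rc for n, rc in zip(aa_vec, rc_vec)) + rt0

# Hypothetical retention coefficients, one per residue type.
rc_vec = [0.5 * (k + 1) for k in range(len(AMINO_ACIDS))]

# Two different sequences with the same composition.
rt_a = rt_from_vector(composition_vector('PEPTIDE'), rc_vec, rt0=2.0)
rt_b = rt_from_vector(composition_vector('EDITPEP'), rc_vec, rt0=2.0)
same_prediction = (rt_a == rt_b)
```

Both peptides map to the same composition vector, so the model cannot distinguish them, whatever the coefficient values are.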

The additive model has two advantages over other models of chromatography: it is easy to understand and easy to use. The rule behind the additive model is as simple as it could be: each amino acid residue shifts the retention time by a fixed value that depends only on its type. This rule allows a geometrical interpretation. Each peptide may be represented by a point in 21-dimensional space, with the first 20 coordinates equal to the numbers of corresponding amino acid residues in the peptide and the 21st coordinate equal to RT. The additive model assumes that a line may be drawn through these points. Of course, this assumption is only partially valid, and most points would not lie on the line. But the line would describe the main trend and could be used to estimate the retention time for peptides with known amino acid composition.

This best fit line is described by retention coefficients and \(RT_0\). The procedure of finding these coefficients is called calibration. There is an analytical solution to calibration of linear models, which makes them especially useful in real applications.
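
To make the analytical solution concrete, here is a minimal least-squares sketch in plain Python that solves the normal equations for the retention coefficients and \(RT_0\). The helper names solve_linear and calibrate and the toy training set are hypothetical. This is a stand-in for illustration and is not claimed to be the way get_RCs() is implemented.

```python
from collections import Counter

def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def calibrate(sequences, rts, labels):
    """Least-squares fit of the RCs and RT_0 via the normal equations."""
    # Design matrix: one column of residue counts per label, plus a column of ones.
    rows = [[Counter(s)[aa] for aa in labels] + [1.0] for s in sequences]
    k = len(labels) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, rts)) for i in range(k)]
    solution = solve_linear(xtx, xty)
    return dict(zip(labels, solution[:-1])), solution[-1]

# Toy training set generated from RC_A = 1, RC_G = 2, RT_0 = 0.5.
rcs, rt0 = calibrate(['A', 'AG', 'GG', 'AAG'], [1.5, 3.5, 4.5, 4.5], ['A', 'G'])
```

Because the toy retention times follow the additive model exactly, the fit recovers the generating coefficients up to rounding error. With real data the same procedure returns the best-fit values in the least-squares sense.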

Several attempts have been made to improve the accuracy of prediction by the additive model (for a review of the field, see [3] and [4]). The two implemented in this module are the logarithmic length correction term described in [5] and additional sets of retention coefficients for terminal amino acid residues [6].

Logarithmic length correction

This enhancement was first described in [5]. Briefly, it was found that the following equation better describes the dependence of RT on the peptide sequence:

\[RT = \sum_{i=1}^{i=N}{RC_i} + m\,\ln N \sum_{i=1}^{i=N}{RC_i} + RT_0\]

We call the second term \(m\,\ln N \sum_{i=1}^{i=N}{RC_i}\) the length correction term and \(m\) the length correction parameter. The simplified and vectorized form of this equation is:

\[RT = (1 + m\,\ln N) \, \bar{RC} \cdot \bar{aa} + RT_0\]

This equation may be reduced to a linear form and solved by the standard methods.
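
One way to read this statement: once m is fixed, the factor (1 + m ln N) is a known number for every peptide, so it can be folded into the composition counts, and the remaining unknowns (the retention coefficients and \(RT_0\)) enter linearly. The sketch below shows this for peptides built from a single residue type and then scans a coarse grid of m values. The grid scan, the helper names and the synthetic data are assumptions made for illustration; this does not describe how get_RCs_vary_lcp() searches for the best value.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def fit_for_fixed_m(lengths, rts, m):
    """With m fixed, x = (1 + m ln N) * n_A is known, so RT is linear in RC_A and RT_0."""
    xs = [(1.0 + m * math.log(n)) * n for n in lengths]
    rc, rt0 = fit_line(xs, rts)
    sse = sum((rc * x + rt0 - y) ** 2 for x, y in zip(xs, rts))
    return sse, rc, rt0

# Synthetic poly-A peptides generated with m = -0.2, RC_A = 2.0, RT_0 = 1.0.
lengths = list(range(2, 11))
rts = [(1.0 - 0.2 * math.log(n)) * 2.0 * n + 1.0 for n in lengths]

# Hypothetical coarse grid over m; keep the value with the smallest squared error.
candidates = [k / 10 for k in range(-5, 6)]
results = [(fit_for_fixed_m(lengths, rts, m), m) for m in candidates]
(best_sse, best_rc, best_rt0), best_m = min(results)
```

Each candidate m costs one ordinary linear fit, so the nonlinear parameter is handled by a one-dimensional search around a standard linear solve.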

Terminal retention coefficients

Another significant improvement may be obtained through introduction of separate sets of retention coefficients for terminal amino acid residues [6].
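
A minimal sketch of the idea, using the 'ntermX'/'ctermX' labelling convention mentioned in the description of the term_aa parameter: the first and last residues get their own labels, and therefore their own coefficients. The helper names are hypothetical, and the coefficient values are the ones used in the calculate_RT() example on this page. How the library itself treats very short peptides is not covered here.

```python
from collections import Counter

def terminal_composition(sequence):
    """Count residues, labelling the first and last ones as ntermX and ctermX."""
    if len(sequence) < 2:
        raise ValueError('this sketch expects at least two residues')
    labels = (['nterm' + sequence[0]]
              + list(sequence[1:-1])
              + ['cterm' + sequence[-1]])
    return Counter(labels)

def rt_with_terminal_rcs(sequence, rcs, rt0=0.0):
    """Additive RT with separate coefficients for terminal residues (no length correction)."""
    composition = terminal_composition(sequence)
    return sum(rcs[label] * n for label, n in composition.items()) + rt0

rcs = {'ntermA': 1.0, 'A': 1.1, 'ctermA': 1.2}
rt = rt_with_terminal_rcs('AAA', rcs, rt0=0.1)   # 1.0 + 1.1 + 1.2 + 0.1
```

The model stays additive: the terminal labels simply extend the set of residue types, so calibration works the same way with a larger set of coefficients.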

References

Dependencies

This module requires numpy and, optionally, scikit-learn (for MAE regression).


pyteomics.achrom.RCs_browne_hfba

A set of retention coefficients determined in Browne, C. A.; Bennett, H. P. J.; Solomon, S. The isolation of peptides by high-performance liquid chromatography using predicted elution positions. Analytical Biochemistry, 1982, 124 (1), 201-208.

Conditions: Waters µBondapak C18 column, gradient (A = 0.13% aq. HFBA, B = 0.13% HFBA in acetonitrile) at 0.33% B/min, flow rate 1.5 ml/min.

pyteomics.achrom.RCs_browne_tfa

A set of retention coefficients determined in Browne, C. A.; Bennett, H. P. J.; Solomon, S. The isolation of peptides by high-performance liquid chromatography using predicted elution positions. Analytical Biochemistry, 1982, 124 (1), 201-208.

Conditions: Waters µBondapak C18 column, gradient (A = 0.1% aq. TFA, B = 0.1% TFA in acetonitrile) at 0.33% B/min, flow rate 1.5 ml/min.

pyteomics.achrom.RCs_gilar_atlantis_ph10_0

A set of retention coefficients for normal phase chromatography obtained in Gilar, M., & Jaworski, A. (2011). Retention behavior of peptides in hydrophilic-interaction chromatography. Journal of chromatography A, 1218(49), 8890-6.

Note

Cysteine is Carbamidomethylated.

Conditions: Atlantis HILIC silica column (150 x 2.1 mm I.D.), 3 um, 100 A, gradient (A = water, B = ACN, C = 200 mM ammonium formate): 0 min, 5% A, 90% B, 5% C; 62.5 min, 55% A, 40% B, 5% C at 0.2 ml/min, temperature 40 C, pH 10.0

pyteomics.achrom.RCs_gilar_atlantis_ph3_0

A set of retention coefficients for normal phase chromatography obtained in Gilar, M., & Jaworski, A. (2011). Retention behavior of peptides in hydrophilic-interaction chromatography. Journal of chromatography A, 1218(49), 8890-6.

Note

Cysteine is Carbamidomethylated.

Conditions: Atlantis HILIC silica column (150 x 2.1 mm I.D.), 3 um, 100 A, gradient (A = water, B = ACN, C = 200 mM ammonium formate): 0 min, 5% A, 90% B, 5% C; 62.5 min, 55% A, 40% B, 5% C at 0.2 ml/min, temperature 40 C, pH 3.0

pyteomics.achrom.RCs_gilar_atlantis_ph4_5

A set of retention coefficients for normal phase chromatography obtained in Gilar, M., & Jaworski, A. (2011). Retention behavior of peptides in hydrophilic-interaction chromatography. Journal of chromatography A, 1218(49), 8890-6.

Note

Cysteine is Carbamidomethylated.

Conditions: Atlantis HILIC silica column (150 x 2.1 mm I.D.), 3 um, 100 A, gradient (A = water, B = ACN, C = 200 mM ammonium formate): 0 min, 5% A, 90% B, 5% C; 62.5 min, 55% A, 40% B, 5% C at 0.2 ml/min, temperature 40 C, pH 4.5

pyteomics.achrom.RCs_gilar_beh

A set of retention coefficients for normal phase chromatography obtained in Gilar, M., & Jaworski, A. (2011). Retention behavior of peptides in hydrophilic-interaction chromatography. Journal of chromatography A, 1218(49), 8890-6.

Note

Cysteine is Carbamidomethylated.

Conditions: ACQUITY UPLC BEH HILIC column (150 x 2.1 mm I.D.), 1.7 um, 130 A, Mobile phase A: 10 mM ammonium formate buffer, pH 4.5 prepared by titrating 10 mM solution of FA with ammonium hydroxide. Mobile phase B: 90% ACN, 10% mobile phase A (v:v). Gradient: 90-60% B in 50 min.

pyteomics.achrom.RCs_gilar_beh_amide

A set of retention coefficients for normal phase chromatography obtained in Gilar, M., & Jaworski, A. (2011). Retention behavior of peptides in hydrophilic-interaction chromatography. Journal of chromatography A, 1218(49), 8890-6.

Note

Cysteine is Carbamidomethylated.

Conditions: ACQUITY UPLC BEH glycan column (150 x 2.1 mm I.D.), 1.7 um, 130 A, Mobile phase A: 10 mM ammonium formate buffer, pH 4.5 prepared by titrating 10 mM solution of FA with ammonium hydroxide. Mobile phase B: 90% ACN, 10% mobile phase A (v:v). Gradient: 90-60% B in 50 min.

pyteomics.achrom.RCs_gilar_rp

A set of retention coefficients for reversed-phase chromatography obtained in Gilar, M., & Jaworski, A. (2011). Retention behavior of peptides in hydrophilic-interaction chromatography. Journal of chromatography A, 1218(49), 8890-6.

Note

Cysteine is Carbamidomethylated.

Conditions: ACQUITY UPLC BEH C18 column (100 mm x 2.1 mm I.D.), 1.7 um, 130 A. Mobile phase A: 0.02% TFA in water, mobile phase B: 0.018% TFA in ACN. Gradient: 0 to 50% B in 50 min, flow rate 0.2 ml/min, temperature 40 C., pH 2.6.

pyteomics.achrom.RCs_guo_ph2_0

A set of retention coefficients from Guo, D.; Mant, C. T.; Taneja, A. K.; Parker, J. M. R.; Hodges, R. S. Prediction of peptide retention times in reversed-phase high-performance liquid chromatography I. Determination of retention coefficients of amino acid residues of model synthetic peptides. Journal of Chromatography A, 1986, 359, 499-518.

Conditions: Synchropak RP-P C18 column (250 x 4.1 mm I.D.), gradient (A = 0.1% aq. TFA, pH 2.0; B = 0.1% TFA in acetonitrile) at 1% B/min, flow rate 1 ml/min, 26 °C.

pyteomics.achrom.RCs_guo_ph7_0

A set of retention coefficients from Guo, D.; Mant, C. T.; Taneja, A. K.; Parker, J. M. R.; Hodges, R. S. Prediction of peptide retention times in reversed-phase high-performance liquid chromatography I. Determination of retention coefficients of amino acid residues of model synthetic peptides. Journal of Chromatography A, 1986, 359, 499-518.

Conditions: Synchropak RP-P C18 column (250 x 4.1 mm I.D.), gradient (A = aq. 10 mM (NH4)2HPO4 - 0.1 M NaClO4, pH 7.0; B = 0.1 M NaClO4 in 60% aq. acetonitrile) at 1.67% B/min, flow rate 1 ml/min, 26 °C.

pyteomics.achrom.RCs_krokhin_100A_fa

A set of retention coefficients from R.C. Dwivedi, V. Spicer, M. Harder, M. Antonovici, W. Ens, K.G. Standing, J.A. Wilkins, and O.V. Krokhin; Analytical Chemistry 2008 80 (18), 7036-7042. Practical Implementation of 2D HPLC Scheme with Accurate Peptide Retention Prediction in Both Dimensions for High-Throughput Bottom-Up Proteomics.

Note

Cysteine is Carbamidomethylated.

Conditions: 300 um x 150mm PepMap100 (Dionex, 0.1% FA), packed with 5-um Luna C18(2) (Phenomenex, Torrance, CA), pore size 100A, pH=2.0. Both eluents A (2% ACN in water) and B (98% ACN) contained 0.1% FA as ion-pairing modifier. 0.33% ACN/min linear gradient (0-30% B).

pyteomics.achrom.RCs_krokhin_100A_tfa

A set of retention coefficients from R.C. Dwivedi, V. Spicer, M. Harder, M. Antonovici, W. Ens, K.G. Standing, J.A. Wilkins, and O.V. Krokhin; Analytical Chemistry 2008 80 (18), 7036-7042. Practical Implementation of 2D HPLC Scheme with Accurate Peptide Retention Prediction in Both Dimensions for High-Throughput Bottom-Up Proteomics.

Note

Cysteine is Carbamidomethylated.

Conditions: 300 um x 150mm PepMap100 (Dionex, 0.1% TFA), packed with 5-um Luna C18(2) (Phenomenex, Torrance, CA), pore size 100 A, pH=2.0. Both eluents A (2% ACN in water) and B (98% ACN) contained 0.1% TFA as ion-pairing modifier. 0.33% ACN/min linear gradient (0-30% B).

pyteomics.achrom.RCs_meek_ph2_1

A set of retention coefficients determined in Meek, J. L. Prediction of peptide retention times in high-pressure liquid chromatography on the basis of amino acid composition. PNAS, 1980, 77 (3), 1632-1636.

Note

C stands for Cystine.

Conditions: Bio-Rad “ODS” column, gradient (A = 0.1 M NaClO4, 0.1% phosphoric acid in water; B = 0.1 M NaClO4, 0.1% phosphoric acid in 60% aq. acetonitrile) at 1.25% B/min, room temperature.

pyteomics.achrom.RCs_meek_ph7_4

A set of retention coefficients determined in Meek, J. L. Prediction of peptide retention times in high-pressure liquid chromatography on the basis of amino acid composition. PNAS, 1980, 77 (3), 1632-1636.

Note

C stands for Cystine.

Conditions: Bio-Rad “ODS” column, gradient (A = 0.1 M NaClO4, 5 mM phosphate buffer in water; B = 0.1 M NaClO4, 5 mM phosphate buffer in 60% aq. acetonitrile) at 1.25% B/min, room temperature.

pyteomics.achrom.RCs_palmblad

A set of retention coefficients determined in Palmblad, M.; Ramstrom, M.; Markides, K. E.; Hakansson, P.; Bergquist, J. Prediction of Chromatographic Retention and Protein Identification in Liquid Chromatography/Mass Spectrometry. Analytical Chemistry, 2002, 74 (22), 5826-5830.

Conditions: a fused silica column (80-100 x 0.200 mm I.D.) packed in-house with C18 ODS-AQ; solvent A = 0.5% aq. HAc, B = 0.5% HAc in acetonitrile.

pyteomics.achrom.RCs_yoshida

A set of retention coefficients determined in Yoshida, T. Calculation of peptide retention coefficients in normal-phase liquid chromatography. Journal of Chromatography A, 1998, 808 (1-2), 105-112.

Note

Cysteine is Carboxymethylated.

Conditions: TSK gel Amide-80 column (250 x 4.6 mm I.D.), gradient (A = 0.1% TFA in ACN-water (90:10); B = 0.1% TFA in ACN-water (55:45)) at 0.6% water/min, flow rate 1.0 ml/min, 40 °C.

pyteomics.achrom.RCs_yoshida_lc

A set of retention coefficients from the length-corrected model of normal-phase peptide chromatography. The dataset comes from Yoshida, T. Calculation of peptide retention coefficients in normal-phase liquid chromatography. Journal of Chromatography A, 1998, 808 (1-2), 105-112. The RCs were calculated in Moskovets, E.; Goloborodko A. A.; Gorshkov A. V.; Gorshkov M.V. Limitation of predictive 2-D liquid chromatography in reducing the database search space in shotgun proteomics: In silico studies. Journal of Separation Science, 2012, 35 (14), 1771-1778.

Note

Cysteine is Carboxymethylated.

Conditions: TSK gel Amide-80 column (250 x 4.6 mm I.D.), gradient (A = 0.1% TFA in ACN-water (90:10); B = 0.1% TFA in ACN-water (55:45)) at 0.6% water/min, flow rate 1.0 ml/min, 40 °C.

pyteomics.achrom.RCs_zubarev

A set of retention coefficients from the length-corrected model of reversed-phase peptide chromatography. The dataset was taken from Goloborodko A. A.; Mayerhofer C.; Zubarev A. R.; Tarasova I. A.; Gorshkov A. V.; Zubarev, R. A.; Gorshkov, M. V. Empirical approach to false discovery rate estimation in shotgun proteomics. Rapid communications in mass spectrometry, 2010, 24(4), 454-62.

Note

Cysteine is Carbamidomethylated.

Conditions: Reprosil-Pur C18-AQ column (150 x 0.075 mm I.D.), gradient (A = 0.5% AA in water; B = 0.5% AA in ACN-water (90:10)) at 0.5% water/min, flow rate 200.0 nl/min, room temperature.

pyteomics.achrom.calculate_RT(peptide, RC_dict, raise_no_mod=True)

Calculate the retention time of a peptide using a given set of retention coefficients.

Parameters:
  • peptide (str or dict) – A peptide sequence or amino acid composition.

  • RC_dict (dict) – A set of retention coefficients, length correction parameter and a fixed retention time shift. Keys are: ‘aa’, ‘lcp’ and ‘const’.

  • raise_no_mod (bool, optional) – If True, an exception is raised when a modified amino acid from the peptide is not found in RC_dict. If False, the retention coefficient for the non-modified amino acid residue is used instead. True by default.

Returns:

RT – Calculated retention time.

Return type:

float

Examples

>>> from pyteomics.achrom import calculate_RT
>>> RT = calculate_RT('AA', {'aa': {'A': 1.1}, 'lcp':0.0, 'const': 0.1})
>>> abs(RT - 2.3) < 1e-6      # Float comparison
True
>>> RT = calculate_RT('AAA', {'aa': {'ntermA': 1.0, 'A': 1.1, 'ctermA': 1.2},
...     'lcp': 0.0, 'const':0.1})
>>> abs(RT - 3.4) < 1e-6      # Float comparison
True
>>> RT = calculate_RT({'A': 3}, {'aa': {'ntermA': 1.0, 'A': 1.1, 'ctermA': 1.2},
...     'lcp': 0.0, 'const':0.1})
>>> abs(RT - 3.4) < 1e-6      # Float comparison
True

pyteomics.achrom.get_RCs(sequences, RTs, lcp=-0.21, term_aa=False, metric='mse', **kwargs)

Calculate the retention coefficients of amino acids using retention times of a peptide sample and a fixed value of length correction parameter.

Parameters:
  • sequences (list of str) – List of peptide sequences.

  • RTs (list of float) – List of corresponding retention times.

  • lcp (float, optional) – A multiplier before ln(L) term in the equation for the retention time of a peptide. Set to -0.21 by default.

  • term_aa (bool, optional) – If True, terminal amino acids are treated as being modified with ‘ntermX’/’ctermX’ modifications. False by default.

  • metric (str, optional) –

    Metric for the regression problem. Set to “mse” (mean squared error) by default. Alternative: “mae” (mean absolute error), which uses quantile regression.

    Note

    “mae” requires scikit-learn for quantile regression.

  • labels (list of str, optional) – List of all possible amino acids and terminal groups. If not given, any modX labels are allowed.

Returns:

RC_dict – Dictionary with the calculated retention coefficients.

  • RC_dict[‘aa’] – amino acid retention coefficients.

  • RC_dict[‘const’] – constant retention time shift.

  • RC_dict[‘lcp’] – length correction parameter.

Return type:

dict

Examples

>>> from pyteomics.achrom import get_RCs
>>> RCs = get_RCs(['A','AA'], [1.0, 2.0], 0.0, labels=['A'])
>>> abs(RCs['aa']['A'] - 1) < 1e-6 and abs(RCs['const']) < 1e-6
True
>>> RCs = get_RCs(['A','AA','B'], [1.0, 2.0, 2.0], 0.0, labels=['A','B'])
>>> abs(RCs['aa']['A'] - 1) + abs(RCs['aa']['B'] - 2) + \
...     abs(RCs['const']) < 1e-6
True

pyteomics.achrom.get_RCs_vary_lcp(sequences, RTs, term_aa=False, lcp_range=(-1.0, 1.0), metric='mse', **kwargs)

Find the best combination of a length correction parameter and retention coefficients for a given peptide sample.

Parameters:
  • sequences (list of str) – List of peptide sequences.

  • RTs (list of float) – List of corresponding retention times.

  • term_aa (bool, optional) – If True, terminal amino acids are treated as being modified with ‘ntermX’/’ctermX’ modifications. False by default.

  • metric (str, optional) –

    Metric for the regression problem. Set to “mse” (mean squared error) by default. Alternative: “mae” (mean absolute error).

    Note

    “mae” requires scikit-learn for quantile regression.

  • lcp_range (2-tuple of float, optional) – Range of possible values of the length correction parameter.

  • labels (list of str, optional) – List of labels for all possible amino acids and terminal groups. If not given, any modX labels are allowed.

  • lcp_accuracy (float, optional) – The accuracy of the length correction parameter calculation.

Returns:

RC_dict – Dictionary with the calculated retention coefficients.

  • RC_dict[‘aa’] – amino acid retention coefficients.

  • RC_dict[‘const’] – constant retention time shift.

  • RC_dict[‘lcp’] – length correction parameter.

Return type:

dict

Examples

>>> from pyteomics.achrom import get_RCs_vary_lcp
>>> RCs = get_RCs_vary_lcp(['A', 'AA', 'AAA'],
...     [1.0, 2.0, 3.0],
...     labels=['A'])
>>> abs(RCs['aa']['A'] - 1) + abs(RCs['lcp']) + abs(RCs['const']) < 1e-6
True
