Pyteomics documentation v4.5.3dev2

Example 2: Fragmentation

«  Example 1: Unravelling the Peptidome   ::   Contents   ::   Example 3: Search engines and PSM filtering  »

Example 2: Fragmentation


Pyteomics has come a long way since this example was written. Check example 4 for new Pyteomics tools you should know about.

In this example, we are going to retrieve MS/MS data from an MGF file and compare it to identification info we read from a pepXML file. We are going to compare the MS/MS spectrum in the file with the theoretical spectrum of a peptide assigned to this spectrum by the search engine.

The script source can be downloaded here. We will also need the example MGF file and the example pepXML file, but the script will download them for you.

The MGF file has a single MS/MS spectrum in it. This spectrum is taken from the SwedCAD database of annotated MS/MS spectra. The pepXML file was obtained by running X!Tandem against the MGF file and converting the results to pepXML with the Tandem2XML tool from TPP.

Let’s start with importing the modules.

from pyteomics import mgf, pepxml, mass
import os
from urllib.request import urlopen, Request
import pylab

Then we’ll download the files, if needed:

for fname in ('mgf', 'pep.xml'):
    if not os.path.isfile('example.' + fname):
        headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}
        url = '' + fname
        request = Request(url, None, headers)
        target_name = 'example.' + fname
        with urlopen(request) as response, open(target_name, 'wb') as fout:
            print('Downloading ' + target_name + '...')

Now it’s time to define the function that will give us m/z of theoretical fragments for a given sequence. We will use pyteomics.mass.fast_mass() to calculate the values. All we need to do is split the sequence at every bond and iterate over possible charges and ion types:

def fragments(peptide, types=('b', 'y'), maxcharge=1):
    The function generates all possible m/z for fragments of types
    `types` and of charges from 1 to `maxharge`.
    for i in range(1, len(peptide)-1):
        for ion_type in types:
            for charge in range(1, maxcharge+1):
                if ion_type[0] in 'abc':
                    yield mass.fast_mass(
                            peptide[:i], ion_type=ion_type, charge=charge)
                    yield mass.fast_mass(
                            peptide[i:], ion_type=ion_type, charge=charge)

So, the outer loop is over “fragmentation sites”, the next one is over ion types, then over charges, and lastly over two parts of the sequence (C- and N-terminal).

All right, now it’s time to extract the info from the files. We are going to use the with statement syntax, which is not required, but recommended.

with'example.mgf') as spectra,'example.pep.xml') as psms:
    spectrum = next(spectra)
    psm = next(psms)

Now prepare the figure…

pylab.title('Theoretical and experimental spectra for '
        + psm['search_hit'][0]['peptide'])
pylab.xlabel('m/z, Th')
pylab.ylabel('Intensity, rel. units')

… plot the real spectrum:['m/z array'], spectrum['intensity array'], width=0.1, linewidth=2,

… calculate and plot the theoretical spectrum, and show everything:

theor_spectrum = list(fragments(psm['search_hit'][0]['peptide'],
        [spectrum['intensity array'].max()]*len(theor_spectrum),
        width=0.1, edgecolor='red', alpha=0.7)

You will see something like this:


That’s it, as you can see, the most intensive peaks in the spectrum are indeed matched by the theoretical spectrum.

«  Example 1: Unravelling the Peptidome   ::   Contents   ::   Example 3: Search engines and PSM filtering  »