Pyteomics documentation v3.5.1

mgf - read and write MS/MS data in Mascot Generic Format




MGF is a simple human-readable format for MS/MS data. It allows storing MS/MS peak lists and experimental parameters.

This module provides minimalistic infrastructure for access to data stored in MGF files. The most important function is read(), which reads spectra and related information and saves them into human-readable dicts. Common parameters can also be read from the MGF file header with the read_header() function. write() allows creation of MGF files.


MGF - a class representing an MGF file. Use it to read spectra from a file consecutively or by title.


read() - iterate through spectra in an MGF file. Data from a single spectrum are converted to a human-readable dict.

get_spectrum() - read a single spectrum with given title from a file.

chain() - read multiple files at once.

chain.from_iterable() - read multiple files at once, using an iterable of files.

read_header() - get a dict with common parameters for all spectra from the beginning of an MGF file.

write() - write an MGF file.

pyteomics.mgf.chain(*args, **kwargs)

Chain read() for several files. Positional arguments should be file names or file objects. Keyword arguments are passed to the read() function.

chain.from_iterable(files, **kwargs)

Chain read() for several files. Keyword arguments are passed to the read() function.

files : iterable
Iterable of file names or file objects.

class pyteomics.mgf.MGF(source=None, use_header=True, convert_arrays=2, read_charges=True, dtype=None, encoding=None)

Bases: pyteomics.auxiliary.file_helpers.FileReader

A class representing an MGF file. Supports the with statement and direct iteration for sequential parsing. Specific spectra can be accessed by title using the indexing syntax.

An MGF object behaves as an iterator, yielding spectra one by one. Each ‘spectrum’ is a dict with four keys: ‘m/z array’, ‘intensity array’, ‘charge array’ and ‘params’. ‘m/z array’ and ‘intensity array’ store numpy.ndarray objects of floats, ‘charge array’ is a masked array (numpy.ma.MaskedArray) of ints, and ‘params’ stores a dict of parameters (keys and values are str, keys corresponding to MGF, lowercased).

header : dict

The file header.


__init__(source=None, use_header=True, convert_arrays=2, read_charges=True, dtype=None, encoding=None)

Create an MGF file object.

source : str or file or None, optional

A file object (or file name) with data in MGF format. Default is None, which means read standard input.

use_header : bool, optional

Add the info from file header to each dict. Spectrum-specific parameters override those from the header in case of conflict. Default is True.

convert_arrays : one of {0, 1, 2}, optional

If 0, m/z, intensities and (possibly) charges will be returned as regular lists. If 1, they will be converted to regular numpy.ndarray’s. If 2, charges will be reported as a masked array (default). The default option is the slowest. 1 and 2 require numpy.

read_charges : bool, optional

If True (default), fragment charges are reported. Disabling it improves performance.

dtype : type or str or dict, optional

The dtype argument passed to the numpy array constructor: either a single value for all arrays, or a dict with one value per key. Keys should be ‘m/z array’, ‘intensity array’ and/or ‘charge array’.

encoding : str, optional

Encoding to read the files in. Default is UTF-8.


Resets the iterator to its initial state.

pyteomics.mgf.get_spectrum(source, title, use_header=True, convert_arrays=2, read_charges=True, dtype=None)

Read one spectrum (with given title) from source.

See read() for explanation of parameters affecting the output.


Only the key-value pairs after the “TITLE =” line will be included in the output.

source : str or file or None

File to read from.

title : str

Spectrum title.

The rest of the arguments are the same as for read().
out : dict or None

A dict with the spectrum, if it is found, and None otherwise.

*args, **kwargs)

Read an MGF file and return entries iteratively.


This is an alias to MGF.

out : MGF

pyteomics.mgf.read_header(*args, **kwargs)

Read the specified MGF file and return the search parameters specified in its header as a dict, with keys corresponding to MGF format (lowercased).

source : str or file

File name or file object representing a file in MGF format.

header : dict

pyteomics.mgf.write(*args, **kwargs)

Create a file in MGF format.

spectra : iterable

A sequence of dictionaries with keys ‘m/z array’, ‘intensity array’, and ‘params’. ‘m/z array’ and ‘intensity array’ should be sequences of int, float, or str. Strings will be written ‘as is’. The sequences should be of equal length; otherwise, excess values will be ignored.

‘params’ should be a dict with keys corresponding to MGF format. Keys must be strings; they will be uppercased and used as is, without any format consistency tests. Values can be of any type that has a string representation.

‘charge array’ can also be specified.

output : str or file or None, optional

Path or a file-like object open for writing. If an existing file is specified by file name, it will be opened for appending. In this case writing with a header can result in violation of format conventions. Default value is None, which means using standard output.

header : dict or (multiline) str or list of str, optional

In case of a single string or a list of strings, the header will be written ‘as is’. In case of dict, the keys (must be strings) will be uppercased.

write_charges : bool, optional

If False, fragment charges from ‘charge array’ will not be written. Default is True.

fragment_format : str, optional

Format string for m/z, intensity and charge of a fragment. Useful to set the number of decimal places, e.g.: fragment_format='%.4f %.0f'. Default is '{} {} {}'.


The supported format syntax differs depending on other parameters. If use_numpy is True and numpy is available, fragment peaks will be written using numpy.savetxt(). Then, fragment_format must be recognized by that function.

Otherwise, plain Python string formatting (str.format()) is used. See the Python documentation on format string syntax for details on writing the format string. If some or all charges are missing, an empty string is substituted instead, so formatting them as float or int will raise an exception. Hence it is safer to just use {} for charges.

key_order : list, optional

A list of strings specifying the order in which params will be written in the spectrum header. Unlisted keys will be in arbitrary order. Default is _default_key_order.


This does not affect the order of lines in the global header.

param_formatters : dict, optional

A dict mapping parameter names to functions. Each function must accept two arguments (key and value) and return a string. Default is _default_value_formatters.

use_numpy : bool, optional

Controls whether fragment peak arrays are written using numpy.savetxt(). Using numpy.savetxt() is faster, but cannot handle sparse arrays of fragment charges. You may want to disable this if you need to save spectra with ‘charge arrays’ with missing values.

If not specified, it will be set to the opposite of write_charges. If numpy is not available, this parameter has no effect.

file_mode : str, keyword only, optional

If output is a file name, defines the mode the file will be opened in. Otherwise will be ignored. Default is ‘a’.

encoding : str, keyword only, optional

Output file encoding (if output is specified by name).

output : file
