Pyteomics documentation v4.3.3dev1

ms2 - read and write MS/MS data in MS2 format

Summary

MS2 is a simple, human-readable format for MS/MS data. It stores MS2 peak lists and experimental parameters.

This module provides a minimal infrastructure for accessing data stored in MS2 files. The two main classes are MS2, an iterative, text-mode parser, and IndexedMS2, a binary-mode parser that supports random access by scan ID and retention time. The function read() dispatches between the two classes. Common parameters can also be read from the MS2 file header with the read_header() function.

Functions

read() - iterate through the spectra in an MS2 file. Data from a single spectrum are converted to a human-readable dict.

chain() - read multiple files at once.

chain.from_iterable() - read multiple files at once, using an iterable of files.

read_header() - get a dict with the parameters common to all spectra from the beginning of an MS2 file.


pyteomics.ms2.chain(*args, **kwargs)

Chain read() for several files. Positional arguments should be file names or file objects. Keyword arguments are passed to the read() function.

chain.from_iterable(files, **kwargs)

Chain read() for several files. Keyword arguments are passed to the read() function.

Parameters: files – Iterable of file names or file objects.
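
Conceptually, chaining means reading each file in turn with the same keyword arguments and yielding all spectra as one stream. The following is a minimal sketch of that idea, not the Pyteomics source. The names chain_readers and fake_read and the file names are hypothetical; fake_read stands in for read() so that the example runs without any MS2 files.

```python
def chain_readers(read, *sources, **kwargs):
    """Yield every item from read(source, **kwargs) for each source in turn."""
    for source in sources:
        yield from read(source, **kwargs)


# Stand-in data: maps a made-up file name to the "spectra" it contains.
fake_files = {'a.ms2': ['a1', 'a2'], 'b.ms2': ['b1']}


def fake_read(source, **kwargs):
    """Hypothetical replacement for a real reader function."""
    return iter(fake_files[source])


chained = list(chain_readers(fake_read, 'a.ms2', 'b.ms2'))
print(chained)
```

The spectra come out in file order, with the order inside each file preserved.
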
class pyteomics.ms2.IndexedMS2(source=None, use_header=False, convert_arrays=True, dtype=None, encoding='utf-8', _skip_index=False, **kwargs)[source]

Bases: pyteomics.ms1.IndexedMS1

A class representing an MS2 file. Supports the with syntax and direct iteration for sequential parsing. Specific spectra can be accessed by scan label using the indexing syntax in constant time. If created from a file object, the file must be opened in binary mode.

When iterated, IndexedMS2 object yields spectra one by one. Each ‘spectrum’ is a dict with four keys: ‘m/z array’, ‘intensity array’, ‘charge array’ and ‘params’. ‘m/z array’ and ‘intensity array’ store numpy.ndarray’s of floats, ‘charge array’ is a masked array (numpy.ma.MaskedArray) of ints, and ‘params’ stores a dict of parameters (keys and values are str, keys corresponding to MS2).

Warning

Labels for scan objects are constructed as the first number in the S line, as follows: for a line S  0   1   123.4 the label is ‘0’. If these labels are not unique for the scans in the file, the indexed parser will not work correctly. Consider using MS2 instead.
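
Because the indexed parser depends on unique labels, it can help to check a file before choosing it. The sketch below is not part of Pyteomics. It is a plain-Python pre-check that assumes S lines are whitespace-separated and takes the first field after S as the label, as described in the warning. The function name duplicate_scan_labels and the sample lines are made up for illustration.

```python
from collections import Counter


def duplicate_scan_labels(lines):
    """Return the sorted list of scan labels that occur more than once.

    `lines` is any iterable of text lines from an MS2 file. The label is
    assumed to be the first field after the leading 'S' (an assumption of
    this sketch, based on the warning text, not on the library source).
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Only S lines with at least one field after 'S' carry a label.
        if len(fields) > 1 and fields[0] == 'S':
            counts[fields[1]] += 1
    return sorted(label for label, n in counts.items() if n > 1)


# Made-up sample content: the label '0' appears on two S lines.
sample = [
    "H\tComment\tdemo",
    "S\t0\t1\t123.4",
    "100.0 10.0",
    "S\t1\t2\t234.5",
    "S\t0\t3\t345.6",
]
print(duplicate_scan_labels(sample))  # ['0']
```

An empty result suggests the labels are unique; a non-empty one suggests that the sequential MS2 parser is the safer choice.
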

header

The file header.

Type: dict
time

A property used for accessing spectra by retention time.

Type: RTLocator
__init__(source=None, use_header=False, convert_arrays=True, dtype=None, encoding='utf-8', _skip_index=False, **kwargs)

Instantiate a TaskMappingMixin object, set default parameters for IPC.

Parameters:
  • queue_timeout (float, keyword only, optional) – The number of seconds to block, waiting for a result before checking to see if all workers are done.
  • queue_size (int, keyword only, optional) – The length of IPC queue used.
  • processes (int, keyword only, optional) – Number of worker processes to spawn when map() is called. This can also be specified in the map() call.
map(target=None, processes=-1, args=None, kwargs=None, **_kwargs)

Execute the target function over the entries of this object, using at most the number of worker processes given by processes.

Results will be returned out of order.

Parameters:
  • target (Callable, optional) – The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs.
  • processes (int, optional) – The number of worker processes to use. If 0 or negative, defaults to the number of available CPUs. This parameter can also be set at reader creation.
  • args (Sequence, optional) – Additional positional arguments to be passed to the target function.
  • kwargs (Mapping, optional) – Additional keyword arguments to be passed to the target function.
  • **_kwargs – Additional keyword arguments to be passed to the target function.
Yields:

object – The work item returned by the target function.
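
Since results come back out of order, one common approach is to have the target return an identifier together with its result, so the caller can match results to spectra afterwards. The sketch below illustrates only that pattern and does not call map() itself: the spectra are hand-written stand-ins, and the 'label' key inside 'params' is a hypothetical placeholder for whatever identifier the real spectra carry.

```python
def count_peaks(spectrum):
    """Example target: return (label, number of peaks) for one spectrum.

    The 'label' key in 'params' is hypothetical, used here only so the
    result can be matched back to its spectrum.
    """
    return spectrum['params']['label'], len(spectrum['m/z array'])


# Stand-in spectra; a real reader would yield dicts of a similar shape.
spectra = [
    {'m/z array': [100.0, 200.0], 'intensity array': [1.0, 2.0],
     'params': {'label': '0'}},
    {'m/z array': [150.0], 'intensity array': [3.0],
     'params': {'label': '1'}},
]

# Simulate results arriving in a different order than the input.
unordered = [count_peaks(s) for s in reversed(spectra)]

# Keying the results by label makes the arrival order irrelevant.
by_label = dict(unordered)
print(by_label['0'], by_label['1'])  # 2 1
```

With a reader, a function of this kind could be passed as target.
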

reset()

Resets the iterator to its initial state.

class pyteomics.ms2.MS2(source=None, use_header=False, convert_arrays=True, dtype=None, encoding=None, **kwargs)[source]

Bases: pyteomics.ms1.MS1

A class representing an MS2 file. Supports the with syntax and direct iteration for sequential parsing.

MS2 object behaves as an iterator, yielding spectra one by one. Each ‘spectrum’ is a dict with three keys: ‘m/z array’, ‘intensity array’, and ‘params’. ‘m/z array’ and ‘intensity array’ store numpy.ndarray’s of floats, and ‘params’ stores a dict of parameters.
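
As a small illustration of working with the spectrum dicts described above, the following sketch finds the most intense peak of one spectrum. The helper name base_peak is made up, and the example spectrum is a hand-written stand-in that uses plain lists, so the code runs without Pyteomics or numpy.

```python
def base_peak(spectrum):
    """Return (m/z, intensity) of the most intense peak, or None if empty."""
    peaks = zip(spectrum['m/z array'], spectrum['intensity array'])
    return max(peaks, key=lambda peak: peak[1], default=None)


# Hand-written stand-in for a spectrum as yielded by the parser.
example = {
    'm/z array': [100.1, 200.2, 300.3],
    'intensity array': [10.0, 50.0, 20.0],
    'params': {},
}
print(base_peak(example))  # (200.2, 50.0)
```

The same function could be applied to each item yielded while iterating over an MS2 object.
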

header

The file header.

Type: dict
__init__(source=None, use_header=False, convert_arrays=True, dtype=None, encoding=None, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

reset()

Resets the iterator to its initial state.

pyteomics.ms2.read(*args, **kwargs)[source]

Read an MS2 file and return entries iteratively.

Read the specified MS2 file, yield spectra one by one. Each ‘spectrum’ is a dict with three keys: ‘m/z array’, ‘intensity array’, and ‘params’. ‘m/z array’ and ‘intensity array’ store numpy.ndarray’s of floats, and ‘params’ stores a dict of parameters.

Parameters:
  • source (str or file or None, optional) – A file object (or file name) with data in MS2 format. Default is None, which means read standard input.
  • use_header (bool, optional) – Add the info from file header to each dict. Spectrum-specific parameters override those from the header in case of conflict. Default is False.
  • convert_arrays (bool, optional) – If False, m/z and intensities will be returned as regular lists. If True (default), they will be converted to regular numpy.ndarray’s. Conversion requires numpy.
  • dtype (type or str or dict, optional) – dtype argument to numpy array constructor, one for all arrays or one for each key. Keys should be ‘m/z array’ and/or ‘intensity array’.
  • encoding (str, optional) – File encoding.
  • use_index (bool, optional) –

    Determines which parsing method to use. If True, an instance of IndexedMS2 is created. This facilitates random access by scan labels. If an open file is passed as source, it needs to be opened in binary mode.

    Warning

    Labels for scan objects are constructed as the first number in the S line, as follows: for a line S  0   1   123.4 the label is ‘0’. If these labels are not unique for the scans in the file, the indexed parser will not work correctly.

    If False (default), an instance of MS2 is created. It reads source in text mode and is suitable for iterative parsing.

  • block_size (int, optional) – Size of the chunk (in bytes) used to parse the file when creating the byte offset index. (Accepted only for IndexedMS2.)
Returns:

An instance of MS2 or IndexedMS2, depending on use_index and source.

Return type:

out
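
The precedence rule described for use_header (spectrum-specific parameters override header values) behaves like a dict update. The sketch below shows only that rule and is not the library's code; the function name merge_params and all keys and values are made up.

```python
def merge_params(header, spectrum_params):
    """Combine header and spectrum parameters; spectrum values win."""
    merged = dict(header)           # start from a copy of the header
    merged.update(spectrum_params)  # spectrum-specific values override
    return merged


# Made-up keys and values, for illustration only.
header = {'Instrument': 'demo', 'Comment': 'from header'}
spectrum_params = {'Comment': 'from spectrum'}

merged = merge_params(header, spectrum_params)
print(merged)
```

Header entries that the spectrum does not redefine are kept, and the original header dict is left unchanged.
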

pyteomics.ms2.read_header(source, *args, **kwargs)[source]

Read the specified MS2 file, get the parameters specified in the header as a dict.

Parameters: source (str or file) – File name or file object representing a file in MS2 format.
Returns: header
Return type: dict
