Pyteomics documentation v5.0rc2

ms1 - read and write MS/MS data in MS1 format


Summary

MS1 is a simple human-readable format for MS1 data. It allows storing MS1 peak lists and experimental parameters.

This module provides minimalistic infrastructure for access to data stored in MS1 files. The two main classes are MS1, which provides an iterative, text-mode parser, and IndexedMS1, which is a binary-mode parser that supports random access using scan IDs and retention times. The function read() helps dispatch between the two classes. In addition, parameters common to all spectra can be read from the MS1 file header with the read_header() function.

Classes

MS1 - a text-mode MS1 parser. Suitable for reading spectra from a file consecutively. Needs a file opened in text mode (or will open it if given a file name).

IndexedMS1 - a binary-mode MS1 parser. When created, builds a byte offset index for fast random access by spectrum ID. Sequential iteration is also supported. Needs a seekable file opened in binary mode (if created from existing file object).

MS1Base - abstract class, the common ancestor of the two classes above. Can be used for type checking.
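The idea of a byte offset index, as mentioned for IndexedMS1 above, can be illustrated with the standard library alone. The following is a minimal sketch and not the Pyteomics implementation: build_offset_index and read_scan_block are hypothetical helper names, and the sample data is made up. It also hints at why binary mode matters: byte offsets obtained with tell() are only meaningful for seek() on a binary-mode file.

```python
import io


def build_offset_index(fileobj):
    """Map each scan label (first number on an S line) to its byte offset."""
    index = {}
    fileobj.seek(0)
    while True:
        offset = fileobj.tell()       # position of the line about to be read
        line = fileobj.readline()
        if not line:                  # end of file
            break
        fields = line.split()
        if len(fields) >= 2 and fields[0] == b'S':
            index[fields[1].decode('ascii')] = offset
    return index


def read_scan_block(fileobj, index, label):
    """Jump to a scan by label and return its raw lines up to the next S line."""
    fileobj.seek(index[label])
    block = [fileobj.readline()]      # the S line itself
    for line in iter(fileobj.readline, b''):
        if line.split()[:1] == [b'S']:
            break                     # the next scan starts here
        block.append(line)
    return block


# Made-up MS1-like content, held in memory as a binary "file".
data = b"H\tExtractor\tdemo\nS\t1\t1\n100.0 10.0\nS\t2\t2\n150.0 5.0\n160.0 6.0\n"
handle = io.BytesIO(data)
index = build_offset_index(handle)
block = read_scan_block(handle, index, '2')
```

Building the index costs one pass over the file; after that, each lookup is a dictionary access plus a seek, regardless of where the scan sits in the file.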

Functions

read() - an alias for MS1 or IndexedMS1.

chain() - read multiple files at once.

chain.from_iterable() - read multiple files at once, using an iterable of files.

read_header() - get a dict with common parameters for all spectra from the beginning of MS1 file.


pyteomics.ms1.chain(*args, **kwargs)

Chain read() for several files. Positional arguments should be file names or file objects. Keyword arguments are passed to the read() function.

chain.from_iterable(files, **kwargs)

Chain read() for several files. Keyword arguments are passed to the read() function.

Parameters:

files – Iterable of file names or file objects.
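Conceptually, chaining means opening each source in turn, yielding all of its entries, closing it, and moving on to the next one. The sketch below illustrates that pattern with the standard library only. chain_readers is a hypothetical name, and io.StringIO stands in for read() purely because it also supports the with syntax and iteration; neither is part of Pyteomics.

```python
import io


def chain_readers(read, sources, **kwargs):
    """Yield entries from every source, one source after another."""
    for source in sources:
        # Each reader is closed before the next source is opened.
        with read(source, **kwargs) as reader:
            yield from reader


# Stand-in "files": in-memory text buffers iterated line by line.
sources = ["S\t1\t1\n100.0 10.0\n", "S\t2\t2\n150.0 5.0\n"]
entries = list(chain_readers(io.StringIO, sources))
```

Because the function is a generator, only one source is open at any time, which keeps resource usage flat however many files are chained.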

class pyteomics.ms1.IndexedMS1(source=None, use_header=False, convert_arrays=True, dtype=None, encoding='utf-8', _skip_index=False, **kwargs)[source]

Bases: MS1Base, TaskMappingMixin, TimeOrderedIndexedReaderMixin, IndexedTextReader

A class representing an MS1 file. Supports the with syntax and direct iteration for sequential parsing. Specific spectra can be accessed by scan label using the indexing syntax in constant time. If the reader is created from a file object, that file must be opened in binary mode.

When iterated, IndexedMS1 object yields spectra one by one. Each ‘spectrum’ is a dict with three keys: ‘m/z array’, ‘intensity array’ and ‘params’. ‘m/z array’ and ‘intensity array’ store numpy.ndarray’s of floats, and ‘params’ stores a dict of parameters (keys and values are str, keys corresponding to MS1).

Warning

Labels for scan objects are constructed as the first number in the S line, as follows: for a line S  0   1 the label is ‘0’. If these labels are not unique for the scans in the file, the indexed parser will not work correctly. Consider using MS1 instead.
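Whether a file is affected by this limitation can be checked up front. The following is a minimal sketch that relies only on the labelling rule stated above (the label is the first number on the S line); duplicate_scan_labels is a hypothetical helper, not a Pyteomics function, and the sample content is made up.

```python
from collections import Counter
import io


def duplicate_scan_labels(lines):
    """Return scan labels that occur more than once, in first-seen order."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] == 'S':
            counts[fields[1]] += 1    # label = first number on the S line
    return [label for label, n in counts.items() if n > 1]


# Made-up content in which the label '0' is used by two scans.
sample = io.StringIO(
    "S\t0\t1\n100.0 10.0\n"
    "S\t1\t1\n150.0 5.0\n"
    "S\t0\t2\n200.0 1.0\n"
)
duplicates = duplicate_scan_labels(sample)
```

An empty result suggests the labels are unique and label-based indexing is unambiguous; a non-empty result is a reason to prefer the sequential MS1 parser.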

header

The file header.

Type:

dict

time

A property used for accessing spectra by retention time.

Type:

RTLocator

__init__(source=None, use_header=False, convert_arrays=True, dtype=None, encoding='utf-8', _skip_index=False, **kwargs)[source]

Create an IndexedMS1 (binary-mode) reader for a given MS1 file.

Parameters:
  • source (str or file or None, optional) –

    A file object (or file name) with data in MS1 format. Default is None, which means read standard input.

    Note

    If a file object is given, it must be opened in binary mode.

  • use_header (bool, optional) – Add the info from file header to each dict. Spectrum-specific parameters override those from the header in case of conflict. Default is False.

  • convert_arrays (one of {0, 1, 2}, optional) – If 0, m/z, intensities and (possibly) charges will be returned as regular lists. If 1, they will be converted to regular numpy.ndarray’s. If 2, charges will be reported as a masked array (default). The default option is the slowest. 1 and 2 require numpy.

  • dtype (type or str or dict, optional) – dtype argument to numpy array constructor, one for all arrays or one for each key. Keys should be ‘m/z array’, ‘intensity array’, ‘charge array’.

  • encoding (str, optional) – File encoding.

  • block_size (int, optional) – Size of the chunk (in bytes) used to parse the file when creating the byte offset index.

Returns:

out – The reader object.

Return type:

IndexedMS1
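The precedence rule described for use_header (spectrum-specific parameters win over header values on conflict) behaves like an ordinary dictionary merge. The snippet below is only an illustration of that rule with made-up keys and values; merge_header is a hypothetical name and does not reflect how Pyteomics performs the merge internally.

```python
def merge_header(header, spectrum_params):
    """Combine header and spectrum parameters; spectrum values take priority."""
    merged = dict(header)             # start from a copy of the header
    merged.update(spectrum_params)    # spectrum-specific values override
    return merged


header = {'Extractor': 'demo', 'Comment': 'from header'}
spectrum_params = {'RTime': '0.5', 'Comment': 'from spectrum'}
params = merge_header(header, spectrum_params)
```

Here params keeps 'Extractor' from the header, gains 'RTime' from the spectrum, and takes 'Comment' from the spectrum because both sources define it.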

map(target=None, workers=None, args=None, kwargs=None, method='mp', **_kwargs)

Execute the target function over entries of this object in parallel. The type of parallelism is determined by the method parameter.

Results will be returned out of order.

Parameters:
  • target (Callable, optional) – The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs.

  • workers (int, optional) – The number of worker threads or processes to use. The default depends on the method parameter.

  • args (Sequence, optional) – Additional positional arguments to be passed to the target function.

  • kwargs (Mapping, optional) – Additional keyword arguments to be passed to the target function.

  • method (str, optional) –

    The type of parallelism to use. Can be one of the following:

    • either one of ‘p’, ‘mp’, ‘processes’, or ‘multiprocessing’: use multiprocessing. This is the default and is equivalent to calling pmap(); see there for details.

    • either one of ‘t’, ‘threading’, or ‘threads’: use threading. This is equivalent to calling tmap(); see there for details.

  • **_kwargs – Additional keyword arguments to be passed to the target function.

Yields:

object – The work item returned by the target function.
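Since results arrive out of order, a target function usually needs to return some identifier along with its result so that the caller can match outputs to inputs. The sketch below demonstrates the pattern with concurrent.futures from the standard library as a stand-in for map(); the spectra are hand-written dicts, and the ‘label’ key inside ‘params’ is a hypothetical name chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def total_intensity(spectrum):
    """Target function: return an identifier together with the result."""
    return spectrum['params']['label'], sum(spectrum['intensity array'])


# Hand-written stand-ins for parsed spectra.
spectra = [
    {'params': {'label': '1'}, 'm/z array': [100.0, 200.0], 'intensity array': [10.0, 20.0]},
    {'params': {'label': '2'}, 'm/z array': [150.0], 'intensity array': [5.0]},
    {'params': {'label': '3'}, 'm/z array': [300.0, 310.0], 'intensity array': [1.0, 2.5]},
]

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(total_intensity, s) for s in spectra]
    # as_completed yields futures in completion order, not submission order,
    # so results are collected by label rather than by position.
    results = dict(f.result() for f in as_completed(futures))

# Restore the original order using the identifiers.
ordered = [results[s['params']['label']] for s in spectra]
```

The same shape of target function (identifier plus value) should work wherever results are yielded in arbitrary order.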

pmap(target=None, workers=None, args=None, kwargs=None, **_kwargs)

Execute the target function over entries of this object across up to workers processes.

Results will be returned out of order.

Parameters:
  • target (Callable, optional) – The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs.

  • workers (int or None, optional) – The number of worker processes to use. If not a positive integer, defaults to the number of available CPUs. This parameter can also be set at reader creation.

  • args (Sequence, optional) – Additional positional arguments to be passed to the target function.

  • kwargs (Mapping, optional) – Additional keyword arguments to be passed to the target function.

  • **_kwargs – Additional keyword arguments to be passed to the target function.

Yields:

object – The work item returned by the target function.

reset()

Resets the iterator to its initial state.

tmap(target=None, workers=None, args=None, kwargs=None, chunk_size=None, **_kwargs)

Execute the target function over entries of this object across up to workers threads.

Results will be returned out of order.

Parameters:
  • target (Callable, optional) –

    The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs.

    Warning

    target must be thread-safe. The target function cannot interact with the underlying file object directly.

  • workers (int or None, optional) – The number of worker threads to use. If not a positive integer, defaults to the number of available CPUs.

  • args (Sequence, optional) – Additional positional arguments to be passed to the target function.

  • kwargs (Mapping, optional) – Additional keyword arguments to be passed to the target function.

  • chunk_size (int, optional) – The number of work items to hand out to each worker thread at a time. If not specified, defaults to chunk_size attribute of this object.

  • **_kwargs – Additional keyword arguments to be passed to the target function.

Yields:

object – The work item returned by the target function.

class pyteomics.ms1.MS1(source=None, use_header=False, convert_arrays=True, dtype=None, encoding=None, **kwargs)[source]

Bases: MS1Base, FileReader

A class representing an MS1 file. Supports the with syntax and direct iteration for sequential parsing.

MS1 object behaves as an iterator, yielding spectra one by one. Each ‘spectrum’ is a dict with three keys: ‘m/z array’, ‘intensity array’, and ‘params’. ‘m/z array’ and ‘intensity array’ store numpy.ndarray’s of floats, and ‘params’ stores a dict of parameters.
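To make the shape of the yielded dicts concrete, here is a minimal pure-Python sketch of a sequential parser that produces the three keys described above. It is an illustration under simplifying assumptions, not the Pyteomics parser: iter_spectra is a hypothetical name, arrays are plain lists rather than numpy.ndarray objects, header (H) lines are skipped, and storing the scan label under a ‘label’ key is an invention of this example.

```python
import io


def iter_spectra(lines):
    """Yield one dict per scan from an iterable of MS1-formatted text lines."""
    spectrum = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue                              # blank line
        if fields[0] == 'S':                      # a new scan begins
            if spectrum is not None:
                yield spectrum
            spectrum = {
                'params': {'label': fields[1]},   # hypothetical key name
                'm/z array': [],
                'intensity array': [],
            }
        elif spectrum is None or fields[0] == 'H':
            continue                              # header or pre-scan content
        elif fields[0] == 'I':                    # scan-level parameter line
            spectrum['params'][fields[1]] = ' '.join(fields[2:])
        else:                                     # peak line: m/z, intensity
            spectrum['m/z array'].append(float(fields[0]))
            spectrum['intensity array'].append(float(fields[1]))
    if spectrum is not None:
        yield spectrum                            # last scan in the file


# Made-up MS1-like content.
text = (
    "H\tExtractor\tdemo\n"
    "S\t1\t1\nI\tRTime\t0.5\n100.0 10.0\n200.5 20.0\n"
    "S\t2\t2\nI\tRTime\t1.0\n150.0 5.0\n"
)
spectra = list(iter_spectra(io.StringIO(text)))
```

Each spectrum is emitted only when the next S line (or the end of input) is reached, which is why this style of parsing is naturally sequential.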

header

The file header.

Type:

dict

__init__(source=None, use_header=False, convert_arrays=True, dtype=None, encoding=None, **kwargs)[source]

Create an MS1 (text-mode) reader for a given MS1 file.

Parameters:
  • source (str or file or None, optional) –

    A file object (or file name) with data in MS1 format. Default is None, which means read standard input.

    Note

    If a file object is given, it must be opened in text mode.

  • use_header (bool, optional) – Add the info from file header to each dict. Spectrum-specific parameters override those from the header in case of conflict. Default is False.

  • convert_arrays (one of {0, 1, 2}, optional) – If 0, m/z, intensities and (possibly) charges will be returned as regular lists. If 1, they will be converted to regular numpy.ndarray’s. If 2, charges will be reported as a masked array (default). The default option is the slowest. 1 and 2 require numpy.

  • dtype (type or str or dict, optional) – dtype argument to numpy array constructor, one for all arrays or one for each key. Keys should be ‘m/z array’, ‘intensity array’, ‘charge array’.

  • encoding (str, optional) – File encoding.

Returns:

out – The reader object.

Return type:

MS1

reset()

Resets the iterator to its initial state.

class pyteomics.ms1.MS1Base(source=None, use_header=False, convert_arrays=True, dtype=None, encoding=None, **kwargs)[source]

Bases: ArrayConversionMixin

Abstract class representing an MS1 file. Subclasses implement different approaches to parsing.

__init__(source=None, use_header=False, convert_arrays=True, dtype=None, encoding=None, **kwargs)[source]

Create an instance of an MS1Base parser.

Parameters:
  • source (str or file or None, optional) – A file object (or file name) with data in MS1 format. Default is None, which means read standard input.

  • use_header (bool, optional) – Add the info from file header to each dict. Spectrum-specific parameters override those from the header in case of conflict. Default is False.

  • convert_arrays (one of {0, 1, 2}, optional) – If 0, m/z, intensities and (possibly) charges will be returned as regular lists. If 1, they will be converted to regular numpy.ndarray’s. If 2, charges will be reported as a masked array (default). The default option is the slowest. 1 and 2 require numpy.

  • dtype (type or str or dict, optional) – dtype argument to numpy array constructor, one for all arrays or one for each key. Keys should be ‘m/z array’, ‘intensity array’, ‘charge array’.

  • encoding (str, optional) – File encoding.

pyteomics.ms1.read(*args, **kwargs)[source]

Read an MS1 file and return entries iteratively.

Read the specified MS1 file, yield spectra one by one. Each ‘spectrum’ is a dict with three keys: ‘m/z array’, ‘intensity array’, and ‘params’. ‘m/z array’ and ‘intensity array’ store numpy.ndarray’s of floats, and ‘params’ stores a dict of parameters.

Parameters:
  • source (str or file or None, optional) – A file object (or file name) with data in MS1 format. Default is None, which means read standard input.

  • use_header (bool, optional) – Add the info from file header to each dict. Spectrum-specific parameters override those from the header in case of conflict. Default is False.

  • convert_arrays (one of {0, 1, 2}, optional) – If 0, m/z, intensities and (possibly) charges will be returned as regular lists. If 1, they will be converted to regular numpy.ndarray’s. If 2, charges will be reported as a masked array (default). The default option is the slowest. 1 and 2 require numpy.

  • dtype (type or str or dict, optional) – dtype argument to numpy array constructor, one for all arrays or one for each key. Keys should be ‘m/z array’ and/or ‘intensity array’.

  • encoding (str, optional) – File encoding.

  • use_index (bool, optional) –

    Determines which parsing method to use. If True, an instance of IndexedMS1 is created. This facilitates random access by scan labels. If an open file is passed as source, it must be opened in binary mode.

    If False (default), an instance of MS1 is created. It reads source in text mode and is suitable for iterative parsing.

    Warning

    Labels for scan objects are constructed as the first number in the S line, as follows: for a line S  0   1 the label is ‘0’. If these labels are not unique for the scans in the file, the indexed parser will not work correctly.

  • block_size (int, optional) – Size of the chunk (in bytes) used to parse the file when creating the byte offset index. (Accepted only for IndexedMS1.)

Returns:

out – An instance of MS1 or IndexedMS1, depending on use_index and source.

Return type:

MS1Base

pyteomics.ms1.read_header(source, *args, **kwargs)[source]

Read the specified MS1 file, get the parameters specified in the header as a dict.

Parameters:

source (str or file) – File name or file object representing a file in MS1 format.

Returns:

header

Return type:

dict
