ms2 - read and write MS/MS data in MS2 format
Summary
MS2 is a simple human-readable format for MS2 data. It allows storing MS2 peak lists and experimental parameters.
This module provides minimalistic infrastructure for access to data stored in
MS2 files.
The two main classes are MS2, which provides an iterative, text-mode parser,
and IndexedMS2, which is a binary-mode parser that supports random access using scan IDs
and retention times.
The function read() dispatches between the two classes.
Common parameters can also be read from the MS2 file header with the
read_header() function.
Classes
MS2 - a text-mode MS2 parser. Suitable for reading spectra from a file consecutively. Needs a file opened in text mode (or will open it if given a file name).
IndexedMS2 - a binary-mode MS2 parser. When created, builds a byte offset index for fast random access by spectrum ID. Sequential iteration is also supported. Needs a seekable file opened in binary mode (if created from an existing file object).
MS2Base - abstract class, the common ancestor of the two classes above. Can be used for type checking.
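The byte offset index mentioned for IndexedMS2 can be pictured with a small standard-library sketch. This is not the pyteomics implementation; `build_offset_index` and `read_scan` are hypothetical helpers, and the sample data assumes tab-separated `H`, `S` and `Z` lines. It also hints at why a binary, seekable stream is needed: the offsets recorded with `tell()` are later passed to `seek()`.

```python
import io

def build_offset_index(stream):
    """Map each scan label to the byte offset of its S line (hypothetical helper)."""
    index = {}
    stream.seek(0)
    while True:
        offset = stream.tell()
        line = stream.readline()
        if not line:
            break
        if line.startswith(b"S"):
            # The label is the first number on the S line.
            label = line.split()[1].decode("ascii")
            index[label] = offset
    return index

def read_scan(stream, index, label):
    """Jump straight to one scan and return its lines, stopping at the next S line."""
    stream.seek(index[label])
    lines = [stream.readline()]
    while True:
        line = stream.readline()
        if not line or line.startswith(b"S"):
            break
        lines.append(line)
    return [raw.decode("ascii").rstrip() for raw in lines]

# Made-up sample content, held in memory instead of a real file.
data = io.BytesIO(
    b"H\tCreationDate\t2024-01-01\n"
    b"S\t1\t1\t500.25\n"
    b"Z\t2\t999.49\n"
    b"100.1 10.0\n"
    b"S\t2\t2\t600.30\n"
    b"Z\t3\t1798.88\n"
    b"200.2 20.0\n"
    b"300.3 30.0\n"
)
index = build_offset_index(data)
scan = read_scan(data, index, "2")
```

Building the index costs one pass over the file; after that, each lookup is a dictionary access plus a single `seek()`.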
Functions
read() - an alias for MS2 or IndexedMS2.
chain() - read multiple files at once.
chain.from_iterable() - read multiple files at once, using an iterable of files.
read_header() - get a dict with common parameters for all spectra from the beginning of an MS2 file.
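As a rough illustration of what reading a header involves, the sketch below collects leading `H` lines into a dict. It is not the library's `read_header()`; `read_header_sketch` is a hypothetical name, and the tab-separated `H<TAB>key<TAB>value` layout of the sample is an assumption about the file being read.

```python
import io

def read_header_sketch(lines):
    """Collect 'H' lines from the top of an MS2-like text stream into a dict."""
    header = {}
    for line in lines:
        if not line.startswith("H"):
            break  # the header ends at the first non-H line
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) == 3:
            _, key, value = parts
            header[key] = value
    return header

# Made-up sample content.
sample = io.StringIO(
    "H\tCreationDate\t2024-01-01\n"
    "H\tExtractor\tExampleTool\n"
    "S\t1\t1\t500.25\n"
    "100.1 10.0\n"
)
header = read_header_sketch(sample)
```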
- class pyteomics.ms2.IndexedMS2(source=None, use_header=False, convert_arrays=2, dtype=None, read_charges=True, read_resolutions=True, encoding='utf-8', _skip_index=False, **kwargs)
Bases: IndexedMS1, MS2Base
A class representing an MS2 file. Supports the with syntax and direct iteration for sequential parsing. Specific spectra can be accessed by title using the indexing syntax in constant time. If created using a file object, it needs to be opened in binary mode.
When iterated, an IndexedMS2 object yields spectra one by one. Each ‘spectrum’ is a dict with four keys: ‘m/z array’, ‘intensity array’, ‘charge array’ and ‘params’. ‘m/z array’ and ‘intensity array’ store numpy.ndarray’s of floats, ‘charge array’ is a masked array (numpy.ma.MaskedArray) of ints, and ‘params’ stores a dict of parameters (keys and values are str, keys corresponding to MS2).
Warning
Labels for scan objects are constructed as the first number in the S line, as follows: for a line S 0 1 123.4 the label is ‘0’. If these labels are not unique for the scans in the file, the indexed parser will not work correctly. Consider using MS2 instead.
- time
A property used for accessing spectra by retention time.
- Type:
RTLocator
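The page does not describe how RTLocator resolves a query, so the following is only a sketch of one common way to look up the scan nearest to a retention time: a binary search over sorted times. `closest_by_time` and the sample values are hypothetical.

```python
from bisect import bisect_left

def closest_by_time(times, labels, query):
    """Return the label whose retention time is closest to `query`.

    `times` must be sorted in ascending order and aligned with `labels`.
    """
    if not times:
        raise ValueError("no retention times available")
    pos = bisect_left(times, query)
    if pos == 0:
        return labels[0]
    if pos == len(times):
        return labels[-1]
    before, after = times[pos - 1], times[pos]
    # Pick whichever neighbour is nearer to the query.
    return labels[pos] if after - query < query - before else labels[pos - 1]

# Made-up retention times (in minutes) and scan labels.
times = [0.5, 1.2, 2.0, 3.7]
labels = ["1", "2", "3", "4"]
nearest = closest_by_time(times, labels, 1.9)
```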
- __init__(source=None, use_header=False, convert_arrays=2, dtype=None, read_charges=True, read_resolutions=True, encoding='utf-8', _skip_index=False, **kwargs)
Create an IndexedMS2 (binary-mode) reader for a given MS2 file.
- Parameters:
source (str or file or None, optional) – A file object (or file name) with data in MS2 format. Default is None, which means read standard input.
Note
If a file object is given, it must be opened in binary mode.
use_header (bool, optional) – Add the info from file header to each dict. Spectrum-specific parameters override those from the header in case of conflict. Default is False.
convert_arrays (one of {0, 1, 2}, optional) – If 0, m/z, intensities and (possibly) charges will be returned as regular lists. If 1, they will be converted to regular numpy.ndarray’s. If 2, charges will be reported as a masked array (default). The default option is the slowest. 1 and 2 require numpy.
read_charges (bool, optional) – If True (default), fragment charges are reported. Disabling it improves performance. Charge is expected to be the third number on the line, after peak m/z and intensity.
read_resolutions (bool, optional) – If True (default), fragment peak resolutions are reported. Disabling it improves performance. Resolution is expected to be the fourth number on the line, after peak m/z, intensity, and charge.
dtype (type or str or dict, optional) – dtype argument to numpy array constructor, one for all arrays or one for each key. Keys should be ‘m/z array’, ‘intensity array’, ‘charge array’.
encoding (str, optional) – File encoding.
block_size (int, optional) – Size of the chunk (in bytes) used to parse the file when creating the byte offset index.
- Returns:
out – The reader object.
- Return type:
IndexedMS2
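The use_header rule ("spectrum-specific parameters override those from the header") amounts to a dict merge in which the spectrum's values are applied last. A minimal sketch, with `merge_params` and the sample keys being hypothetical rather than the library's internals:

```python
def merge_params(header, scan_params, use_header=False):
    """Combine header and per-spectrum parameters; the spectrum wins on conflict."""
    if not use_header:
        return dict(scan_params)
    merged = dict(header)       # start from file-level values
    merged.update(scan_params)  # spectrum-level values overwrite them
    return merged

# Made-up parameter names and values.
header = {"Extractor": "ExampleTool", "InstrumentType": "ITMS"}
scan_params = {"InstrumentType": "FTMS", "RetTime": "12.5"}

with_header = merge_params(header, scan_params, use_header=True)
without_header = merge_params(header, scan_params)
```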
- map(target=None, workers=None, args=None, kwargs=None, method='mp', **_kwargs)
Execute the target function over entries of this object in parallel. The type of parallelism is determined by the method parameter.
Results will be returned out of order.
- Parameters:
target (Callable, optional) – The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs.
workers (int, optional) – The number of worker threads or processes to use. The default depends on the method parameter.
args (Sequence, optional) – Additional positional arguments to be passed to the target function.
kwargs (Mapping, optional) – Additional keyword arguments to be passed to the target function.
method (str, optional) – The type of parallelism to use. Default is 'mp'.
**_kwargs – Additional keyword arguments to be passed to the target function.
- Yields:
object – The work item returned by the target function.
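Because results arrive out of order, it helps if the target returns something that identifies the entry it came from. The sketch below shows the pattern with the standard library's thread pool rather than the reader's own map(); the `summarize` function, the `"scan"` key and the sample spectra are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def summarize(spectrum):
    """Return (label, peak count) so each result can be matched to its scan."""
    return spectrum["params"]["scan"], len(spectrum["m/z array"])

# Made-up stand-ins for spectra yielded by a reader.
spectra = [
    {"params": {"scan": "1"}, "m/z array": [100.1, 200.2]},
    {"params": {"scan": "2"}, "m/z array": [150.5]},
    {"params": {"scan": "3"}, "m/z array": [110.0, 220.0, 330.0]},
]

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(summarize, spectrum) for spectrum in spectra]
    # as_completed yields futures in completion order, not submission order.
    results = [future.result() for future in as_completed(futures)]

# Keying by label makes the arrival order irrelevant.
peaks_by_scan = dict(results)
```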
- pmap(target=None, workers=None, args=None, kwargs=None, **_kwargs)
Execute the target function over entries of this object across up to workers processes.
Results will be returned out of order.
- Parameters:
target (Callable, optional) – The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs.
workers (int or None, optional) – The number of worker processes to use. If not a positive integer, defaults to the number of available CPUs. This parameter can also be set at reader creation.
args (Sequence, optional) – Additional positional arguments to be passed to the target function.
kwargs (Mapping, optional) – Additional keyword arguments to be passed to the target function.
**_kwargs – Additional keyword arguments to be passed to the target function.
- Yields:
object – The work item returned by the target function.
- reset()
Resets the iterator to its initial state.
- tmap(target=None, workers=None, args=None, kwargs=None, chunk_size=None, **_kwargs)
Execute the target function over entries of this object across up to workers threads.
Results will be returned out of order.
- Parameters:
target (Callable, optional) – The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs.
Warning
target must be thread-safe. The target function cannot interact with the underlying file object directly.
workers (int or None, optional) – The number of worker threads to use. If not a positive integer, defaults to the number of available CPUs.
args (Sequence, optional) – Additional positional arguments to be passed to the target function.
kwargs (Mapping, optional) – Additional keyword arguments to be passed to the target function.
chunk_size (int, optional) – The number of work items to hand out to each worker thread at a time. If not specified, defaults to the chunk_size attribute of this object.
**_kwargs – Additional keyword arguments to be passed to the target function.
- Yields:
object – The work item returned by the target function.
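The chunk_size idea and the thread-safety requirement can be sketched independently of the reader. The code below hands work to threads in small batches and uses a target that depends only on its argument; `chunked`, `base_peak_intensity` and `process_chunk` are hypothetical names, and unlike tmap() this sketch happens to preserve input order because it relies on `Executor.map`.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def chunked(iterable, chunk_size):
    """Yield successive lists of at most `chunk_size` items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk

def base_peak_intensity(spectrum):
    """Thread-safe target: reads only its argument, no shared state, no file object."""
    return max(spectrum["intensity array"])

def process_chunk(chunk):
    """Apply the target to every spectrum in one batch."""
    return [base_peak_intensity(spectrum) for spectrum in chunk]

# Made-up stand-ins for spectra yielded by a reader.
spectra = [
    {"intensity array": [1.0, 5.0, 2.0]},
    {"intensity array": [7.0]},
    {"intensity array": [3.0, 4.0]},
]

with ThreadPoolExecutor(max_workers=2) as pool:
    per_chunk = pool.map(process_chunk, chunked(spectra, 2))
    base_peaks = [value for chunk in per_chunk for value in chunk]
```

Larger chunks reduce scheduling overhead per item, at the cost of coarser load balancing between threads.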
- class pyteomics.ms2.MS2(*args, **kwargs)
A class representing an MS2 file. Supports the with syntax and direct iteration for sequential parsing.
An MS2 object behaves as an iterator, yielding spectra one by one. Each ‘spectrum’ is a dict with three keys: ‘m/z array’, ‘intensity array’, and ‘params’. ‘m/z array’ and ‘intensity array’ store numpy.ndarray’s of floats, and ‘params’ stores a dict of parameters.
- __init__(*args, **kwargs)
Create an MS2 (text-mode) reader for a given MS2 file.
- Parameters:
source (str or file or None, optional) – A file object (or file name) with data in MS2 format. Default is None, which means read standard input.
Note
If a file object is given, it must be opened in text mode.
use_header (bool, optional) – Add the info from file header to each dict. Spectrum-specific parameters override those from the header in case of conflict. Default is False.
convert_arrays (one of {0, 1, 2}, optional) – If 0, m/z, intensities and (possibly) charges will be returned as regular lists. If 1, they will be converted to regular numpy.ndarray’s. If 2, charges will be reported as a masked array (default). The default option is the slowest. 1 and 2 require numpy.
read_charges (bool, optional) – If True (default), fragment charges are reported. Disabling it improves performance. Charge is expected to be the third number on the line, after peak m/z and intensity.
read_resolutions (bool, optional) – If True (default), fragment peak resolutions are reported. Disabling it improves performance. Resolution is expected to be the fourth number on the line, after peak m/z, intensity, and charge.
dtype (type or str or dict, optional) – dtype argument to numpy array constructor, one for all arrays or one for each key. Keys should be ‘m/z array’, ‘intensity array’, ‘charge array’.
encoding (str, optional) – File encoding.
- Returns:
out – The reader object.
- Return type:
MS2
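The column order described for read_charges and read_resolutions (m/z, intensity, then charge, then resolution) can be sketched as a small line parser. This is an illustration of the layout only, not the library's parsing code; `parse_peak_line` and its output keys are hypothetical.

```python
def parse_peak_line(line, read_charges=True, read_resolutions=True):
    """Split one peak line into m/z, intensity and the optional extra columns."""
    fields = line.split()
    peak = {"mz": float(fields[0]), "intensity": float(fields[1])}
    # Charge is the third number on the line, when present and requested.
    if read_charges and len(fields) > 2:
        peak["charge"] = int(fields[2])
    # Resolution is the fourth number on the line, when present and requested.
    if read_resolutions and len(fields) > 3:
        peak["resolution"] = float(fields[3])
    return peak

full = parse_peak_line("445.12 1032.5 2 35000")
minimal = parse_peak_line("445.12 1032.5 2 35000", read_charges=False, read_resolutions=False)
two_columns = parse_peak_line("445.12 1032.5")
```

Skipping the extra columns means less conversion work per peak, which is consistent with the note that disabling these options improves performance.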
- reset()
Resets the iterator to its initial state.
- class pyteomics.ms2.MS2Base(source=None, use_header=False, convert_arrays=2, dtype=None, read_charges=True, read_resolutions=True, encoding=None, **kwargs)
Bases: MaskedArrayConversionMixin, MS1Base
Abstract class representing an MS2 file. Subclasses implement different approaches to parsing.
- __init__(source=None, use_header=False, convert_arrays=2, dtype=None, read_charges=True, read_resolutions=True, encoding=None, **kwargs)
Create an instance of an MS2Base parser.
- Parameters:
source (str or file or None, optional) – A file object (or file name) with data in MS2 format. Default is None, which means read standard input.
use_header (bool, optional) – Add the info from file header to each dict. Spectrum-specific parameters override those from the header in case of conflict. Default is False.
convert_arrays (one of {0, 1, 2}, optional) – If 0, m/z, intensities and (possibly) charges will be returned as regular lists. If 1, they will be converted to regular numpy.ndarray’s. If 2, charges will be reported as a masked array (default). The default option is the slowest. 1 and 2 require numpy.
read_charges (bool, optional) – If True (default), fragment charges are reported. Disabling it improves performance. Charge is expected to be the third number on the line, after peak m/z and intensity.
read_resolutions (bool, optional) – If True (default), fragment peak resolutions are reported. Disabling it improves performance. Resolution is expected to be the fourth number on the line, after peak m/z, intensity, and charge.
dtype (type or str or dict, optional) – dtype argument to numpy array constructor, one for all arrays or one for each key. Keys should be ‘m/z array’, ‘intensity array’, ‘charge array’, ‘resolution array’.
encoding (str, optional) – File encoding.
- pyteomics.ms2.read(*args, **kwargs)
Read an MS2 file and return entries iteratively.
Read the specified MS2 file, yield spectra one by one. Each ‘spectrum’ is a dict with three keys: ‘m/z array’, ‘intensity array’, and ‘params’. ‘m/z array’ and ‘intensity array’ store numpy.ndarray’s of floats, and ‘params’ stores a dict of parameters.
- Parameters:
source (str or file or None, optional) – A file object (or file name) with data in MS2 format. Default is None, which means read standard input.
use_header (bool, optional) – Add the info from file header to each dict. Spectrum-specific parameters override those from the header in case of conflict. Default is False.
convert_arrays (bool, optional) – If False, m/z and intensities will be returned as regular lists. If True (default), they will be converted to regular numpy.ndarray’s. Conversion requires numpy.
read_charges (bool, optional) – If True (default), fragment charges are reported. Disabling it improves performance. Charge is expected to be the third number on the line, after peak m/z and intensity.
read_resolutions (bool, optional) – If True (default), fragment peak resolutions are reported. Disabling it improves performance. Resolution is expected to be the fourth number on the line, after peak m/z, intensity, and charge.
dtype (type or str or dict, optional) – dtype argument to numpy array constructor, one for all arrays or one for each key. Keys should be ‘m/z array’ and/or ‘intensity array’.
encoding (str, optional) – File encoding.
use_index (bool, optional) – Determines which parsing method to use. If True, an instance of IndexedMS2 is created. This facilitates random access by scan titles. If an open file is passed as source, it needs to be open in binary mode.
Warning
Labels for scan objects are constructed as the first number in the S line, as follows: for a line S 0 1 123.4 the label is ‘0’. If these labels are not unique for the scans in the file, the indexed parser will not work correctly.
If False (default), an instance of MS2 is created. It reads source in text mode and is suitable for iterative parsing.
block_size (int, optional) – Size of the chunk (in bytes) used to parse the file when creating the byte offset index. (Accepted only for IndexedMS2.)
- Returns:
out – An instance of MS2 or IndexedMS2, depending on use_index and source.
- Return type:
MS2 or IndexedMS2
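Since the indexed parser relies on unique scan labels, one way to choose a value for use_index is to check the labels first. The sketch below applies the labelling rule from the warning (the first number on each S line) to plain text lines; `duplicate_labels` is a hypothetical helper, not part of the module.

```python
from collections import Counter

def duplicate_labels(lines):
    """Return the scan labels (first number on each S line) that occur more than once."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > 1 and fields[0] == "S":
            counts[fields[1]] += 1
    return sorted(label for label, count in counts.items() if count > 1)

# Made-up file content; the label '0' appears on two S lines.
lines = [
    "S 0 1 123.4",
    "100.0 1.0",
    "S 1 2 234.5",
    "S 0 3 345.6",
]
duplicates = duplicate_labels(lines)

# With repeated labels, sequential parsing (use_index=False) is the safer choice.
use_index = not duplicates
```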