Pyteomics documentation v4.2.0dev1

ms1 - read and write MS/MS data in MS1 format

Summary

MS1 is a simple human-readable format for MS1 data. It allows storing MS1 peak lists and experimental parameters.

This module provides a minimal infrastructure for accessing data stored in MS1 files. The two main classes are MS1, an iterative text-mode parser, and IndexedMS1, a binary-mode parser that supports random access by scan ID and retention time. The read() function dispatches between the two classes. Common parameters can also be read from the MS1 file header with the read_header() function.

Functions

read() - iterate through spectra in an MS1 file. Data from a single spectrum are converted to a human-readable dict.

chain() - read multiple files at once.

chain.from_iterable() - read multiple files at once, using an iterable of files.

read_header() - get a dict with common parameters for all spectra from the beginning of an MS1 file.
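The functions listed above can be combined as in the following minimal sketch. It assumes pyteomics is installed. The names base_peak and describe_file and the path argument are illustrative and not part of the module. The pyteomics import sits inside describe_file so that the pure helper can be tried without the library.

```python
def base_peak(spectrum):
    """Return (m/z, intensity) of the most intense peak in a spectrum dict."""
    mz, intensity = max(
        zip(spectrum['m/z array'], spectrum['intensity array']),
        key=lambda peak: peak[1],  # compare peaks by intensity
    )
    return mz, intensity


def describe_file(path):
    """Return the file header and the base peak of every spectrum in the file."""
    from pyteomics import ms1  # imported lazily; requires pyteomics
    header = ms1.read_header(path)      # dict of file-wide parameters
    with ms1.read(path) as reader:      # sequential parsing
        return header, [base_peak(spectrum) for spectrum in reader]
```

base_peak raises ValueError on a spectrum without peaks, so a caller working with sparse data may want to filter empty spectra first.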


pyteomics.ms1.chain(*args, **kwargs)

Chain read() for several files. Positional arguments should be file names or file objects. Keyword arguments are passed to the read() function.

chain.from_iterable(files, **kwargs)

Chain read() for several files. Keyword arguments are passed to the read() function.

files : iterable
Iterable of file names or file objects.
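The chaining behaviour described above can be pictured with the sketch below. It is an illustration of the semantics only, not the library's implementation: chain_read and all_spectra are hypothetical names, and the reading function is passed in as an argument so the idea can be tried with any callable.

```python
def chain_read(read, files, **kwargs):
    """Yield spectra from each file in turn, calling read(file, **kwargs)."""
    for source in files:
        # The same keyword arguments are forwarded for every file.
        for spectrum in read(source, **kwargs):
            yield spectrum


def all_spectra(paths):
    """Apply the same idea to real files with pyteomics.ms1.read()."""
    from pyteomics import ms1  # imported lazily; requires pyteomics
    return chain_read(ms1.read, paths, use_header=True)
```

Because chain_read is a generator, files are opened one at a time and only when iteration reaches them.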
class pyteomics.ms1.IndexedMS1(source=None, use_header=False, convert_arrays=True, dtype=None, encoding='utf-8', _skip_index=False, **kwargs)

Bases: pyteomics.ms1.MS1Base, pyteomics.auxiliary.file_helpers.TaskMappingMixin, pyteomics.auxiliary.file_helpers.TimeOrderedIndexedReaderMixin, pyteomics.auxiliary.file_helpers.IndexedTextReader

A class representing an MS1 file. Supports the with syntax and direct iteration for sequential parsing. Specific spectra can be accessed by scan label using the indexing syntax in constant time. If created using a file object, it needs to be opened in binary mode.

When iterated, an IndexedMS1 object yields spectra one by one. Each ‘spectrum’ is a dict with four keys: ‘m/z array’, ‘intensity array’, ‘charge array’ and ‘params’. ‘m/z array’ and ‘intensity array’ store numpy.ndarray objects of floats, ‘charge array’ is a masked array (numpy.ma.MaskedArray) of ints, and ‘params’ stores a dict of parameters (keys and values are str, keys corresponding to MS1).

Warning

Labels for scan objects are constructed as the first number in the S line, as follows: for a line S  0   1 the label is ‘0’. If these labels are not unique for the scans in the file, the indexed parser will not work correctly. Consider using MS1 instead.
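Before choosing the indexed parser, the uniqueness condition in the warning can be checked with a short script. The sketch below follows only the rule stated above (the label is the first field after S on an S line). The function names are hypothetical, and open_reader assumes pyteomics is installed.

```python
from collections import Counter


def scan_labels(lines):
    """Return the label of every scan: the first field after 'S' on each S line."""
    labels = []
    for line in lines:
        fields = line.split()  # splits on any run of spaces or tabs
        if len(fields) >= 2 and fields[0] == 'S':
            labels.append(fields[1])
    return labels


def duplicate_labels(lines):
    """Return the sorted labels that occur more than once."""
    counts = Counter(scan_labels(lines))
    return sorted(label for label, n in counts.items() if n > 1)


def open_reader(path):
    """Use the indexed parser only when every scan label is unique."""
    from pyteomics import ms1  # imported lazily; requires pyteomics
    with open(path) as f:
        use_index = not duplicate_labels(f)
    return ms1.read(path, use_index=use_index)
```

The check reads the whole file once, which is usually cheap compared with a failed or misleading random-access lookup later.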

Attributes:
header : dict

The file header.

time : RTLocator

A property used for accessing spectra by retention time.

Methods

map(self[, target, processes, args, kwargs]) Execute the target function over entries of this object across up to processes processes.
build_byte_index  
get_by_id  
get_by_ids  
get_by_index  
get_by_index_slice  
get_by_indexes  
get_by_key_slice  
get_spectrum  
next  
reset  
__init__(self, source=None, use_header=False, convert_arrays=True, dtype=None, encoding='utf-8', _skip_index=False, **kwargs)

Instantiate a TaskMappingMixin object, set default parameters for IPC.

Parameters:
queue_timeout : float, keyword only, optional

The number of seconds to block, waiting for a result before checking to see if all workers are done.

queue_size : int, keyword only, optional

The length of IPC queue used.

processes : int, keyword only, optional

Number of worker processes to spawn when map() is called. This can also be specified in the map() call.

map(self, target=None, processes=-1, args=None, kwargs=None, **_kwargs)

Execute the target function over entries of this object across up to processes processes.

Results will be returned out of order.

Parameters:
target : Callable, optional

The function to execute over each entry. It will be given a single object yielded by the wrapped iterator, as well as all of the values in args and kwargs.

processes : int, optional

The number of worker processes to use. If 0 or negative, defaults to the number of available CPUs. This parameter can also be set at reader creation.

args : Sequence, optional

Additional positional arguments to be passed to the target function.

kwargs : Mapping, optional

Additional keyword arguments to be passed to the target function.

**_kwargs

Additional keyword arguments to be passed to the target function.

Yields:
object

The work item returned by the target function.
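A possible use of map() is sketched below, assuming pyteomics is installed. The names total_ion_current and tic_values are illustrative. The target is defined at module level so that it can be sent to worker processes.

```python
def total_ion_current(spectrum):
    """Sum the intensities of one spectrum dict (works for lists and arrays)."""
    return float(sum(spectrum['intensity array']))


def tic_values(path, processes=2):
    """Compute the total ion current of every spectrum using worker processes."""
    from pyteomics import ms1  # imported lazily; requires pyteomics
    with ms1.IndexedMS1(path) as reader:
        # map() yields results out of order, so sort them for a stable return value.
        return sorted(reader.map(total_ion_current, processes=processes))
```

If the association between a result and its spectrum matters, the target function should return an identifier together with the computed value, since the order of results is not preserved. On platforms that start workers by spawning a new interpreter, call tic_values() under an `if __name__ == '__main__':` guard.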

reset(self)

Resets the iterator to its initial state.

class pyteomics.ms1.MS1(source=None, use_header=False, convert_arrays=True, dtype=None, encoding=None, **kwargs)

Bases: pyteomics.ms1.MS1Base, pyteomics.auxiliary.file_helpers.FileReader

A class representing an MS1 file. Supports the with syntax and direct iteration for sequential parsing.

An MS1 object behaves as an iterator, yielding spectra one by one. Each ‘spectrum’ is a dict with three keys: ‘m/z array’, ‘intensity array’, and ‘params’. ‘m/z array’ and ‘intensity array’ store numpy.ndarray objects of floats, and ‘params’ stores a dict of parameters.

Attributes:
header : dict

The file header.

Methods

next  
reset  
__init__(self, source=None, use_header=False, convert_arrays=True, dtype=None, encoding=None, **kwargs)

x.__init__(…) initializes x; see help(type(x)) for signature

reset(self)

Resets the iterator to its initial state.

class pyteomics.ms1.MS1Base(source=None, use_header=False, convert_arrays=True, dtype=None, **kwargs)

Bases: object

Abstract class representing an MS1 file. Subclasses implement different approaches to parsing.

Attributes:
header
__init__(self, source=None, use_header=False, convert_arrays=True, dtype=None, **kwargs)

x.__init__(…) initializes x; see help(type(x)) for signature

pyteomics.ms1.read(*args, **kwargs)

Read an MS1 file and return entries iteratively.

Read the specified MS1 file and yield spectra one by one. Each ‘spectrum’ is a dict with three keys: ‘m/z array’, ‘intensity array’, and ‘params’. ‘m/z array’ and ‘intensity array’ store numpy.ndarray objects of floats, and ‘params’ stores a dict of parameters.

Parameters:
source : str or file or None, optional

A file object (or file name) with data in MS1 format. Default is None, which means read standard input.

use_header : bool, optional

Add the info from file header to each dict. Spectrum-specific parameters override those from the header in case of conflict. Default is False.

convert_arrays : bool, optional

If False, m/z and intensities will be returned as regular lists. If True (default), they will be converted to numpy.ndarray objects. Conversion requires numpy.

dtype : type or str or dict, optional

dtype argument to numpy array constructor, one for all arrays or one for each key. Keys should be ‘m/z array’ and/or ‘intensity array’.

encoding : str, optional

File encoding.

use_index : bool, optional

Determines which parsing method to use. If True, an instance of IndexedMS1 is created. This facilitates random access by scan labels. If an open file is passed as source, it needs to be opened in binary mode.

If False (default), an instance of MS1 is created. It reads source in text mode and is suitable for iterative parsing.

Warning

Labels for scan objects are constructed as the first number in the S line, as follows: for a line S  0   1 the label is ‘0’. If these labels are not unique for the scans in the file, the indexed parser will not work correctly.

block_size : int, optional

Size of the chunk (in bytes) used to parse the file when creating the byte offset index. (Accepted only for IndexedMS1.)

Returns:
out : MS1Base

An instance of MS1 or IndexedMS1, depending on use_index and source.
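The precedence rule for use_header can be illustrated with plain dictionaries. This is a sketch of the documented behaviour, not the parser's internal code. merge_params and params_with_header are hypothetical names, and params_with_header assumes pyteomics is installed.

```python
def merge_params(header, spectrum_params):
    """Combine header and per-spectrum parameters; the spectrum wins on conflict."""
    merged = dict(header)           # start from the file-wide values
    merged.update(spectrum_params)  # spectrum-specific values override them
    return merged


def params_with_header(path):
    """Yield the 'params' dict of each spectrum, header values included."""
    from pyteomics import ms1  # imported lazily; requires pyteomics
    with ms1.read(path, use_header=True) as reader:
        for spectrum in reader:
            yield spectrum['params']
```

With use_header=False (the default), the same file-wide values remain available through the header attribute of the reader or through read_header().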

pyteomics.ms1.read_header(source, *args, **kwargs)

Read the specified MS1 file, get the parameters specified in the header as a dict.

Parameters:
source : str or file

File name or file object representing a file in MS1 format.

Returns:
header : dict
