peff - PSI Extended FASTA Format

PEFF is a forthcoming standard from PSI-HUPO that formalizes and extends the encoding of protein features and annotations used to build search spaces for proteomics. See the PEFF specification for up-to-date information on the standard.
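As we read the specification, PEFF stores annotations in each description line as backslash-prefixed `\Key=value` fields that follow a `prefix:accession` identifier, for example `>nxp:NX_P07949-1 \GName=RET \NcbiTaxId=9606`. The sketch below splits such a line into its parts. The helper name `parse_peff_description` is hypothetical and is not the pyteomics parser; it assumes flat values with no embedded backslashes and does not interpret structured values such as parenthesized lists.

```python
import re
from typing import Dict, Tuple

# Hypothetical helper (not part of pyteomics). A field starts with a backslash,
# followed by the key, "=", and a value that runs up to the next backslash.
_FIELD = re.compile(r"\\(\w+)=([^\\]*)")


def parse_peff_description(line: str) -> Tuple[str, str, Dict[str, str]]:
    """Split a simplified PEFF description line into prefix, accession and fields."""
    line = line.lstrip(">")
    # The first whitespace-separated token is "prefix:accession".
    ident, _, rest = line.partition(" ")
    prefix, _, accession = ident.partition(":")
    # Strip the trailing space that separates one field from the next.
    fields = {key: value.strip() for key, value in _FIELD.findall(rest)}
    return prefix, accession, fields
```

Use a raw string for header literals in Python source, since sequences such as `\N` are otherwise treated as escapes.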

Data manipulation

Classes¶

The PEFF parser inherits several properties from the implementation in the fasta module and builds on top of the TwoLayerIndexedFASTA reader.

Available classes:

IndexedPEFF - Parse a PEFF file in binary mode, supporting direct indexing by header string or by tag.

class pyteomics.peff.Header(mapping, original=None)

Bases: Mapping

Hold the parsed key-value properties of a structure such as a sequence’s definition line.

This object supports the Mapping interface, and keys may also be accessed with attribute notation.

__init__(mapping, original=None)

get(k[, d]) → D[k] if k in D, else d. d defaults to None.

items() → a set-like object providing a view on D's items

keys() → a set-like object providing a view on D's keys

values() → an object providing a view on D's values
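The Mapping interface plus attribute access can be illustrated with a small read-only class. This is a sketch under assumptions: the class name `AttrHeader` is hypothetical, and the pyteomics Header may be implemented differently. Implementing the three abstract methods of `collections.abc.Mapping` is enough to obtain `get()`, `keys()`, `items()` and `values()` with the semantics listed above.

```python
from collections.abc import Mapping
from typing import Any, Iterator, Optional


class AttrHeader(Mapping):
    """Hypothetical read-only mapping whose keys are also readable as attributes."""

    def __init__(self, mapping: dict, original: Optional[str] = None):
        self._mapping = dict(mapping)  # copy, so later changes to the input don't leak in
        self._original = original      # assumed to hold the unparsed header text

    # The three abstract methods required by collections.abc.Mapping;
    # get(), keys(), items(), values() and __contains__ are inherited.
    def __getitem__(self, key: str) -> Any:
        return self._mapping[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._mapping)

    def __len__(self) -> int:
        return len(self._mapping)

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails. Private names are
        # rejected to avoid recursion before _mapping exists (e.g. during copying).
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._mapping[name]
        except KeyError:
            raise AttributeError(name) from None
```

With this design `h.GName` and `h["GName"]` return the same value, while a missing key raises AttributeError for attribute access and KeyError for item access, as the two protocols expect.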

class pyteomics.peff.IndexedPEFF(source, ignore_comments=False, **kwargs)

Bases: TwoLayerIndexedFASTA

Creates an IndexedPEFF object.

Parameters:

source (str or file) – The file to read. If a file object, it needs to be in rb mode.
parse (bool, optional) – Defines whether the descriptions should be parsed in the produced tuples. Default is True.
kwargs – Passed to the TwoLayerIndexedFASTA constructor.
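One plausible reason an indexed reader insists on binary mode is that its index stores byte offsets, which can be passed back to `seek()` only when the stream is read as raw bytes. The sketch below builds such an offset index and reads a single entry by jumping to it. The function names `build_offset_index` and `read_entry` are hypothetical; this illustrates the idea and is not the pyteomics indexing code.

```python
import io
from typing import BinaryIO, Dict, Optional, Tuple


def build_offset_index(stream: BinaryIO) -> Dict[str, Tuple[int, int]]:
    """Hypothetical first-layer index: header text -> (start, end) byte offsets."""
    index: Dict[str, Tuple[int, int]] = {}
    header: Optional[str] = None
    start = offset = 0
    stream.seek(0)
    for line in stream:  # a binary stream yields raw bytes lines
        if line.startswith(b">"):
            if header is not None:
                index[header] = (start, offset)
            header = line[1:].rstrip(b"\r\n").decode("utf-8")
            start = offset
        offset += len(line)  # counts bytes, which is what seek() expects
    if header is not None:
        index[header] = (start, offset)
    return index


def read_entry(stream: BinaryIO, span: Tuple[int, int]) -> str:
    """Jump straight to one entry and return its sequence."""
    start, end = span
    stream.seek(start)
    lines = stream.read(end - start).decode("utf-8").splitlines()
    return "".join(lines[1:])  # drop the header line, join the sequence lines


demo = io.BytesIO(b"# PEFF 1.0\n>sp:P1\nMKT\nAYI\n>sp:P2\nGGG\n")
print(read_entry(demo, build_offset_index(demo)["sp:P2"]))  # prints GGG
```

Lines before the first `>` (such as a file header block) are skipped by this sketch rather than parsed.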

__init__(source, ignore_comments=False, **kwargs)

Open source and create a two-layer index for convenient random access both by full header strings and extracted fields.

Parameters:

source (str or file-like) – File to read. If file object, it must be opened in binary mode.
header_pattern (str or RE or None, optional) – Pattern to match the header string. Must capture the group used for the second index. If None (default), the second-level index is not created.
header_group (int or str or None, optional) – Defines which group is used as key in the second-level index. Default is 1.
ignore_comments (bool, optional) – If True then ignore the second and subsequent lines of description. Default is False, which concatenates multi-line descriptions into a single string.
parser (function or None, optional) – Defines whether the FASTA descriptions should be parsed. If it is a function, that function will be given the description string, and the returned value will be yielded together with the sequence. The std_parsers dict has parsers for several formats. Hint: specify parse() as the parser to apply automatic format recognition. Default is None, which means return the header “as is”.
Other arguments – passed on to the parent reader class.
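The parser parameter above can be pictured with a minimal FASTA iterator: when a function is supplied it receives the description string and its return value is yielded together with the sequence, and when it is None the description is returned as is. The names `read_entries` and `Entry` are hypothetical, and the sketch ignores comment handling (ignore_comments) entirely.

```python
from typing import Any, Callable, Iterable, Iterator, List, NamedTuple, Optional

Parser = Optional[Callable[[str], Any]]


class Entry(NamedTuple):
    description: Any  # raw string, or whatever the parser returned
    sequence: str


def _finish(description: str, chunks: List[str], parser: Parser) -> Entry:
    # Apply the parser only if one was given; otherwise keep the header as is.
    parsed = parser(description) if parser is not None else description
    return Entry(parsed, "".join(chunks))


def read_entries(lines: Iterable[str], parser: Parser = None) -> Iterator[Entry]:
    """Hypothetical FASTA iterator mirroring the documented parser semantics."""
    description: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if description is not None:
                yield _finish(description, chunks, parser)
            description, chunks = line[1:], []
        elif description is not None and line:
            chunks.append(line)  # multi-line sequences are joined at the end
    if description is not None:
        yield _finish(description, chunks, parser)
```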

build_second_index(): Create the mapping from extracted field to whole header string.

get_by_id(key): Get the entry by value of header string or extracted field.
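The interplay of header_pattern, header_group, build_second_index() and get_by_id() can be sketched with two dictionaries: the first maps full header strings to entries, the second maps the field captured by the pattern to the full header. The class `TwoLayerIndex` and the pattern `^(\w+):(\S+)` with group 2 are hypothetical choices for illustration; they are not the patterns pyteomics uses internally.

```python
import re
from typing import Dict, Mapping, Pattern, Union


class TwoLayerIndex:
    """Hypothetical two-layer lookup (an illustration, not the pyteomics class)."""

    def __init__(self, entries: Mapping[str, str],
                 header_pattern: Union[str, Pattern, None] = None,
                 header_group: Union[int, str] = 1):
        # First layer: full header string -> entry (here simply the sequence).
        self.by_header: Dict[str, str] = dict(entries)
        # re.compile also accepts an already compiled pattern and returns it.
        self.pattern = re.compile(header_pattern) if header_pattern is not None else None
        self.group = header_group
        self.by_field: Dict[str, str] = {}
        if self.pattern is not None:  # no pattern -> no second-level index
            self.build_second_index()

    def build_second_index(self) -> None:
        # Second layer: extracted field -> full header string.
        for header in self.by_header:
            match = self.pattern.match(header)
            if match:
                self.by_field[match.group(self.group)] = header

    def get_by_id(self, key: str) -> str:
        # Try the full header first, then fall back to the extracted field.
        if key in self.by_header:
            return self.by_header[key]
        return self.by_header[self.by_field[key]]
```

Storing the full header (rather than the entry) in the second layer keeps a single copy of each entry and lets both lookups share the first-layer data.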

map(target=None, processes=-1, args=None, kwargs=None, **_kwargs)

Execute the target function over the entries of this object, using up to processes worker processes.

Results are returned out of order, that is, not necessarily in the order of the entries.

Parameters:

target (Callable, optional) – The function to execute over each entry. It is given a single object yielded by the wrapped iterator, as well as all of the values in args and kwargs.
processes (int, optional) – The number of worker processes to use. If 0 or negative, defaults to the number of available CPUs. This parameter can also be set at reader creation.
args (Sequence, optional) – Additional positional arguments to be passed to the target function.
kwargs (Mapping, optional) – Additional keyword arguments to be passed to the target function.
**_kwargs – Additional keyword arguments to be passed to the target function

Yields:

object – The work item returned by the target function.
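The contract described for map() (each entry plus args and kwargs passed to target, a CPU-count fallback for non-positive processes, unordered results) can be sketched with the standard library. This is not the pyteomics implementation: the function name `parallel_map` is hypothetical, the entry is assumed to be the first positional argument, and a thread pool is swapped in for a process pool to keep the example self-contained.

```python
import os
from functools import partial
from multiprocessing.pool import ThreadPool
from typing import Any, Callable, Dict, Iterable, Iterator, Mapping, Optional, Sequence, Tuple


def _call(target: Callable[..., Any], args: Tuple[Any, ...],
          kwargs: Dict[str, Any], entry: Any) -> Any:
    # Assumed call order: the entry first, then extra positional and keyword arguments.
    return target(entry, *args, **kwargs)


def parallel_map(entries: Iterable[Any], target: Callable[..., Any],
                 processes: int = -1, args: Optional[Sequence[Any]] = None,
                 kwargs: Optional[Mapping[str, Any]] = None,
                 **_kwargs: Any) -> Iterator[Any]:
    """Hypothetical sketch of the documented map() contract, using threads."""
    if processes < 1:
        # "0 or negative" falls back to the number of available CPUs.
        processes = os.cpu_count() or 1
    merged = dict(kwargs or {}, **_kwargs)  # both keyword sources reach target
    func = partial(_call, target, tuple(args or ()), merged)
    with ThreadPool(processes) as pool:
        # imap_unordered yields each result as soon as a worker finishes,
        # so the output order need not match the input order.
        yield from pool.imap_unordered(func, entries)
```

Because results arrive unordered, a caller that needs a stable order should return an identifier from target and sort on it. With a real process pool, target and its arguments must also be picklable.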

reset(): Resets the iterator to its initial state.
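For a reader backed by a seekable file, resetting can be as simple as rewinding the stream. The class `ResettableReader` below is a hypothetical illustration that yields header lines only; it shows the idea of reset() and does not reproduce what the pyteomics reader does internally.

```python
import io
from typing import BinaryIO, Iterator


class ResettableReader:
    """Hypothetical reader that yields header lines and can be rewound."""

    def __init__(self, stream: BinaryIO):
        self._stream = stream
        self.reset()

    def reset(self) -> None:
        # Rewind the underlying stream so iteration restarts at the first entry.
        self._stream.seek(0)

    def __iter__(self) -> Iterator[bytes]:
        return self

    def __next__(self) -> bytes:
        # Continue from the current stream position up to the next header line.
        for line in self._stream:
            if line.startswith(b">"):
                return line[1:].rstrip(b"\r\n")
        raise StopIteration


reader = ResettableReader(io.BytesIO(b">a\nMK\n>b\nGG\n"))
first = next(reader)   # b"a"
reader.reset()
print(list(reader))    # [b'a', b'b']
```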

Pyteomics documentation v4.7.2