Pyteomics documentation v4.7.1

peff - PSI Extended FASTA Format


PEFF is a forthcoming standard from PSI-HUPO that formalizes and extends the encoding of protein features and annotations for building search spaces in proteomics. See the PEFF specification for more up-to-date information on the standard.

Data manipulation

Classes

The PEFF parser inherits several properties from the implementation in the fasta module, building on top of the TwoLayerIndexedFASTA reader.

Available classes:

IndexedPEFF - Parse a PEFF format file in binary mode, supporting direct indexing by header string or by tag.

class pyteomics.peff.Header(mapping, original=None)

Bases: Mapping

Holds the key-value properties parsed from a line such as a sequence’s definition line.

This object supports the Mapping interface, and keys may be accessed by attribute access notation.
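The combination of the Mapping interface with attribute access can be illustrated with a minimal sketch. The class name AttrHeader and the example keys and values are hypothetical and chosen only for illustration; this is not the Pyteomics implementation, only one way such behaviour might be built on collections.abc.Mapping.

```python
from collections.abc import Mapping


class AttrHeader(Mapping):
    """Hypothetical stand-in for a parsed header; not the Pyteomics class."""

    def __init__(self, mapping, original=None):
        self._mapping = dict(mapping)
        self.original = original  # raw definition line, if available

    def __getitem__(self, key):
        return self._mapping[key]

    def __iter__(self):
        return iter(self._mapping)

    def __len__(self):
        return len(self._mapping)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails,
        # so fall back to the stored keys.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._mapping[name]
        except KeyError:
            raise AttributeError(name) from None


# Made-up keys and values, for illustration only.
header = AttrHeader({"Tag": "P12345", "PName": "Example protein"},
                    original="sp:P12345 \\PName=Example protein")
by_key = header["Tag"]           # mapping-style access
by_attr = header.Tag             # attribute-style access to the same key
missing = header.get("Length")   # get() is provided by the Mapping mixin
```

Because only __getitem__, __iter__ and __len__ are defined, the get(), items(), keys() and values() methods come from the Mapping base class.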

__init__(mapping, original=None)

get(k[, d])

D[k] if k in D, else d. d defaults to None.

items()

A set-like object providing a view on D's items.

keys()

A set-like object providing a view on D's keys.

values()

An object providing a view on D's values.

class pyteomics.peff.IndexedPEFF(source, ignore_comments=False, **kwargs)

Bases: TwoLayerIndexedFASTA

Creates an IndexedPEFF object.

Parameters:
  • source (str or file) – The file to read. If a file object, it needs to be in rb mode.

  • parse (bool, optional) – Defines whether the descriptions should be parsed in the produced tuples. Default is True.

  • kwargs – Passed to the TwoLayerIndexedFASTA constructor.

__init__(source, ignore_comments=False, **kwargs)

Open source and create a two-layer index for convenient random access both by full header strings and extracted fields.

Parameters:
  • source (str or file-like) – File to read. If file object, it must be opened in binary mode.

  • header_pattern (str or RE or None, optional) – Pattern to match the header string. Must capture the group used for the second index. If None (default), second-level index is not created.

  • header_group (int or str or None, optional) – Defines which group is used as key in the second-level index. Default is 1.

  • ignore_comments (bool, optional) – If True then ignore the second and subsequent lines of description. Default is False, which concatenates multi-line descriptions into a single string.

  • parser (function or None, optional) – Defines whether the FASTA descriptions should be parsed. If it is a function, that function will be given the description string, and the returned value will be yielded together with the sequence. The std_parsers dict has parsers for several formats. Hint: specify parse() as the parser to apply automatic format recognition. Default is None, which means return the header “as is”.

  • **kwargs – Other keyword arguments, passed to the TwoLayerIndexedFASTA constructor.
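How header_pattern and header_group cooperate can be sketched with the standard re module. The pattern below (first whitespace-delimited token, optionally preceded by ">") and the helper name extract_key are assumptions made for illustration, as is matching at the start of the header; the header line itself is made up.

```python
import re

# Assumed example pattern: capture the first whitespace-delimited token,
# optionally preceded by ">".
header_pattern = re.compile(r">?(\S+)")


def extract_key(header, pattern, group=1):
    """Return the captured group used as the second-level key, or None."""
    match = pattern.match(header)
    return match.group(group) if match else None


# Made-up header line, for illustration only.
header_line = "sp:P12345 \\PName=Example protein \\Length=42"
key = extract_key(header_line, header_pattern)

# A named group works the same way when header_group is a string.
named_pattern = re.compile(r">?(?P<acc>\S+)")
named_key = extract_key(header_line, named_pattern, group="acc")
```

Both calls return the same token; the only difference is whether the group is addressed by number or by name.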

build_second_index()

Create the mapping from extracted field to whole header string.

get_by_id(key)

Get the entry by value of header string or extracted field.
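The relationship between the two index layers can be modelled in memory as follows. This is a sketch under stated assumptions, not the library's code: the class TwoLayerLookup is hypothetical, records are plain strings rather than file offsets, and the lookup order (full header first, then extracted field) is assumed.

```python
import re


class TwoLayerLookup:
    """Hypothetical in-memory model of a two-layer index."""

    def __init__(self, entries, header_pattern=r">?(\S+)", header_group=1):
        # First layer: full header string -> record.
        self._by_header = dict(entries)
        self._pattern = re.compile(header_pattern)
        self._group = header_group
        self._second = {}
        self.build_second_index()

    def build_second_index(self):
        # Second layer: extracted field -> full header string.
        self._second = {}
        for header in self._by_header:
            match = self._pattern.match(header)
            if match:
                self._second[match.group(self._group)] = header

    def get_by_id(self, key):
        # Try the full header first, then fall back to the extracted field.
        if key in self._by_header:
            return self._by_header[key]
        return self._by_header[self._second[key]]


# Made-up headers and sequences, for illustration only.
entries = {
    "sp:P12345 \\PName=Example protein": "MKTAYIAK",
    "sp:Q67890 \\PName=Another protein": "GSHMLEDP",
}
index = TwoLayerLookup(entries)
by_full = index.get_by_id("sp:P12345 \\PName=Example protein")
by_field = index.get_by_id("sp:Q67890")
```

Keeping the second layer as a mapping to header strings, rather than to records, means each record is stored once and both kinds of key resolve through the first layer.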

map(target=None, processes=-1, args=None, kwargs=None, **_kwargs)

Execute the target function over the entries of this object, using at most the number of worker processes given by the processes parameter.

Results will be returned out of order.

Parameters:
  • target (Callable, optional) – The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs.

  • processes (int, optional) – The number of worker processes to use. If 0 or negative, defaults to the number of available CPUs. This parameter can also be set at reader creation.

  • args (Sequence, optional) – Additional positional arguments to be passed to the target function.

  • kwargs (Mapping, optional) – Additional keyword arguments to be passed to the target function.

  • **_kwargs – Additional keyword arguments to be passed to the target function.

Yields:

object – The work item returned by the target function.
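Since results arrive out of order, a common way to cope is to have the target return an identifying key together with its result. The sketch below illustrates that pattern and the documented rule for non-positive processes values. It uses a thread pool from concurrent.futures as a stand-in for worker processes so that it stays self-contained; the helper names resolve_workers, unordered_map and entry_length, and the example data, are hypothetical and not part of Pyteomics.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def resolve_workers(processes=-1):
    """Apply the documented rule: 0 or negative means 'use all CPUs'."""
    return processes if processes > 0 else (os.cpu_count() or 1)


def unordered_map(items, target, processes=-1, args=None, kwargs=None):
    """Yield target(item, *args, **kwargs) in completion order."""
    args = tuple(args or ())
    kwargs = dict(kwargs or {})
    # Threads stand in for worker processes in this sketch.
    with ThreadPoolExecutor(max_workers=resolve_workers(processes)) as pool:
        futures = [pool.submit(target, item, *args, **kwargs) for item in items]
        for future in as_completed(futures):
            yield future.result()


def entry_length(entry, scale=1):
    # Return the key alongside the result so the caller can re-associate them.
    name, sequence = entry
    return name, len(sequence) * scale


# Made-up entries, for illustration only.
entries = [("sp:P12345", "MKTAYIAK"), ("sp:Q67890", "GSHML")]
lengths = dict(unordered_map(entries, entry_length, processes=2,
                             kwargs={"scale": 1}))
```

Collecting the (key, value) pairs into a dict makes the final result independent of the order in which the workers finish.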

reset()

Resets the iterator to its initial state.
