peff - PSI Extended FASTA Format
PEFF is a forthcoming standard from PSI-HUPO that formalizes and extends the encoding of protein features and annotations for building search spaces in proteomics. See the PEFF specification for more up-to-date information on the standard.
Data manipulation
Classes
The PEFF parser inherits several properties from the implementation in the fasta module, building on top of the TwoLayerIndexedFASTA reader.
Available classes:
IndexedPEFF - Parse a PEFF format file in binary mode, supporting direct indexing by header string or by tag.
- class pyteomics.peff.Header(mapping, original=None)
Bases: Mapping
Holds parsed key-value properties, such as those of a sequence’s definition line.
This object supports the Mapping interface, and keys may be accessed by attribute access notation.
- get(k[, d])
D[k] if k in D, else d. d defaults to None.
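The combination of a Mapping interface and attribute access can be illustrated with a minimal, standard-library sketch. `AttrHeader` is a hypothetical stand-in written for this example, not the code of `pyteomics.peff.Header`, and the keys shown are invented.

```python
from collections.abc import Mapping


class AttrHeader(Mapping):
    """Hypothetical read-only mapping whose keys are also readable as attributes."""

    def __init__(self, mapping, original=None):
        self._mapping = dict(mapping)
        self.original = original  # the unparsed definition line, if kept

    def __getitem__(self, key):
        return self._mapping[key]

    def __iter__(self):
        return iter(self._mapping)

    def __len__(self):
        return len(self._mapping)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._mapping[name]
        except KeyError:
            raise AttributeError(name) from None


# Invented keys and values, for illustration only.
header = AttrHeader({"DbUniqueId": "P12345", "Length": 42},
                    original=r"db:P12345 \Length=42")
```

Because `Mapping` supplies `get`, `keys`, `items` and `in` once the three abstract methods are defined, `header.get("Missing")` falls back to `None` just as the `get` entry above describes.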
- class pyteomics.peff.IndexedPEFF(source, ignore_comments=False, **kwargs)
Bases: TwoLayerIndexedFASTA
Creates an IndexedPEFF object.
- __init__(source, ignore_comments=False, **kwargs)
Open source and create a two-layer index for convenient random access both by full header strings and extracted fields.
- Parameters:
source (str or file-like) – File to read. If a file object, it must be opened in binary mode.
header_pattern (str or RE or None, optional) – Pattern to match the header string. Must capture the group used for the second index. If None (default), second-level index is not created.
header_group (int or str or None, optional) – Defines which group is used as key in the second-level index. Default is 1.
ignore_comments (bool, optional) – If True then ignore the second and subsequent lines of description. Default is False, which concatenates multi-line descriptions into a single string.
parser (function or None, optional) – Defines whether the FASTA descriptions should be parsed. If it is a function, that function will be given the description string, and the returned value will be yielded together with the sequence. The std_parsers dict has parsers for several formats. Hint: specify parse() as the parser to apply automatic format recognition. Default is None, which means return the header “as is”.
arguments (Other)
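To make the parser parameter more concrete, here is a minimal sketch of a function that takes a description string and returns a parsed value. `parse_description` is a hypothetical helper, and the definition-line layout it expects (`prefix:identifier` followed by backslash-prefixed `Key=Value` pairs) is an assumption made for this example rather than something stated above.

```python
import re

# Assumed layout of a definition line (an assumption of this sketch):
#   prefix:identifier \Key=Value \Key=Value
_PAIR = re.compile(r"\\(\w+)=([^\\]*)")


def parse_description(description):
    """Hypothetical parser: return the identifier and key-value pairs as a dict."""
    tag, _, rest = description.partition(" ")
    prefix, _, identifier = tag.partition(":")
    fields = {"Prefix": prefix, "Tag": identifier}
    for key, value in _PAIR.findall(rest):
        fields[key] = value.strip()
    return fields


# Invented description line, for illustration only.
example = r"db:P12345 \PName=Example protein \Length=42"
parsed = parse_description(example)
```

All values stay strings in this sketch. A fuller parser would likely convert fields such as a length to integers.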
- build_second_index()
Create the mapping from extracted field to whole header string.
- get_by_id(key)
Get the entry by value of header string or extracted field.
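The idea of a two-layer index can be sketched with two dictionaries: one from the whole header string to a position, and one from an extracted field back to the whole header. The functions below are hypothetical and use only the standard library. List positions stand in for file offsets, and the lookup order (full header first, then extracted field) is an assumption of the sketch.

```python
import re


def build_indexes(headers, header_pattern, header_group=1):
    """Return (first, second): header -> position, extracted field -> header."""
    pattern = re.compile(header_pattern)
    first = {}
    second = {}
    for position, header in enumerate(headers):
        first[header] = position
        match = pattern.match(header)
        if match:
            # The captured group becomes the key of the second-level index.
            second[match.group(header_group)] = header
    return first, second


def get_by_id(key, first, second):
    """Look the key up as a full header first, then as an extracted field."""
    if key in first:
        return first[key]
    return first[second[key]]


# Invented headers, for illustration only.
headers = [r"db:P12345 \Length=42", r"db:Q67890 \Length=17"]
first, second = build_indexes(headers, r"\w+:(\S+)")
```

With these indexes, `get_by_id("Q67890", first, second)` and `get_by_id(headers[1], first, second)` resolve to the same position.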
- map(target=None, workers=None, args=None, kwargs=None, method='mp', **_kwargs)
Execute the target function over entries of this object in parallel. The type of parallelism is determined by the method parameter.
Results will be returned out of order.
- Parameters:
target (Callable, optional) – The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs.
workers (int, optional) – The number of worker threads or processes to use. The default depends on the method parameter.
args (Sequence, optional) – Additional positional arguments to be passed to the target function.
kwargs (Mapping, optional) – Additional keyword arguments to be passed to the target function.
method (str, optional) – The type of parallelism to use. Can be one of the following:
**_kwargs – Additional keyword arguments to be passed to the target function.
- Yields:
object – The work item returned by the target function.
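Because results come back out of order, a target usually needs to return something that identifies the entry it was computed from. The sketch below shows that pattern with a standard-library thread pool as a stand-in. `unordered_map` and `sequence_length` are hypothetical names, and the calling convention `target(item, *args, **kwargs)` is an assumption based on the parameter descriptions above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def unordered_map(items, target, workers=2, args=None, kwargs=None):
    """Yield target(item, *args, **kwargs) for each item, in completion order."""
    args = tuple(args or ())
    kwargs = dict(kwargs or {})
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(target, item, *args, **kwargs) for item in items]
        for future in as_completed(futures):
            yield future.result()


def sequence_length(entry, minimum, label="length"):
    """Example target: return the identifier together with the computed values."""
    identifier, sequence = entry
    return identifier, {label: len(sequence), "passes": len(sequence) >= minimum}


# Invented entries, for illustration only.
entries = [("P1", "MKV"), ("P2", "MKVLAAGT"), ("P3", "M")]
results = dict(
    unordered_map(entries, sequence_length, args=(3,), kwargs={"label": "n"})
)
```

Keying the results by identifier makes the arrival order irrelevant to the caller.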
- pmap(target=None, workers=None, args=None, kwargs=None, **_kwargs)
Execute the target function over entries of this object across up to workers processes.
Results will be returned out of order.
- Parameters:
target (Callable, optional) – The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs.
workers (int or None, optional) – The number of worker processes to use. If not a positive integer, defaults to the number of available CPUs. This parameter can also be set at reader creation.
args (Sequence, optional) – Additional positional arguments to be passed to the target function.
kwargs (Mapping, optional) – Additional keyword arguments to be passed to the target function.
**_kwargs – Additional keyword arguments to be passed to the target function.
- Yields:
object – The work item returned by the target function.
- reset()
Resets the iterator to its initial state.
- tmap(target=None, workers=None, args=None, kwargs=None, chunk_size=None, **_kwargs)
Execute the target function over entries of this object across up to workers threads.
Results will be returned out of order.
- Parameters:
target (Callable, optional) – The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs.
Warning: target must be thread-safe. The target function cannot interact with the underlying file object directly.
workers (int or None, optional) – The number of worker threads to use. If not a positive integer, defaults to the number of available CPUs.
args (Sequence, optional) – Additional positional arguments to be passed to the target function.
kwargs (Mapping, optional) – Additional keyword arguments to be passed to the target function.
chunk_size (int, optional) – The number of work items to hand out to each worker thread at a time. If not specified, defaults to the chunk_size attribute of this object.
**_kwargs – Additional keyword arguments to be passed to the target function.
- Yields:
object – The work item returned by the target function.
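The role of chunk_size can be pictured as batching: each worker thread receives several work items at once rather than one at a time, which reduces scheduling overhead for cheap targets. The following standard-library sketch shows that batching under stated assumptions. `chunked` and `threaded_map` are hypothetical helpers, not the implementation behind tmap.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(iterable, chunk_size):
    """Yield successive lists of at most chunk_size items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def threaded_map(items, target, workers=2, chunk_size=2):
    """Apply target to items, handing each worker a whole chunk at a time."""

    def run_chunk(chunk):
        # target must be thread-safe: it runs concurrently on different chunks.
        return [target(item) for item in chunk]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for chunk_result in pool.map(run_chunk, chunked(items, chunk_size)):
            yield from chunk_result


# Invented sequences, for illustration only.
sequences = ["MKV", "MKVLAAGT", "M", "MK", "MKVL"]
lengths = sorted(threaded_map(sequences, len))
```

This sketch happens to preserve input order because `Executor.map` does. The result is sorted anyway so that the calling code does not depend on ordering, in keeping with the out-of-order note above.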