Pyteomics documentation v4.1

peff - PSI Extended FASTA Format

PEFF is a forthcoming standard from PSI-HUPO that formalizes and extends the encoding of protein features and annotations used to build search spaces in proteomics. See the PEFF specification for up-to-date information on the standard.

Data manipulation

Classes

The PEFF parser inherits several properties from the implementation in the fasta module, building on top of the TwoLayerIndexedFASTA reader.

Available classes:

IndexedPEFF - Parse a PEFF format file in binary mode, supporting direct indexing by header string or by tag.
class pyteomics.peff.Header(mapping, original=None)[source]

Bases: _abcoll.Mapping

Holds parsed key-value properties, such as those from a sequence’s definition line.

This object supports the Mapping interface, and keys may be accessed by attribute access notation.
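The combination of the Mapping interface and attribute access can be illustrated with a minimal sketch. This is not the Pyteomics implementation: the class name AttrMapping and the sample keys and definition line are hypothetical, chosen only to show how such an object might behave.

```python
from collections.abc import Mapping


class AttrMapping(Mapping):
    """Hypothetical read-only mapping whose keys are also attributes."""

    def __init__(self, mapping, original=None):
        self._mapping = dict(mapping)
        self.original = original  # e.g. the unparsed definition line

    def __getitem__(self, key):
        return self._mapping[key]

    def __iter__(self):
        return iter(self._mapping)

    def __len__(self):
        return len(self._mapping)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._mapping[name]
        except KeyError:
            raise AttributeError(name) from None


# Illustrative values only; real PEFF headers carry more keys.
header = AttrMapping({"Prefix": "sp", "Length": 3}, original=">sp:EX1 \\Length=3")
print(header["Length"], header.Length, header.get("Missing"))
```

Because the class derives from collections.abc.Mapping, methods such as get, keys, items and values come for free once __getitem__, __iter__ and __len__ are defined.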

Methods

get(k[,d])
iteritems()
iterkeys()
itervalues()
items  
keys  
values  
__init__(mapping, original=None)[source]

x.__init__(…) initializes x; see help(type(x)) for signature

get(k[, d]) → D[k] if k in D, else d. d defaults to None.
items() → list of D's (key, value) pairs, as 2-tuples[source]
iteritems() → an iterator over the (key, value) items of D
iterkeys() → an iterator over the keys of D
itervalues() → an iterator over the values of D
keys() → list of D's keys[source]
values() → list of D's values[source]
class pyteomics.peff.IndexedPEFF(source, ignore_comments=False, **kwargs)[source]

Bases: pyteomics.fasta.TwoLayerIndexedFASTA

Creates an IndexedPEFF object.

Parameters:
source : str or file

The file to read. If a file object, it needs to be in rb mode.

parse : bool, optional

Defines whether the descriptions should be parsed in the produced tuples. Default is True.

kwargs : passed to the TwoLayerIndexedFASTA constructor.
Attributes:
default_index
index

Methods

build_second_index() Create the mapping from extracted field to whole header string.
get_by_id(key) Get the entry by value of header string or extracted field.
map([target, processes, queue_timeout, …]) Execute the target function over entries of this object, using up to the number of worker processes given by processes.
build_byte_index  
get_by_ids  
get_by_index  
get_by_index_slice  
get_by_indexes  
get_by_key_slice  
get_entry  
next  
parser  
reset  
__init__(source, ignore_comments=False, **kwargs)[source]

Open source and create a two-layer index for convenient random access both by full header strings and extracted fields.

Parameters:
source : str or file-like

File to read. If file object, it must be opened in binary mode.

header_pattern : str or RE or None, optional

Pattern to match the header string. Must capture the group used for the second index. If None (default), second-level index is not created.

header_group : int or str or None, optional

Defines which group is used as key in the second-level index. Default is 1.

ignore_comments : bool, optional

If True then ignore the second and subsequent lines of description. Default is False, which concatenates multi-line descriptions into a single string.

parser : function or None, optional

Defines whether the FASTA descriptions should be parsed. If it is a function, that function will be given the description string, and the returned value will be yielded together with the sequence. The std_parsers dict has parsers for several formats. Hint: specify parse() as the parser to apply automatic format recognition. Default is None, which means return the header “as is”.

Other arguments : the same as for IndexedFASTA.
build_second_index()

Create the mapping from extracted field to whole header string.

get_by_id(key)

Get the entry by value of header string or extracted field.
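The two-layer lookup described above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the Pyteomics code: the helper names build_indexes and lookup, the sample records, and the header pattern are all hypothetical. The first index maps each full header string to a byte offset, and the second maps an extracted field back to the full header.

```python
import io
import re


def build_indexes(stream, header_pattern, header_group=1):
    """Return (header -> byte offset, extracted field -> header)."""
    pattern = re.compile(header_pattern)
    byte_index = {}
    second_index = {}
    stream.seek(0)
    while True:
        offset = stream.tell()  # byte offsets are reliable in binary mode
        line = stream.readline()
        if not line:
            break
        if line.startswith(b">"):
            header = line[1:].decode("utf-8").strip()
            byte_index[header] = offset
            match = pattern.match(header)
            if match:
                second_index[match.group(header_group)] = header
    return byte_index, second_index


def lookup(stream, byte_index, second_index, key):
    """Fetch an entry by full header string or by extracted field."""
    header = key if key in byte_index else second_index[key]
    stream.seek(byte_index[header])
    stream.readline()  # skip the header line itself
    chunks = []
    for line in iter(stream.readline, b""):
        if line.startswith(b">"):
            break
        chunks.append(line.strip().decode("ascii"))
    return header, "".join(chunks)


# Made-up records; the pattern captures the identifier after the prefix.
data = io.BytesIO(b">sp:EX1 \\Length=4\nPEPT\n>sp:EX2 \\Length=6\nIDE\nKLM\n")
byte_index, second_index = build_indexes(data, r"\w+:(\S+)")
print(lookup(data, byte_index, second_index, "EX2"))
```

The sketch also hints at why the source must be opened in binary mode: seeking to a stored offset is only dependable when offsets are counted in bytes.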

map(target=None, processes=-1, queue_timeout=4, args=None, kwargs=None, **_kwargs)

Execute the target function over entries of this object, using up to the number of worker processes given by processes.

Results will be returned out of order.

Parameters:
target : Callable, optional

The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs.

processes : int, optional

The number of worker processes to use. If negative, the number of processes will match the number of available CPUs.

queue_timeout : float, optional

The number of seconds to block, waiting for a result before checking to see if all workers are done.

args : Sequence, optional

Additional positional arguments to be passed to the target function

kwargs : Mapping, optional

Additional keyword arguments to be passed to the target function

**_kwargs

Additional keyword arguments to be passed to the target function

Yields:
object

The work item returned by the target function.
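The calling convention and the out-of-order behavior can be sketched as follows. Note the assumptions: this sketch uses a thread pool from concurrent.futures as a stand-in for the worker processes that map() is documented to use, and the names resolve_workers, unordered_map and sequence_length are hypothetical. It shows only how a negative worker count might fall back to the CPU count and how results arrive in completion order.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def resolve_workers(processes):
    """Use the available CPU count when a negative value is given."""
    if processes < 0:
        return os.cpu_count() or 1
    return max(1, processes)


def unordered_map(entries, target, processes=-1, args=None, kwargs=None):
    """Yield target(entry, *args, **kwargs) results as they complete."""
    args = tuple(args or ())
    kwargs = dict(kwargs or {})
    with ThreadPoolExecutor(max_workers=resolve_workers(processes)) as pool:
        futures = [pool.submit(target, entry, *args, **kwargs) for entry in entries]
        # as_completed yields futures in completion order, not submission order.
        for future in as_completed(futures):
            yield future.result()


def sequence_length(entry, scale=1):
    """Example target: return the header with the scaled sequence length."""
    header, sequence = entry
    return header, len(sequence) * scale


entries = [("sp:EX1", "PEPT"), ("sp:EX2", "IDEKLM")]
results = dict(unordered_map(entries, sequence_length, processes=2, kwargs={"scale": 1}))
print(results)
```

Since the order of results is not guaranteed, the example target returns the header alongside its value so that each result can be matched back to its entry.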

reset()

Resets the iterator to its initial state.
