Pyteomics documentation v4.1.3dev0

peff - PSI Extended FASTA Format

PEFF is a forthcoming standard from PSI-HUPO that formalizes and extends the encoding of protein features and annotations used to build search spaces for proteomics. See the PEFF specification for more up-to-date information on the standard.

Data manipulation

Classes

The PEFF parser inherits several properties from the implementation in the fasta module, building on top of the TwoLayerIndexedFASTA reader.

Available classes:

IndexedPEFF - Parse a PEFF format file in binary mode, supporting direct indexing by header string or by tag.
class pyteomics.peff.Header(mapping, original=None)[source]

Bases: _abcoll.Mapping

Holds the parsed properties of a key-value collection, such as a sequence’s definition line.

This object supports the Mapping interface, and keys may be accessed by attribute access notation.
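
To illustrate what "Mapping interface plus attribute access" means in practice, here is a minimal sketch. It is not the pyteomics implementation: the class name AttrMapping and the sample keys are hypothetical, and it uses the Python 3 collections.abc module rather than the _abcoll base listed above.

```python
from collections.abc import Mapping


class AttrMapping(Mapping):
    """Hypothetical read-only mapping whose keys are also readable as attributes."""

    def __init__(self, mapping, original=None):
        self._mapping = dict(mapping)
        self.original = original  # the unparsed source text, if supplied

    def __getitem__(self, key):
        return self._mapping[key]

    def __iter__(self):
        return iter(self._mapping)

    def __len__(self):
        return len(self._mapping)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._mapping[name]
        except KeyError:
            raise AttributeError(name) from None


# Sample keys and values are made up for illustration.
header = AttrMapping({"Tag": "P12345", "Length": 609}, original=">sp:P12345 \\Length=609")
```

With this sketch, header["Length"] and header.Length return the same value, while get, keys, items and values come for free from the Mapping base class.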

Methods

get(self, key[, default])
iteritems(self)
iterkeys(self)
itervalues(self)
items  
keys  
values  
__init__(self, mapping, original=None)[source]

x.__init__(…) initializes x; see help(type(x)) for signature

get(self, key, default=None)
items(self)[source]
iteritems(self)
iterkeys(self)
itervalues(self)
keys(self)[source]
values(self)[source]
class pyteomics.peff.IndexedPEFF(source, ignore_comments=False, **kwargs)[source]

Bases: pyteomics.fasta.TwoLayerIndexedFASTA

Creates an IndexedPEFF object.

Parameters:
source : str or file

The file to read. If a file object, it must be opened in binary mode ('rb').

parse : bool, optional

Defines whether the descriptions should be parsed in the produced tuples. Default is True.

kwargs : passed to the TwoLayerIndexedFASTA constructor.
Attributes:
default_index
index

Methods

build_second_index(self) Create the mapping from extracted field to whole header string.
get_by_id(self, key) Get the entry by value of header string or extracted field.
map(self[, target, processes, …]) Execute the target function over entries of this object, using at most the number of worker processes given by processes.
build_byte_index  
get_by_ids  
get_by_index  
get_by_index_slice  
get_by_indexes  
get_by_key_slice  
get_entry  
next  
parser  
reset  
__init__(self, source, ignore_comments=False, **kwargs)[source]

Open source and create a two-layer index for convenient random access both by full header strings and extracted fields.

Parameters:
source : str or file-like

File to read. If file object, it must be opened in binary mode.

header_pattern : str or RE or None, optional

Pattern to match the header string. Must capture the group used for the second index. If None (default), second-level index is not created.

header_group : int or str or None, optional

Defines which group is used as key in the second-level index. Default is 1.

ignore_comments : bool, optional

If True then ignore the second and subsequent lines of description. Default is False, which concatenates multi-line descriptions into a single string.

parser : function or None, optional

Defines whether the FASTA descriptions should be parsed. If it is a function, that function will be given the description string, and the returned value will be yielded together with the sequence. The std_parsers dict has parsers for several formats. Hint: specify parse() as the parser to apply automatic format recognition. Default is None, which means return the header “as is”.

Other arguments : the same as for IndexedFASTA.
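
The parser parameter accepts any callable that receives the description string and returns a parsed value. The sketch below shows the idea with a hypothetical parser for backslash-delimited Key=Value fields. Both parse_backslash_fields and read_entries are illustrative helpers, not pyteomics functions, and the sample header is made up.

```python
def parse_backslash_fields(description):
    """Hypothetical parser: split 'prefix:id \\Key=Value ...' into a dict."""
    head, *fields = description.split(" \\")
    prefix, _, tag = head.partition(":")
    parsed = {"Prefix": prefix, "Tag": tag}
    for field in fields:
        key, _, value = field.partition("=")
        parsed[key] = value.strip()
    return parsed


def read_entries(lines, parser=None):
    """Yield (description, sequence) pairs, applying parser to each description."""
    description, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if description is not None:
                yield description, "".join(chunks)
            raw = line[1:]
            # With parser=None the header is passed through "as is".
            description = parser(raw) if parser else raw
            chunks = []
        elif line and description is not None:
            chunks.append(line)
    if description is not None:
        yield description, "".join(chunks)


lines = [">sp:P12345 \\PName=Example protein \\Length=10\n", "MKTAYIAKQR\n"]
as_is = list(read_entries(lines))
parsed = list(read_entries(lines, parser=parse_backslash_fields))
```

Here as_is holds the raw header string, while parsed holds a dict built by the callable. Values stay as strings in this sketch.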
build_second_index(self)

Create the mapping from extracted field to whole header string.

get_by_id(self, key)

Get the entry by value of header string or extracted field.
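
The two-layer lookup described here can be sketched with the standard library alone. This is an illustration of the idea under stated assumptions, not the pyteomics code: the first index maps each full header string to a byte offset, the second maps a field captured by header_pattern to the full header. The functions build_indexes and get_by_id below, and the sample data, are hypothetical.

```python
import io
import re

# Made-up two-entry file; BytesIO stands in for a file opened in 'rb' mode.
DATA = (
    b">sp:P12345 \\PName=Example protein one\n"
    b"MKTAYIAKQR\n"
    b">sp:Q67890 \\PName=Example protein two\n"
    b"GSHMLEDPVD\nAAKK\n"
)


def build_indexes(stream, header_pattern, header_group=1):
    """Return (header -> byte offset, extracted field -> header)."""
    offsets, second = {}, {}
    pattern = re.compile(header_pattern)
    while True:
        position = stream.tell()
        line = stream.readline()
        if not line:
            break
        if line.startswith(b">"):
            header = line[1:].strip().decode("utf-8")
            offsets[header] = position
            match = pattern.match(header)
            if match:
                second[match.group(header_group)] = header
    return offsets, second


def get_by_id(stream, offsets, second, key):
    """Look up by full header first, then by extracted field."""
    header = key if key in offsets else second[key]
    stream.seek(offsets[header])
    stream.readline()  # skip the header line itself
    chunks = []
    for line in stream:
        if line.startswith(b">"):
            break
        chunks.append(line.strip().decode("ascii"))
    return header, "".join(chunks)


stream = io.BytesIO(DATA)
offsets, second = build_indexes(stream, r"\w+:(\w+)")
entry = get_by_id(stream, offsets, second, "Q67890")
```

Byte offsets are only meaningful on a binary stream, which is one reason a file object passed as source has to be opened in binary mode.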

map(self, target=None, processes=-1, queue_timeout=4, args=None, kwargs=None, **_kwargs)

Execute the target function over entries of this object, using at most the number of worker processes given by processes.

Results will be returned out of order.

Parameters:
target : Callable, optional

The function to execute over each entry. It will be given a single object yielded by the wrapped iterator, followed by all of the values in args and kwargs.

processes : int, optional

The number of worker processes to use. If negative, the number of processes will match the number of available CPUs.

queue_timeout : float, optional

The number of seconds to block, waiting for a result before checking to see if all workers are done.

args : Sequence, optional

Additional positional arguments to be passed to the target function

kwargs : Mapping, optional

Additional keyword arguments to be passed to the target function

**_kwargs

Additional keyword arguments to be passed to the target function

Yields:
object

The work item returned by the target function.
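
Because results arrive out of order, a target function usually needs to return an identifier alongside its result. The sketch below shows that pattern. To stay self-contained it swaps in a thread pool from concurrent.futures instead of worker processes, so it is not how map is implemented. The names unordered_map, resolve_workers and sequence_length are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def resolve_workers(processes):
    """Treat a negative count (or zero, in this sketch) as one worker per CPU."""
    return processes if processes > 0 else (os.cpu_count() or 1)


def unordered_map(entries, target, processes=-1, args=None, kwargs=None):
    """Yield target(entry, *args, **kwargs) in completion order, not input order."""
    args = tuple(args or ())
    kwargs = dict(kwargs or {})
    with ThreadPoolExecutor(max_workers=resolve_workers(processes)) as pool:
        futures = [pool.submit(target, entry, *args, **kwargs) for entry in entries]
        for future in as_completed(futures):
            yield future.result()


def sequence_length(entry, scale=1):
    # Return the identifier with the result so it survives reordering.
    description, sequence = entry
    return description, len(sequence) * scale


# Made-up (description, sequence) pairs standing in for parsed entries.
entries = [("P1", "MKTAYIAKQR"), ("P2", "GSHM"), ("P3", "AAKKLEDPVD")]
results = dict(unordered_map(entries, sequence_length, processes=2, kwargs={"scale": 1}))
```

Collecting the pairs into a dict, as in the last line, makes the result independent of the order in which workers finish.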

reset(self)

Resets the iterator to its initial state.
