Pyteomics documentation v4.1.3dev0

fasta - manipulations with FASTA databases


FASTA is a simple file format for protein sequence databases. For a detailed description of the format, refer to the NCBI website.

Data manipulation

Classes

Several classes of FASTA parsers are available. All of them have common features:

  • context manager support;
  • header parsing;
  • direct iteration.

Available classes:

FASTABase - common ancestor, suitable for type checking. Abstract class.

FASTA - text-mode, sequential parser. Good for iteration over database entries.

IndexedFASTA - binary-mode, indexing parser. Supports direct indexing by header string.

TwoLayerIndexedFASTA - additionally supports indexing by extracted header fields.

UniProt and IndexedUniProt, UniParc and IndexedUniParc, UniMes and IndexedUniMes, UniRef and IndexedUniRef, SPD and IndexedSPD, NCBI and IndexedNCBI, RefSeq and IndexedRefSeq - format-specific parsers.

Functions

read() - returns an instance of the appropriate reader class, for sequential iteration or random access.

chain() - read multiple files at once.

chain.from_iterable() - read multiple files at once, using an iterable of files.

write() - write entries to a FASTA database.

parse() - parse a FASTA header.

Decoy sequence generation

decoy_sequence() - generate a decoy sequence from a given sequence, using one of the other functions listed in this section or any other callable.

reverse() - generate a reversed decoy sequence.

shuffle() - generate a shuffled decoy sequence.

fused_decoy() - generate a “fused” decoy sequence.

Decoy database generation

write_decoy_db() - generate a decoy database and write it to a file.

decoy_db() - generate entries for a decoy database from a given FASTA database.

decoy_chain() - a version of decoy_db() for multiple files.

decoy_chain.from_iterable() - like decoy_chain(), but with an iterable of files.

Auxiliary

std_parsers - a dictionary with parsers for known FASTA header formats.

pyteomics.fasta.chain(*args, **kwargs)

Chain read() for several files. Positional arguments should be file names or file objects. Keyword arguments are passed to the read() function.

chain.from_iterable(files, **kwargs)

Chain read() for several files. Keyword arguments are passed to the read() function.

files : iterable
Iterable of file names or file objects.
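
As a rough illustration of what chaining means here, the sketch below concatenates the entries of several sources into one stream with itertools.chain.from_iterable. The helper read_entries is a hypothetical, simplified stand-in for read() (it is not the pyteomics function), and the in-memory sources are made up for the example.

```python
import io
from itertools import chain


def read_entries(source):
    """Hypothetical stand-in for read(): yield (description, sequence) tuples."""
    description, chunks = None, []
    for line in source:
        line = line.strip()
        if line.startswith('>'):
            # A new header closes the previous record, if there was one.
            if description is not None:
                yield description, ''.join(chunks)
            description, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if description is not None:
        yield description, ''.join(chunks)


def chain_entries(files):
    """Read every source in turn and present the entries as one iterator."""
    return chain.from_iterable(read_entries(f) for f in files)


first = io.StringIO('>prot1\nPEPTIDE\n')
second = io.StringIO('>prot2\nACDEFGHIK\nLMNPQ\n')
entries = list(chain_entries([first, second]))
print(entries)
```

The sources are consumed lazily and in the order given, so entries from the first file always precede entries from the second.
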
pyteomics.fasta.decoy_chain(*args, **kwargs)

Chain decoy_db() for several files. Positional arguments should be file names or file objects. Keyword arguments are passed to the decoy_db() function.

decoy_chain.from_iterable(files, **kwargs)

Chain decoy_db() for several files. Keyword arguments are passed to the decoy_db() function.

files : iterable
Iterable of file names or file objects.
class pyteomics.fasta.FASTA(source, ignore_comments=False, parser=None, encoding=None)[source]

Bases: pyteomics.auxiliary.file_helpers.FileReader, pyteomics.fasta.FASTABase

Text-mode, sequential FASTA parser. Suitable for iteration over the file to obtain all entries in order.

Attributes:
parser

Methods

get_entry  
next  
reset  
__init__(self, source, ignore_comments=False, parser=None, encoding=None)[source]

Create a new FASTA parser object. Supports iteration, yields (description, sequence) tuples. Supports with syntax.

Parameters:
source : str or file-like

File to read. If file object, it must be opened in text mode.

ignore_comments : bool, optional

If True then ignore the second and subsequent lines of description. Default is False, which concatenates multi-line descriptions into a single string.

parser : function or None, optional

Defines whether the FASTA descriptions should be parsed. If it is a function, that function will be given the description string, and the returned value will be yielded together with the sequence. The std_parsers dict has parsers for several formats. Hint: specify parse() as the parser to apply automatic format recognition. Default is None, which means return the header “as is”.

encoding : str or None, optional

File encoding (if it is given by name).
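
To make the role of the parser argument more concrete, here is a minimal sketch in plain Python. Both simple_parser and apply_parser are hypothetical names invented for this illustration; the sketch only mimics the documented contract (the callable receives the description string, and its return value is yielded together with the sequence) and does not reproduce the library's code.

```python
def simple_parser(description):
    """Hypothetical parser: split 'ID free text' into a small dict."""
    identifier, _, name = description.partition(' ')
    return {'id': identifier, 'name': name}


def apply_parser(entries, parser=None):
    """Yield (description, sequence) pairs, parsing descriptions if asked."""
    for description, sequence in entries:
        if parser is not None:
            description = parser(description)
        yield description, sequence


raw = [('P1 Example protein', 'PEPTIDE'), ('P2 Another one', 'ACDK')]
as_is = list(apply_parser(raw))                        # parser=None: unchanged
parsed = list(apply_parser(raw, parser=simple_parser))
print(parsed[0])
```

With parser=None the descriptions pass through unchanged, which corresponds to the header being returned "as is".
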

reset(self)

Resets the iterator to its initial state.

class pyteomics.fasta.FASTABase(ignore_comments=False, parser=None)[source]

Abstract base class for FASTA file parsers. Can be used for type checking.

Attributes:
parser

Methods

get_entry  
__init__(self, ignore_comments=False, parser=None)[source]
class pyteomics.fasta.FlavoredMixin(parse=True)[source]

Parser aimed at a specific FASTA flavor. Subclasses should define parser and header_pattern. The parse argument in __init__() defines whether description is parsed in output.

__init__(self, parse=True)[source]
class pyteomics.fasta.IndexedFASTA(source, ignore_comments=False, parser=None, **kwargs)[source]

Bases: pyteomics.auxiliary.file_helpers.TaskMappingMixin, pyteomics.auxiliary.file_helpers.IndexedTextReader, pyteomics.fasta.FASTABase

Indexed FASTA parser. Supports direct indexing by matched labels.

Attributes:
default_index
index
parser

Methods

map(self[, target, processes, …]) Execute the target function over entries of this object across up to processes processes.
build_byte_index  
get_by_id  
get_by_ids  
get_by_index  
get_by_index_slice  
get_by_indexes  
get_by_key_slice  
get_entry  
next  
reset  
__init__(self, source, ignore_comments=False, parser=None, **kwargs)[source]

Create an indexed FASTA parser object.

Parameters:
source : str or file-like

File to read. If file object, it must be opened in binary mode.

ignore_comments : bool, optional

If True then ignore the second and subsequent lines of description. Default is False, which concatenates multi-line descriptions into a single string.

parser : function or None, optional

Defines whether the FASTA descriptions should be parsed. If it is a function, that function will be given the description string, and the returned value will be yielded together with the sequence. The std_parsers dict has parsers for several formats. Hint: specify parse() as the parser to apply automatic format recognition. Default is None, which means return the header “as is”.

encoding : str or None, optional, keyword only

File encoding. Default is UTF-8.

block_size : int or None, optional, keyword only

Number of bytes to consume at once.

delimiter : str or None, optional, keyword only

Overrides the FASTA record delimiter (default is '\n>').

label : str or None, optional, keyword only

Overrides the FASTA record label pattern. Default is '^[\n]?>(.*)'.

label_group : int or str, optional, keyword only

Overrides the matched group used as key in the byte offset index. This in combination with label can be used to extract fields from headers. However, consider using TwoLayerIndexedFASTA for this purpose.
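
The interplay of label and label_group can be sketched with the standard library alone. The function build_byte_index below is a hypothetical, simplified illustration written for this page, not the library's indexing code: it finds record starts in binary data, applies the label pattern to each header line, and uses the chosen group as the key of a byte offset index.

```python
import re


def build_byte_index(data, label=rb'^[\n]?>(.*)', label_group=1):
    """Map each matched label to the (start, end) byte offsets of its record."""
    # A record starts with '>' at the very beginning or right after a newline.
    starts = [m.start() for m in re.finditer(rb'(?:\A|(?<=\n))>', data)]
    index = {}
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(data)
        header_line = data[start:end].split(b'\n', 1)[0]
        match = re.match(label, header_line)
        if match:  # records whose header does not match are left out
            index[match.group(label_group).decode('utf-8')] = (start, end)
    return index


data = (b'>sp|P12345|TEST_HUMAN Example\nPEPTIDE\n'
        b'>sp|Q67890|DEMO_HUMAN Demo\nACDK\n')

by_header = build_byte_index(data)
# A custom label pattern that captures only the accession field.
by_accession = build_byte_index(data, label=rb'^[\n]?>\w+\|(\w+)\|')

start, end = by_accession['Q67890']
print(data[start:end].decode('utf-8'))
```

Keying the index by an extracted field this way loses access by full header, which is one reason a two-layer index may be preferable.
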

map(self, target=None, processes=-1, queue_timeout=4, args=None, kwargs=None, **_kwargs)

Execute the target function over entries of this object across up to processes processes.

Results will be returned out of order.

Parameters:
target : Callable, optional

The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs

processes : int, optional

The number of worker processes to use. If negative, the number of processes will match the number of available CPUs.

queue_timeout : float, optional

The number of seconds to block, waiting for a result before checking to see if all workers are done.

args : Sequence, optional

Additional positional arguments to be passed to the target function

kwargs : Mapping, optional

Additional keyword arguments to be passed to the target function

**_kwargs

Additional keyword arguments to be passed to the target function

Yields:
object

The work item returned by the target function.
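
Because results come back out of order, a target function usually needs to return something that identifies the entry it was computed from. The sketch below shows that pattern with a thread pool from the standard library as a stand-in; it is only an analogy (map() is documented to use worker processes), and entry_length and the sample entries are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def entry_length(entry):
    """Target function: return the key together with the computed value."""
    description, sequence = entry
    return description, len(sequence)


entries = [('prot1', 'PEPTIDE'), ('prot2', 'ACDK'),
           ('prot3', 'MKWVTFISLLLLFSSAYS')]

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(entry_length, e) for e in entries]
    # as_completed yields results in completion order, not submission order,
    # so each result carries its own key and is collected into a dict.
    lengths = dict(future.result() for future in as_completed(futures))

print(lengths)
```

Collecting the (key, value) pairs into a dict makes the final result independent of the order in which the workers finish.
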

reset(self)

Resets the iterator to its initial state.

class pyteomics.fasta.IndexedNCBI(source, parse=True, **kwargs)[source]

Bases: pyteomics.fasta.NCBIMixin, pyteomics.fasta.TwoLayerIndexedFASTA

Indexed parser for NCBI FASTA files.

Attributes:
default_index
index

Methods

build_second_index(self) Create the mapping from extracted field to whole header string.
get_by_id(self, key) Get the entry by value of header string or extracted field.
map(self[, target, processes, …]) Execute the target function over entries of this object across up to processes processes.
build_byte_index  
get_by_ids  
get_by_index  
get_by_index_slice  
get_by_indexes  
get_by_key_slice  
get_entry  
next  
parser  
reset  
__init__(self, source, parse=True, **kwargs)

Creates an IndexedNCBI object.

Parameters:
source : str or file

The file to read. If a file object, it needs to be in binary mode.

parse : bool, optional

Defines whether the descriptions should be parsed in the produced tuples. Default is True.

kwargs : passed to the TwoLayerIndexedFASTA constructor.
build_second_index(self)

Create the mapping from extracted field to whole header string.

get_by_id(self, key)

Get the entry by value of header string or extracted field.

map(self, target=None, processes=-1, queue_timeout=4, args=None, kwargs=None, **_kwargs)

Execute the target function over entries of this object across up to processes processes.

Results will be returned out of order.

Parameters:
target : Callable, optional

The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs

processes : int, optional

The number of worker processes to use. If negative, the number of processes will match the number of available CPUs.

queue_timeout : float, optional

The number of seconds to block, waiting for a result before checking to see if all workers are done.

args : Sequence, optional

Additional positional arguments to be passed to the target function

kwargs : Mapping, optional

Additional keyword arguments to be passed to the target function

**_kwargs

Additional keyword arguments to be passed to the target function

Yields:
object

The work item returned by the target function.

reset(self)

Resets the iterator to its initial state.

class pyteomics.fasta.IndexedRefSeq(source, parse=True, **kwargs)[source]

Bases: pyteomics.fasta.RefSeqMixin, pyteomics.fasta.TwoLayerIndexedFASTA

Indexed parser for RefSeq FASTA files.

Attributes:
default_index
index

Methods

build_second_index(self) Create the mapping from extracted field to whole header string.
get_by_id(self, key) Get the entry by value of header string or extracted field.
map(self[, target, processes, …]) Execute the target function over entries of this object across up to processes processes.
build_byte_index  
get_by_ids  
get_by_index  
get_by_index_slice  
get_by_indexes  
get_by_key_slice  
get_entry  
next  
parser  
reset  
__init__(self, source, parse=True, **kwargs)

Creates an IndexedRefSeq object.

Parameters:
source : str or file

The file to read. If a file object, it needs to be in binary mode.

parse : bool, optional

Defines whether the descriptions should be parsed in the produced tuples. Default is True.

kwargs : passed to the TwoLayerIndexedFASTA constructor.
build_second_index(self)

Create the mapping from extracted field to whole header string.

get_by_id(self, key)

Get the entry by value of header string or extracted field.

map(self, target=None, processes=-1, queue_timeout=4, args=None, kwargs=None, **_kwargs)

Execute the target function over entries of this object across up to processes processes.

Results will be returned out of order.

Parameters:
target : Callable, optional

The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs

processes : int, optional

The number of worker processes to use. If negative, the number of processes will match the number of available CPUs.

queue_timeout : float, optional

The number of seconds to block, waiting for a result before checking to see if all workers are done.

args : Sequence, optional

Additional positional arguments to be passed to the target function

kwargs : Mapping, optional

Additional keyword arguments to be passed to the target function

**_kwargs

Additional keyword arguments to be passed to the target function

Yields:
object

The work item returned by the target function.

reset(self)

Resets the iterator to its initial state.

class pyteomics.fasta.IndexedSPD(source, parse=True, **kwargs)[source]

Bases: pyteomics.fasta.SPDMixin, pyteomics.fasta.TwoLayerIndexedFASTA

Indexed parser for SPD FASTA files.

Attributes:
default_index
index

Methods

build_second_index(self) Create the mapping from extracted field to whole header string.
get_by_id(self, key) Get the entry by value of header string or extracted field.
map(self[, target, processes, …]) Execute the target function over entries of this object across up to processes processes.
build_byte_index  
get_by_ids  
get_by_index  
get_by_index_slice  
get_by_indexes  
get_by_key_slice  
get_entry  
next  
parser  
reset  
__init__(self, source, parse=True, **kwargs)

Creates an IndexedSPD object.

Parameters:
source : str or file

The file to read. If a file object, it needs to be in binary mode.

parse : bool, optional

Defines whether the descriptions should be parsed in the produced tuples. Default is True.

kwargs : passed to the TwoLayerIndexedFASTA constructor.
build_second_index(self)

Create the mapping from extracted field to whole header string.

get_by_id(self, key)

Get the entry by value of header string or extracted field.

map(self, target=None, processes=-1, queue_timeout=4, args=None, kwargs=None, **_kwargs)

Execute the target function over entries of this object across up to processes processes.

Results will be returned out of order.

Parameters:
target : Callable, optional

The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs

processes : int, optional

The number of worker processes to use. If negative, the number of processes will match the number of available CPUs.

queue_timeout : float, optional

The number of seconds to block, waiting for a result before checking to see if all workers are done.

args : Sequence, optional

Additional positional arguments to be passed to the target function

kwargs : Mapping, optional

Additional keyword arguments to be passed to the target function

**_kwargs

Additional keyword arguments to be passed to the target function

Yields:
object

The work item returned by the target function.

reset(self)

Resets the iterator to its initial state.

class pyteomics.fasta.IndexedUniMes(source, parse=True, **kwargs)[source]

Bases: pyteomics.fasta.UniMesMixin, pyteomics.fasta.TwoLayerIndexedFASTA

Indexed parser for UniMes FASTA files.

Attributes:
default_index
index

Methods

build_second_index(self) Create the mapping from extracted field to whole header string.
get_by_id(self, key) Get the entry by value of header string or extracted field.
map(self[, target, processes, …]) Execute the target function over entries of this object across up to processes processes.
build_byte_index  
get_by_ids  
get_by_index  
get_by_index_slice  
get_by_indexes  
get_by_key_slice  
get_entry  
next  
parser  
reset  
__init__(self, source, parse=True, **kwargs)

Creates an IndexedUniMes object.

Parameters:
source : str or file

The file to read. If a file object, it needs to be in binary mode.

parse : bool, optional

Defines whether the descriptions should be parsed in the produced tuples. Default is True.

kwargs : passed to the TwoLayerIndexedFASTA constructor.
build_second_index(self)

Create the mapping from extracted field to whole header string.

get_by_id(self, key)

Get the entry by value of header string or extracted field.

map(self, target=None, processes=-1, queue_timeout=4, args=None, kwargs=None, **_kwargs)

Execute the target function over entries of this object across up to processes processes.

Results will be returned out of order.

Parameters:
target : Callable, optional

The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs

processes : int, optional

The number of worker processes to use. If negative, the number of processes will match the number of available CPUs.

queue_timeout : float, optional

The number of seconds to block, waiting for a result before checking to see if all workers are done.

args : Sequence, optional

Additional positional arguments to be passed to the target function

kwargs : Mapping, optional

Additional keyword arguments to be passed to the target function

**_kwargs

Additional keyword arguments to be passed to the target function

Yields:
object

The work item returned by the target function.

reset(self)

Resets the iterator to its initial state.

class pyteomics.fasta.IndexedUniParc(source, parse=True, **kwargs)[source]

Bases: pyteomics.fasta.UniParcMixin, pyteomics.fasta.TwoLayerIndexedFASTA

Indexed parser for UniParc FASTA files.

Attributes:
default_index
index

Methods

build_second_index(self) Create the mapping from extracted field to whole header string.
get_by_id(self, key) Get the entry by value of header string or extracted field.
map(self[, target, processes, …]) Execute the target function over entries of this object across up to processes processes.
build_byte_index  
get_by_ids  
get_by_index  
get_by_index_slice  
get_by_indexes  
get_by_key_slice  
get_entry  
next  
parser  
reset  
__init__(self, source, parse=True, **kwargs)

Creates an IndexedUniParc object.

Parameters:
source : str or file

The file to read. If a file object, it needs to be in binary mode.

parse : bool, optional

Defines whether the descriptions should be parsed in the produced tuples. Default is True.

kwargs : passed to the TwoLayerIndexedFASTA constructor.
build_second_index(self)

Create the mapping from extracted field to whole header string.

get_by_id(self, key)

Get the entry by value of header string or extracted field.

map(self, target=None, processes=-1, queue_timeout=4, args=None, kwargs=None, **_kwargs)

Execute the target function over entries of this object across up to processes processes.

Results will be returned out of order.

Parameters:
target : Callable, optional

The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs

processes : int, optional

The number of worker processes to use. If negative, the number of processes will match the number of available CPUs.

queue_timeout : float, optional

The number of seconds to block, waiting for a result before checking to see if all workers are done.

args : Sequence, optional

Additional positional arguments to be passed to the target function

kwargs : Mapping, optional

Additional keyword arguments to be passed to the target function

**_kwargs

Additional keyword arguments to be passed to the target function

Yields:
object

The work item returned by the target function.

reset(self)

Resets the iterator to its initial state.

class pyteomics.fasta.IndexedUniProt(source, parse=True, **kwargs)[source]

Bases: pyteomics.fasta.UniProtMixin, pyteomics.fasta.TwoLayerIndexedFASTA

Indexed parser for UniProt FASTA files.

Attributes:
default_index
index

Methods

build_second_index(self) Create the mapping from extracted field to whole header string.
get_by_id(self, key) Get the entry by value of header string or extracted field.
map(self[, target, processes, …]) Execute the target function over entries of this object across up to processes processes.
build_byte_index  
get_by_ids  
get_by_index  
get_by_index_slice  
get_by_indexes  
get_by_key_slice  
get_entry  
next  
parser  
reset  
__init__(self, source, parse=True, **kwargs)

Creates an IndexedUniProt object.

Parameters:
source : str or file

The file to read. If a file object, it needs to be in binary mode.

parse : bool, optional

Defines whether the descriptions should be parsed in the produced tuples. Default is True.

kwargs : passed to the TwoLayerIndexedFASTA constructor.
build_second_index(self)

Create the mapping from extracted field to whole header string.

get_by_id(self, key)

Get the entry by value of header string or extracted field.

map(self, target=None, processes=-1, queue_timeout=4, args=None, kwargs=None, **_kwargs)

Execute the target function over entries of this object across up to processes processes.

Results will be returned out of order.

Parameters:
target : Callable, optional

The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs

processes : int, optional

The number of worker processes to use. If negative, the number of processes will match the number of available CPUs.

queue_timeout : float, optional

The number of seconds to block, waiting for a result before checking to see if all workers are done.

args : Sequence, optional

Additional positional arguments to be passed to the target function

kwargs : Mapping, optional

Additional keyword arguments to be passed to the target function

**_kwargs

Additional keyword arguments to be passed to the target function

Yields:
object

The work item returned by the target function.

reset(self)

Resets the iterator to its initial state.

class pyteomics.fasta.IndexedUniRef(source, parse=True, **kwargs)[source]

Bases: pyteomics.fasta.UniRefMixin, pyteomics.fasta.TwoLayerIndexedFASTA

Indexed parser for UniRef FASTA files.

Attributes:
default_index
index

Methods

build_second_index(self) Create the mapping from extracted field to whole header string.
get_by_id(self, key) Get the entry by value of header string or extracted field.
map(self[, target, processes, …]) Execute the target function over entries of this object across up to processes processes.
build_byte_index  
get_by_ids  
get_by_index  
get_by_index_slice  
get_by_indexes  
get_by_key_slice  
get_entry  
next  
parser  
reset  
__init__(self, source, parse=True, **kwargs)

Creates an IndexedUniRef object.

Parameters:
source : str or file

The file to read. If a file object, it needs to be in binary mode.

parse : bool, optional

Defines whether the descriptions should be parsed in the produced tuples. Default is True.

kwargs : passed to the TwoLayerIndexedFASTA constructor.
build_second_index(self)

Create the mapping from extracted field to whole header string.

get_by_id(self, key)

Get the entry by value of header string or extracted field.

map(self, target=None, processes=-1, queue_timeout=4, args=None, kwargs=None, **_kwargs)

Execute the target function over entries of this object across up to processes processes.

Results will be returned out of order.

Parameters:
target : Callable, optional

The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs

processes : int, optional

The number of worker processes to use. If negative, the number of processes will match the number of available CPUs.

queue_timeout : float, optional

The number of seconds to block, waiting for a result before checking to see if all workers are done.

args : Sequence, optional

Additional positional arguments to be passed to the target function

kwargs : Mapping, optional

Additional keyword arguments to be passed to the target function

**_kwargs

Additional keyword arguments to be passed to the target function

Yields:
object

The work item returned by the target function.

reset(self)

Resets the iterator to its initial state.

class pyteomics.fasta.NCBI(source, parse=True, **kwargs)[source]

Bases: pyteomics.fasta.NCBIMixin, pyteomics.fasta.FASTA

Text-mode parser for NCBI FASTA files.

Methods

get_entry  
next  
parser  
reset  
__init__(self, source, parse=True, **kwargs)

Creates an NCBI object.

Parameters:
source : str or file

The file to read. If a file object, it needs to be in text mode.

parse : bool, optional

Defines whether the descriptions should be parsed in the produced tuples. Default is True.

kwargs : passed to the FASTA constructor.
reset(self)

Resets the iterator to its initial state.

class pyteomics.fasta.RefSeq(source, parse=True, **kwargs)[source]

Bases: pyteomics.fasta.RefSeqMixin, pyteomics.fasta.FASTA

Text-mode parser for RefSeq FASTA files.

Methods

get_entry  
next  
parser  
reset  
__init__(self, source, parse=True, **kwargs)

Creates a RefSeq object.

Parameters:
source : str or file

The file to read. If a file object, it needs to be in text mode.

parse : bool, optional

Defines whether the descriptions should be parsed in the produced tuples. Default is True.

kwargs : passed to the FASTA constructor.
reset(self)

Resets the iterator to its initial state.

class pyteomics.fasta.SPD(source, parse=True, **kwargs)[source]

Bases: pyteomics.fasta.SPDMixin, pyteomics.fasta.FASTA

Text-mode parser for SPD FASTA files.

Methods

get_entry  
next  
parser  
reset  
__init__(self, source, parse=True, **kwargs)

Creates an SPD object.

Parameters:
source : str or file

The file to read. If a file object, it needs to be in text mode.

parse : bool, optional

Defines whether the descriptions should be parsed in the produced tuples. Default is True.

kwargs : passed to the FASTA constructor.
reset(self)

Resets the iterator to its initial state.

class pyteomics.fasta.TwoLayerIndexedFASTA(source, header_pattern=None, header_group=None, ignore_comments=False, parser=None, **kwargs)[source]

Bases: pyteomics.fasta.IndexedFASTA

Parser with two-layer index. Extracted groups are mapped to full headers (where possible), full headers are mapped to byte offsets.

When indexed, the key is looked up in both indexes, allowing access by meaningful IDs (like UniProt accession) and by full header string.

Attributes:
default_index
header_pattern
index
parser

Methods

build_second_index(self) Create the mapping from extracted field to whole header string.
get_by_id(self, key) Get the entry by value of header string or extracted field.
map(self[, target, processes, …]) Execute the target function over entries of this object across up to processes processes.
build_byte_index  
get_by_ids  
get_by_index  
get_by_index_slice  
get_by_indexes  
get_by_key_slice  
get_entry  
next  
reset  
__init__(self, source, header_pattern=None, header_group=None, ignore_comments=False, parser=None, **kwargs)[source]

Open source and create a two-layer index for convenient random access both by full header strings and extracted fields.

Parameters:
source : str or file-like

File to read. If file object, it must be opened in binary mode.

header_pattern : str or RE or None, optional

Pattern to match the header string. Must capture the group used for the second index. If None (default), second-level index is not created.

header_group : int or str or None, optional

Defines which group is used as key in the second-level index. Default is 1.

ignore_comments : bool, optional

If True then ignore the second and subsequent lines of description. Default is False, which concatenates multi-line descriptions into a single string.

parser : function or None, optional

Defines whether the FASTA descriptions should be parsed. If it is a function, that function will be given the description string, and the returned value will be yielded together with the sequence. The std_parsers dict has parsers for several formats. Hint: specify parse() as the parser to apply automatic format recognition. Default is None, which means return the header “as is”.

Other arguments : the same as for IndexedFASTA.
build_second_index(self)[source]

Create the mapping from extracted field to whole header string.

get_by_id(self, key)[source]

Get the entry by value of header string or extracted field.
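
The two-layer lookup can be pictured with a small toy class. TwoLayerIndex below is a hypothetical sketch written for illustration, not the library's implementation: the first layer maps full headers to stored values (byte offsets in the real parser), and the second layer maps the group extracted by header_pattern back to the full header. The order in which the two layers are consulted is an arbitrary choice of this sketch.

```python
import re


class TwoLayerIndex:
    """Toy two-layer index: extracted field -> full header -> stored value."""

    def __init__(self, first_index, header_pattern, header_group=1):
        self.first_index = first_index
        self.second_index = {}
        pattern = re.compile(header_pattern)
        for header in first_index:
            match = pattern.match(header)
            if match:  # headers that do not match stay reachable in full only
                self.second_index[match.group(header_group)] = header

    def get_by_id(self, key):
        """Try the full header first, then fall back to the extracted field."""
        if key in self.first_index:
            return self.first_index[key]
        return self.first_index[self.second_index[key]]


offsets = {'sp|P12345|TEST_HUMAN Example': 0, 'custom header': 38}
index = TwoLayerIndex(offsets, r'\w+\|(\w+)\|')
print(index.get_by_id('P12345'), index.get_by_id('custom header'))
```

Headers that the pattern does not match ('custom header' here) simply have no second-layer key, which mirrors the "where possible" wording above.
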

map(self, target=None, processes=-1, queue_timeout=4, args=None, kwargs=None, **_kwargs)

Execute the target function over entries of this object across up to processes processes.

Results will be returned out of order.

Parameters:
target : Callable, optional

The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs

processes : int, optional

The number of worker processes to use. If negative, the number of processes will match the number of available CPUs.

queue_timeout : float, optional

The number of seconds to block, waiting for a result before checking to see if all workers are done.

args : Sequence, optional

Additional positional arguments to be passed to the target function

kwargs : Mapping, optional

Additional keyword arguments to be passed to the target function

**_kwargs

Additional keyword arguments to be passed to the target function

Yields:
object

The work item returned by the target function.

reset(self)

Resets the iterator to its initial state.

class pyteomics.fasta.UniMes(source, parse=True, **kwargs)[source]

Bases: pyteomics.fasta.UniMesMixin, pyteomics.fasta.FASTA

Text-mode parser for UniMes FASTA files.

Methods

get_entry  
next  
parser  
reset  
__init__(self, source, parse=True, **kwargs)

Creates a UniMes object.

Parameters:
source : str or file

The file to read. If a file object, it needs to be in text mode.

parse : bool, optional

Defines whether the descriptions should be parsed in the produced tuples. Default is True.

kwargs : passed to the FASTA constructor.
reset(self)

Resets the iterator to its initial state.

class pyteomics.fasta.UniParc(source, parse=True, **kwargs)[source]

Bases: pyteomics.fasta.UniParcMixin, pyteomics.fasta.FASTA

Text-mode parser for UniParc FASTA files.

Methods

get_entry  
next  
parser  
reset  
__init__(self, source, parse=True, **kwargs)

Creates a UniParc object.

Parameters:
source : str or file

The file to read. If a file object, it needs to be in text mode.

parse : bool, optional

Defines whether the descriptions should be parsed in the produced tuples. Default is True.

kwargs : passed to the FASTA constructor.
reset(self)

Resets the iterator to its initial state.

class pyteomics.fasta.UniProt(source, parse=True, **kwargs)[source]

Bases: pyteomics.fasta.UniProtMixin, pyteomics.fasta.FASTA

Text-mode parser for UniProt FASTA files.

Methods

get_entry  
next  
parser  
reset  
__init__(self, source, parse=True, **kwargs)

Creates a UniProt object.

Parameters:
source : str or file

The file to read. If a file object, it needs to be in text mode.

parse : bool, optional

Defines whether the descriptions should be parsed in the produced tuples. Default is True.

kwargs : passed to the FASTA constructor.
reset(self)

Resets the iterator to its initial state.

class pyteomics.fasta.UniRef(source, parse=True, **kwargs)[source]

Bases: pyteomics.fasta.UniRefMixin, pyteomics.fasta.FASTA

Text-mode parser for UniRef FASTA files.

Methods

get_entry  
next  
parser  
reset  
__init__(self, source, parse=True, **kwargs)

Creates a UniRef object.

Parameters:
source : str or file

The file to read. If a file object, it needs to be in text mode.

parse : bool, optional

Defines whether the descriptions should be parsed in the produced tuples. Default is True.

kwargs : passed to the FASTA constructor.
reset(self)

Resets the iterator to its initial state.

pyteomics.fasta.decoy_db(*args, **kwargs)[source]

Iterate over sequences for a decoy database out of a given source.

Parameters:
source : file-like object or str or None, optional

A path to a FASTA database or a file object itself. Default is None, which means read standard input.

mode : str or callable, optional

Algorithm of decoy sequence generation. ‘reverse’ by default. See decoy_sequence() for more information.

prefix : str, optional

A prefix to the protein descriptions of decoy entries. The default value is ‘DECOY_’.

decoy_only : bool, optional

If set to True, only the decoy entries will be yielded. If False, the entries from source will be yielded first, followed by the decoy entries. False by default.

ignore_comments : bool, optional

If True then ignore the second and subsequent lines of description. Default is False.

parser : function or None, optional

Defines whether the FASTA descriptions should be parsed. If it is a function, that function will be given the description string, and the returned value will be yielded together with the sequence. The std_parsers dict has parsers for several formats. Hint: specify parse() as the parser to apply automatic format guessing. Default is None, which means return the header “as is”.

**kwargs : given to decoy_sequence().
Returns:
out : iterator

An iterator over entries of the new database.
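
The ordering implied by decoy_only can be illustrated with a small, self-contained sketch. It does not call pyteomics: decoy_entries_sketch is a hypothetical helper that works on an in-memory list of (description, sequence) tuples, and plain string reversal stands in for the ‘reverse’ mode.

```python
def decoy_entries_sketch(entries, prefix='DECOY_', decoy_only=False):
    """Yield target entries first (unless decoy_only), then prefixed decoys.

    Hypothetical illustration only; not the pyteomics implementation.
    """
    entries = list(entries)  # materialize so the input can be traversed twice
    if not decoy_only:
        for description, sequence in entries:
            yield description, sequence
    for description, sequence in entries:
        # Plain reversal stands in for the 'reverse' decoy mode.
        yield prefix + description, sequence[::-1]


targets = [('sp|P1|PROT1', 'PEPTIDE'), ('sp|P2|PROT2', 'MKR')]
combined = list(decoy_entries_sketch(targets))
decoys = list(decoy_entries_sketch(targets, decoy_only=True))
```

With decoy_only=False the sketch yields the two target entries followed by their two decoys; with decoy_only=True it yields the decoys alone.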

pyteomics.fasta.decoy_sequence(sequence, mode='reverse', **kwargs)[source]

Create a decoy sequence out of a given sequence string.

Parameters:
sequence : str

The initial sequence string.

mode : str or callable, optional

Type of decoy sequence. Should be one of the standard modes or any callable. Standard modes are:

  • ‘reverse’ for reverse();
  • ‘shuffle’ for shuffle();
  • ‘fused’ for fused_decoy().

Default is ‘reverse’.

**kwargs : given to the decoy function.
Returns:
decoy_sequence : str

The decoy sequence.
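
The "mode name or callable" convention can be sketched as follows. This is an illustration under assumptions, not the library code: MODES_SKETCH and decoy_sequence_sketch are hypothetical names, and only a reversal mode is registered.

```python
def _reverse_mode_sketch(sequence):
    """Stand-in for a standard decoy mode: plain reversal."""
    return sequence[::-1]


# Hypothetical lookup table from mode names to decoy functions.
MODES_SKETCH = {'reverse': _reverse_mode_sketch}


def decoy_sequence_sketch(sequence, mode='reverse', **kwargs):
    """Resolve mode (a name or a callable) and apply it to the sequence."""
    func = mode if callable(mode) else MODES_SKETCH[mode]
    # Extra keyword arguments are forwarded to the decoy function.
    return func(sequence, **kwargs)


by_name = decoy_sequence_sketch('PEPTIDE')
by_callable = decoy_sequence_sketch('PEPTIDE', mode=lambda s: s.lower())
```

Passing a callable lets the caller plug in any custom decoy strategy without registering a name for it.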

pyteomics.fasta.fused_decoy(sequence, decoy_mode='reverse', sep='R', **kwargs)[source]

Create a “fused” decoy sequence by concatenating a decoy sequence with the original one. The method and its use cases are described in:

Ivanov, M. V., Levitsky, L. I., & Gorshkov, M. V. (2016). Adaptation of Decoy Fusion Strategy for Existing Multi-Stage Search Workflows. Journal of The American Society for Mass Spectrometry, 27(9), 1579-1582.

Parameters:
sequence : str

The initial sequence string.

decoy_mode : str or callable, optional

Type of decoy sequence to use. Should be one of the standard modes or any callable. Standard modes include:

  • ‘reverse’ for reverse();
  • ‘shuffle’ for shuffle().

Default is ‘reverse’.

sep : str, optional

Amino acid motif that separates the decoy sequence from the target one. This setting should reflect the enzyme specificity used in the search against the database being generated. Default is ‘R’, which is suitable for trypsin searches.

**kwargs : given to the decoy generation function.

Examples

>>> fused_decoy('PEPT')
'TPEPRPEPT'
>>> fused_decoy('MPEPT', 'shuffle', 'K', keep_nterm=True)
'MPPTEKMPEPT'
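
The first example above is consistent with a simple concatenation: decoy sequence, then the separator, then the original sequence. A minimal sketch of that idea, with a hypothetical function name and with reversal hard-coded in place of decoy_mode:

```python
def fused_decoy_sketch(sequence, sep='R'):
    """Join a reversed copy, the separator, and the original sequence.

    Hypothetical illustration only; reversal stands in for decoy_mode='reverse'.
    """
    decoy = sequence[::-1]
    return decoy + sep + sequence


fused_default = fused_decoy_sketch('PEPT')
fused_lysine = fused_decoy_sketch('PEPT', sep='K')
```

Because the separator matches the cleavage site of the enzyme, in-silico digestion splits the fused entry back into a decoy part and a target part.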

pyteomics.fasta.parse(header, flavor='auto', parsers=None)[source]

Parse a FASTA header and return its contents as a dictionary.

Parameters:
header : str

FASTA header to parse.

flavor : str, optional

Short name of the header format (case-insensitive). Valid values are 'auto' and keys of the parsers dict. Default is 'auto', which means try all formats in turn and return the first result that can be obtained without an exception.

parsers : dict, optional

A dict where keys are format names (lowercased) and values are functions that take a header string and return the parsed header.

Returns:
out : dict

A dictionary with the info from the header. The format depends on the flavor.
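
The 'auto' behavior described for flavor (try every format in turn, return the first result obtained without an exception) can be sketched as below. All names, both header formats, and the final ValueError are assumptions made for the illustration; they are not taken from pyteomics.

```python
def parse_auto_sketch(header, parsers):
    """Return the result of the first parser that does not raise."""
    for parser in parsers.values():
        try:
            return parser(header)
        except Exception:
            continue  # this format did not match; try the next one
    raise ValueError('Unrecognized FASTA header: ' + header)


def parse_pipe_sketch(header):
    # Hypothetical format: 'db|id|name description'
    db, accession, rest = header.split('|', 2)  # raises if fewer than 3 fields
    name, _, description = rest.partition(' ')
    return {'db': db, 'id': accession, 'name': name, 'description': description}


def parse_plain_sketch(header):
    # Hypothetical fallback format: 'id description'
    ident, _, description = header.partition(' ')
    if not ident:
        raise ValueError('empty identifier')
    return {'id': ident, 'description': description}


parsers_sketch = {'pipe': parse_pipe_sketch, 'plain': parse_plain_sketch}
piped = parse_auto_sketch('sp|P12345|TEST_HUMAN Test protein', parsers_sketch)
plain = parse_auto_sketch('seq1 some protein', parsers_sketch)
```

The order of the parsers matters in this scheme: a permissive parser placed early would shadow the more specific ones after it.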

pyteomics.fasta.read(source=None, use_index=None, flavor=None, **kwargs)[source]

Parse a FASTA file. This function serves as a dispatcher between different parsers available in this module.

Parameters:
source : str or file or None, optional

A file object (or file name) with a FASTA database. Default is None, which means read standard input.

use_index : bool, optional

If True, the created parser object will be an instance of IndexedFASTA. If False, it will be an instance of FASTA; this is the documented default behavior, although the signature shows None as the default value.

flavor : str or None, optional

A supported FASTA header format. If specified, a format-specific parser instance is returned.

Note

See std_parsers for supported flavors.

Returns:
out : iterator of tuples

A named 2-tuple with FASTA header (str or dict) and sequence (str). Attributes ‘description’ and ‘sequence’ are also provided.
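
The shape of the yielded items can be illustrated without pyteomics. In this sketch, ProteinSketch and read_fasta_sketch are hypothetical names; the reader is a simplified stand-in that only shows how a named 2-tuple supports both unpacking and attribute access.

```python
import io
from collections import namedtuple

# Hypothetical record type mirroring the documented attributes.
ProteinSketch = namedtuple('ProteinSketch', ['description', 'sequence'])


def read_fasta_sketch(source):
    """Yield (description, sequence) records from an iterable of text lines."""
    description, chunks = None, []
    for line in source:
        line = line.strip()
        if line.startswith('>'):
            if description is not None:
                yield ProteinSketch(description, ''.join(chunks))
            description, chunks = line[1:], []
        elif line:
            chunks.append(line)  # sequence lines may be wrapped
    if description is not None:
        yield ProteinSketch(description, ''.join(chunks))


fasta_text = '>prot1 first\nPEPT\nIDE\n>prot2 second\nMKR\n'
records = list(read_fasta_sketch(io.StringIO(fasta_text)))
header, seq = records[0]                # tuple unpacking
same_seq = records[0].sequence          # attribute access
```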

pyteomics.fasta.reverse(sequence, keep_nterm=False, keep_cterm=False)[source]

Create a decoy sequence by reversing the original one.

Parameters:
sequence : str

The initial sequence string.

keep_nterm : bool, optional

If True, then the N-terminal residue will be kept. Default is False.

keep_cterm : bool, optional

If True, then the C-terminal residue will be kept. Default is False.

Returns:
decoy_sequence : str

The decoy sequence.
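
The effect of keep_nterm and keep_cterm can be sketched in a few lines. reverse_sketch is a hypothetical re-implementation written for this illustration; it follows the parameter descriptions above and is not copied from the library.

```python
def reverse_sketch(sequence, keep_nterm=False, keep_cterm=False):
    """Reverse a sequence, optionally leaving the terminal residues in place."""
    start = 1 if keep_nterm else 0
    end = len(sequence) - 1 if keep_cterm else len(sequence)
    if start >= end:
        return sequence  # nothing left to reverse
    return sequence[:start] + sequence[start:end][::-1] + sequence[end:]


plain = reverse_sketch('MPEPTK')
nterm = reverse_sketch('MPEPTK', keep_nterm=True)
cterm = reverse_sketch('MPEPTK', keep_cterm=True)
both = reverse_sketch('MPEPTK', keep_nterm=True, keep_cterm=True)
```

Here plain is 'KTPEPM', nterm is 'MKTPEP', cterm is 'TPEPMK', and both is 'MTPEPK'.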

pyteomics.fasta.shuffle(sequence, keep_nterm=False, keep_cterm=False)[source]

Create a decoy sequence by shuffling the original one.

Parameters:
sequence : str

The initial sequence string.

keep_nterm : bool, optional

If True, then the N-terminal residue will be kept. Default is False.

keep_cterm : bool, optional

If True, then the C-terminal residue will be kept. Default is False.

Returns:
decoy_sequence : str

The decoy sequence.
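
A shuffled decoy keeps the amino acid composition but randomizes the order. The sketch below is a hypothetical stand-in, not the library code; the rng parameter is an addition made here so that the example can be seeded.

```python
import random


def shuffle_sketch(sequence, keep_nterm=False, keep_cterm=False, rng=random):
    """Shuffle a sequence, optionally leaving the terminal residues in place."""
    start = 1 if keep_nterm else 0
    end = len(sequence) - 1 if keep_cterm else len(sequence)
    if start >= end:
        return sequence  # nothing left to shuffle
    middle = list(sequence[start:end])
    rng.shuffle(middle)  # in-place shuffle of the inner residues
    return sequence[:start] + ''.join(middle) + sequence[end:]


seeded = random.Random(0)
shuffled = shuffle_sketch('MPEPTIDEK', keep_nterm=True, keep_cterm=True, rng=seeded)
```

The result varies with the random state, so only its invariants are predictable: same length, same residues, and unchanged termini when requested.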

pyteomics.fasta.std_parsers

A dictionary with parsers for known FASTA header formats. For now, the supported formats are those described on the UniProt help page.

pyteomics.fasta.write(*args, **kwargs)[source]

Create a FASTA file with entries.

Parameters:
entries : iterable of (str, str) tuples

An iterable of 2-tuples in the form (description, sequence).

output : file-like or str, optional

A file open for writing or a path to write to. If the file exists, it will be opened for appending. Default is None, which means write to standard output.

file_mode : str, keyword only, optional

If output is a file name, defines the mode the file will be opened in. Otherwise it is ignored. Default is ‘a’.

Returns:
output_file : file object

The file where the FASTA is written.
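
The consequence of the default file_mode of ‘a’ is that repeated writes to the same path accumulate entries. The sketch below demonstrates this with the built-in open(); write_fasta_sketch is a hypothetical helper, and its record layout (header line, one sequence line, blank line) is an assumption made for the illustration.

```python
import os
import tempfile


def write_fasta_sketch(entries, path, file_mode='a'):
    """Write (description, sequence) tuples to path in a simple FASTA layout."""
    with open(path, file_mode) as out:
        for description, sequence in entries:
            out.write('>' + description + '\n' + sequence + '\n\n')


path = os.path.join(tempfile.mkdtemp(), 'sketch.fasta')
write_fasta_sketch([('prot1', 'PEPTIDE')], path)
write_fasta_sketch([('prot2', 'MKR')], path)  # appended after prot1
with open(path) as handle:
    appended = handle.read()

write_fasta_sketch([('prot3', 'ACDE')], path, file_mode='w')  # replaces contents
with open(path) as handle:
    overwritten = handle.read()
```

Passing file_mode='w' is the way to start from an empty file when the path may already exist.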

pyteomics.fasta.write_decoy_db(*args, **kwargs)[source]

Generate a decoy database out of a given source and write to file.

If output is a path, the file will be opened for appending, so no information will be lost if the file exists. However, the user should be careful when providing open file streams as source and output: reading and writing will start from the current position in each file, which is where the last I/O operation finished. One can use the file.seek() method to change it.

Parameters:
source : file-like object or str or None, optional

A path to a FASTA database or a file object itself. Default is None, which means read standard input.

output : file-like object or str, optional

A path to the output database or a file open for writing. Default is None, which means the results go to standard output.

mode : str or callable, optional

Algorithm of decoy sequence generation. ‘reverse’ by default. See decoy_sequence() for more details.

prefix : str, optional

A prefix to the protein descriptions of decoy entries. The default value is ‘DECOY_’.

decoy_only : bool, optional

If set to True, only the decoy entries will be written to output. If False, the entries from source will be written as well. False by default.

file_mode : str, keyword only, optional

If output is a file name, defines the mode the file will be opened in. Otherwise it is ignored. Default is ‘a’.

**kwargs : given to decoy_sequence().
Returns:
output : file

A (closed) file object for the created file.
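
The caveat about open file streams can be reproduced with any file-like object. This sketch uses io.StringIO from the standard library rather than pyteomics: a stream that has already been read is positioned at its end, so a second read returns nothing until seek(0) rewinds it.

```python
import io

stream = io.StringIO('>prot1\nPEPTIDE\n>prot2\nMKR\n')

first_pass = stream.read()    # consumes the whole stream
second_pass = stream.read()   # starts at the end, so nothing is left

stream.seek(0)                # rewind to the beginning
after_seek = stream.read()    # full contents again
```

The same applies to a source stream that was iterated earlier in a script: rewinding it before generating the decoy database avoids silently producing an empty or partial output.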
