Pyteomics documentation v4.0.1

traml - reader for targeted mass spectrometry transition data in TraML format

Summary

TraML is a standard, rich XML format for targeted mass spectrometry method definitions. Refer to psidev.info for the detailed specification of the format and structure of TraML files.

This module provides a minimalistic way to extract information from TraML files. You can use the object-oriented interface (TraML instances) to access target definitions and transitions. TraML objects also support indexing with entity IDs directly.

Data access

TraML - a class representing a single TraML file. Other data access functions use this class internally.

read() - iterate through transitions in TraML format.

chain() - read multiple TraML files at once.

chain.from_iterable() - read multiple files at once, using an iterable of files.

Deprecated functions

version_info() - get version information about the TraML file. You can just read the corresponding attribute of the TraML object.

iterfind() - iterate over elements in a TraML file. You can just call the corresponding method of the TraML object.

Dependencies

This module requires lxml.


pyteomics.traml.chain(*sources, **kwargs)

Chain sequence_maker() for several sources into a single iterable. Positional arguments should be sources like file names or file objects. Keyword arguments are passed to the sequence_maker() function.

Attributes:
sources : Iterable

Sources for creating new sequences from, such as paths or file-like objects

kwargs : Mapping

Additional arguments used to instantiate each sequence

chain.from_iterable(files, **kwargs)

Chain read() for several files. Keyword arguments are passed to the read() function.

files : iterable
Iterable of file names or file objects.
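
The chaining pattern described above can be sketched with plain generators. This is only an illustration of the idea, not the pyteomics implementation: open_reader() is a hypothetical stand-in for a reader, and each "source" is an in-memory list rather than a TraML file.

```python
from contextlib import contextmanager


@contextmanager
def open_reader(source, **kwargs):
    """Hypothetical stand-in for a reader over a single source."""
    # A real reader would open and parse a file; here a source is just a list.
    yield iter(source)


def chain_from_iterable(sources, **kwargs):
    """Yield entries from every source in turn, one open source at a time."""
    for source in sources:
        with open_reader(source, **kwargs) as reader:
            yield from reader


def chain(*sources, **kwargs):
    """Same as chain_from_iterable(), but sources are positional arguments."""
    return chain_from_iterable(sources, **kwargs)


merged = list(chain(["t1", "t2"], ["t3"]))
print(merged)  # ['t1', 't2', 't3']
```

Because the sources are consumed lazily, only one of them needs to be open at any time.
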
class pyteomics.traml.TraML(*args, **kwargs)[source]

Bases: pyteomics.xml.MultiProcessingXML, pyteomics.xml.IndexSavingXML

Parser class for TraML files.

Attributes:
default_index
index

Methods

build_id_cache(*args, **kwargs) Construct a cache for each element in the document, indexed by id attribute
build_tree(*args, **kwargs) Build and store the ElementTree instance for the underlying file
clear_id_cache() Clear the element ID cache
clear_tree() Remove the saved ElementTree.
get_by_id(*args, **kwargs) Retrieve the requested entity by its id.
iterfind(path, **kwargs) Parse the XML and yield info on elements with specified local name or by specified “XPath”.
map([target, processes, queue_timeout, …]) Execute the target function over entries of this object across up to processes processes.
prebuild_byte_offset_file(path) Construct a new XML reader, build its byte offset index and write it to file
write_byte_offsets() Write the byte offsets in _offset_index to the file at _byte_offset_filename
get_by_ids  
get_by_index  
get_by_index_slice  
get_by_indexes  
get_by_key_slice  
next  
reset  
__init__(*args, **kwargs)[source]

Create an XML parser object.

Parameters:
source : str or file

File name or file-like object corresponding to an XML file.

read_schema : bool, optional

Defines whether schema file referenced in the file header should be used to extract information about value conversion. Default is False.

iterative : bool, optional

Defines whether an ElementTree object should be constructed and stored on the instance or if iterative parsing should be used instead. Iterative parsing keeps the memory usage low for large XML files. Default is True.

use_index : bool, optional

Defines whether an index of byte offsets needs to be created for elements listed in indexed_tags. This is useful for random access to spectra in mzML or elements of mzIdentML files, or for iterative parsing of mzIdentML with retrieve_refs=True. If True, build_id_cache is ignored. If False, the object acts exactly like XML. Default is True.

indexed_tags : container of bytes, optional

If use_index is True, elements listed in this parameter will be indexed. Empty set by default.
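
To show why iterative parsing keeps memory usage low, here is a minimal sketch using only the standard library (xml.etree.ElementTree rather than lxml). The sample document and the iter_ids() helper are made up for illustration; the sample is not a valid TraML file, and this is not how the parser class is implemented.

```python
import io
import xml.etree.ElementTree as ET

# Made-up, non-namespaced sample for illustration; not a valid TraML file.
SAMPLE = b"""<TransitionList>
  <Transition id="tr1"/>
  <Transition id="tr2"/>
  <Transition id="tr3"/>
</TransitionList>"""


def iter_ids(source, tag):
    """Yield the id of each `tag` element without building the full tree first."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem.get("id")
            # Drop the element's attributes and children once it is processed.
            elem.clear()


ids = list(iter_ids(io.BytesIO(SAMPLE), "Transition"))
print(ids)  # ['tr1', 'tr2', 'tr3']
```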

build_id_cache(*args, **kwargs)

Construct a cache for each element in the document, indexed by id attribute

build_tree(*args, **kwargs)

Build and store the ElementTree instance for the underlying file

clear_id_cache()

Clear the element ID cache

clear_tree()

Remove the saved ElementTree.

get_by_id(*args, **kwargs)

Retrieve the requested entity by its id. If the entity is a spectrum described in the offset index, it will be retrieved by immediately seeking to the starting position of the entry, otherwise falling back to parsing from the start of the file.

Parameters:
elem_id : str

The id value of the entity to retrieve.

id_key : str, optional

The name of the XML attribute to use for lookup. Defaults to self._default_id_attr.

Returns:
dict
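
The idea behind retrieval through a byte offset index can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: the sample document, the regular expression and both helper functions are hypothetical, and the sketch returns raw bytes instead of a parsed dict.

```python
import io
import re

# Made-up sample for illustration; not a valid TraML file.
SAMPLE = (
    b'<TransitionList>\n'
    b'  <Transition id="tr1"><Precursor/></Transition>\n'
    b'  <Transition id="tr2"><Precursor/></Transition>\n'
    b'</TransitionList>\n'
)

START_TAG = re.compile(rb'<Transition\s+id="([^"]+)"')
END_TAG = b'</Transition>'


def build_offset_index(stream):
    """Map each element id to the byte offset of its opening tag."""
    stream.seek(0)
    data = stream.read()
    return {m.group(1).decode(): m.start() for m in START_TAG.finditer(data)}


def read_by_id(stream, index, elem_id):
    """Seek straight to the indexed element and return its raw bytes."""
    stream.seek(index[elem_id])
    data = stream.read()  # a real reader would stop at the end of the element
    return data[:data.index(END_TAG) + len(END_TAG)]


stream = io.BytesIO(SAMPLE)
index = build_offset_index(stream)
entry = read_by_id(stream, index, "tr2")
print(entry)  # b'<Transition id="tr2"><Precursor/></Transition>'
```

Once the index exists, each lookup costs one seek instead of a parse from the start of the file.
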
iterfind(path, **kwargs)

Parse the XML and yield info on elements with specified local name or by specified “XPath”.

Parameters:
path : str

Element name or XPath-like expression. Only local names separated with slashes are accepted. An asterisk (*) means any element. You can specify a single condition at the end, such as "/path/to/element[some_value>1.5]". Note: much more powerful filtering is possible in plain Python. The path can be absolute or “free”. Do not specify namespaces.

**kwargs : passed to self._get_info_smart().
Returns:
out : iterator
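
As a sketch of the plain-Python filtering mentioned in the note on path conditions: the records below are hypothetical stand-ins for the dicts that an iterfind() call yields, and the key names (some_value, label) are invented for this example.

```python
# Hypothetical records standing in for the dicts yielded by iterfind();
# the key names are invented for this example.
records = [
    {"id": "a", "some_value": 0.7, "label": "x"},
    {"id": "b", "some_value": 2.1, "label": "y"},
    {"id": "c", "some_value": 3.4, "label": "x"},
]


def select(entries, min_value, label):
    """Lazily keep entries that satisfy two conditions at once."""
    return (
        e for e in entries
        if e.get("some_value", 0.0) > min_value and e.get("label") == label
    )


selected = [e["id"] for e in select(records, 1.5, "x")]
print(selected)  # ['c']
```

A path expression accepts a single condition, while a generator expression can combine any number of them.
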
map(target=None, processes=-1, queue_timeout=4, args=None, kwargs=None, **_kwargs)

Execute the target function over entries of this object across up to processes processes.

Results will be returned out of order.

Parameters:
target : Callable, optional

The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs

processes : int, optional

The number of worker processes to use. If negative, the number of processes will match the number of available CPUs.

queue_timeout : float, optional

The number of seconds to block, waiting for a result before checking to see if all workers are done.

args : Sequence, optional

Additional positional arguments to be passed to the target function

kwargs : Mapping, optional

Additional keyword arguments to be passed to the target function

**_kwargs

Additional keyword arguments to be passed to the target function

Yields:
object

The work item returned by the target function.
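
The calling pattern of an unordered parallel map can be sketched with the standard library. Note the substitution: this sketch uses a thread pool from concurrent.futures so that it runs anywhere, whereas the method documented above distributes work across worker processes. The names unordered_map() and scaled_mz(), and the "mz" key, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def unordered_map(target, entries, workers=4, args=(), kwargs=None):
    """Apply target to every entry and yield results as they complete."""
    kwargs = kwargs or {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(target, entry, *args, **kwargs) for entry in entries]
        for future in as_completed(futures):
            yield future.result()


def scaled_mz(entry, factor=1.0):
    """Hypothetical per-entry function: scale a made-up 'mz' value."""
    return entry["id"], entry["mz"] * factor


entries = [{"id": "tr1", "mz": 100.0}, {"id": "tr2", "mz": 200.0}]
# Results may arrive in any order, so key them by id instead of by position.
results = dict(unordered_map(scaled_mz, entries, kwargs={"factor": 2.0}))
```

Returning an identifier together with each result, as scaled_mz() does, is one way to cope with the arbitrary order of results.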

classmethod prebuild_byte_offset_file(path)

Construct a new XML reader, build its byte offset index and write it to file

Parameters:
path : str

The path to the file to parse

reset()

Resets the iterator to its initial state.

write_byte_offsets()

Write the byte offsets in _offset_index to the file at _byte_offset_filename

pyteomics.traml.iterfind(source, path, **kwargs)[source]

Parse source and yield info on elements with specified local name or by specified “XPath”.

Note

This function is provided for backward compatibility only. If you do multiple iterfind() calls on one file, you should create a TraML object and use its iterfind() method.

Parameters:
source : str or file

File name or file-like object.

path : str

Element name or XPath-like expression. Only local names separated with slashes are accepted. An asterisk (*) means any element. You can specify a single condition at the end, such as "/path/to/element[some_value>1.5]". Note: much more powerful filtering is possible in plain Python. The path can be absolute or “free”. Do not specify namespaces.

recursive : bool, optional

If False, subelements will not be processed when extracting info from elements. Default is True.

iterative : bool, optional

Specifies whether iterative XML parsing should be used. Iterative parsing significantly reduces memory usage and may be just a little slower. When retrieve_refs is True, however, it is highly recommended to disable iterative parsing if possible. Default value is True.

read_schema : bool, optional

If True, attempt to extract information from the XML schema mentioned in the TraML header. Otherwise, use default parameters. Not recommended without an Internet connection or if you want to avoid the related warnings.

Returns:
out : iterator
pyteomics.traml.read(source, retrieve_refs=True, read_schema=False, iterative=True, use_index=False, huge_tree=False)[source]

Parse source and iterate through transitions.

Parameters:
source : str or file

A path to a target TraML file or the file object itself.

retrieve_refs : bool, optional

If True, additional information from references will be automatically added to the results. The file processing time will increase. Default is True.

read_schema : bool, optional

If True, attempt to extract information from the XML schema mentioned in the TraML header. Otherwise, use default parameters. Not recommended without an Internet connection or if you want to avoid the related warnings.

iterative : bool, optional

Defines whether iterative parsing should be used. It helps reduce memory usage at almost the same parsing speed. Default is True.

use_index : bool, optional

Defines whether an index of byte offsets needs to be created for spectrum elements. Default is False.

huge_tree : bool, optional

This option is passed to the lxml parser and defines whether security checks for XML tree depth and node size should be disabled. Default is False. Enable this option for trusted files to avoid XMLSyntaxError exceptions (e.g. XMLSyntaxError: xmlSAX2Characters: huge text node).

Returns:
out : TraML

A TraML object, suitable for iteration and possibly random access.
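
A minimal usage sketch for read(), assuming that pyteomics and lxml are installed, that the returned object can be used as a context manager, and that each transition is yielded as a dict carrying the element's id attribute. The helper name iter_transition_ids() and the file name in the usage comment are hypothetical.

```python
def iter_transition_ids(path, **reader_kwargs):
    """Yield the id of every transition in the TraML file at `path`."""
    # Imported lazily so that defining this helper does not require pyteomics.
    from pyteomics import traml

    # Extra keyword arguments (e.g. retrieve_refs=False) are passed to read().
    with traml.read(path, **reader_kwargs) as reader:
        for transition in reader:
            # Assumption: each transition is a dict with an 'id' key.
            yield transition.get("id")


# Usage with a hypothetical file name:
# for transition_id in iter_transition_ids("transitions.traml", retrieve_refs=False):
#     print(transition_id)
```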
