traml - targeted MS transition data in TraML format
Summary
TraML is a rich, standard XML format for defining targeted mass spectrometry methods. Please refer to psidev.info for the detailed specification of the format and structure of TraML files.
This module provides a minimalistic way to extract information from TraML
files. You can use the object-oriented interface (TraML instances) to
access target definitions and transitions. TraML objects also support
indexing with entity IDs directly.
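The two access patterns above, iterating over entries and looking them up by ID, can be sketched with only the standard library. This is not how pyteomics is implemented (it builds on lxml). The tiny document, the helper name iter_transitions and the dictionary index are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written TraML-like document (illustrative only: real TraML
# files contain many more elements and cvParam annotations).
TRAML = """<TraML xmlns="http://psi.hupo.org/ms/traml" version="1.0.0">
  <TransitionList>
    <Transition id="t1" peptideRef="PEP_A"/>
    <Transition id="t2" peptideRef="PEP_B"/>
  </TransitionList>
</TraML>"""


def iter_transitions(xml_text):
    """Yield the attributes of every Transition element as a plain dict."""
    root = ET.fromstring(xml_text)
    # '{*}' matches the local name in any namespace (ElementPath, Python 3.8+).
    for elem in root.iterfind(".//{*}Transition"):
        yield dict(elem.attrib)


# Sequential access, comparable to iterating over a reader object.
transitions = list(iter_transitions(TRAML))

# Random access by entity ID, comparable to indexing a reader with an ID.
by_id = {t["id"]: t for t in transitions}
print(by_id["t2"]["peptideRef"])  # PEP_B
```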
Data access
TraML - a class representing a single TraML file. Other data access functions use this class internally.
read() - iterate through transitions in TraML format.
chain() - read multiple TraML files at once.
chain.from_iterable() - read multiple files at once, using an iterable of files.
Controlled Vocabularies and Caching
TraML relies on controlled vocabularies to describe its contents extensibly.
Every TraML instance needs a copy of the PSI-MS CV, which it handles using the psims library.
If you want to save time when creating instances of TraML, consider enabling the psims cache.
See the psims documentation
on how to enable and configure the cache (alternatively, you can handle CV creation yourself and pass a pre-created instance
to TraML using the cv parameter).
See also: Controlled Vocabulary Terms for more details on how they are used.
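The benefit of creating the CV only once can be illustrated with a standard-library sketch. load_cv and Reader are hypothetical stand-ins, not psims or pyteomics APIs; the sketch only shows the two options mentioned above: caching the expensive load, or building the CV yourself and passing the same object to every reader through a cv argument.

```python
import functools

load_calls = 0  # counts how often the expensive load actually runs


@functools.lru_cache(maxsize=None)
def load_cv(name="psi-ms"):
    """Hypothetical stand-in for downloading and parsing a controlled vocabulary."""
    global load_calls
    load_calls += 1
    return {"name": name}  # placeholder for a real CV object


class Reader:
    """Hypothetical reader that, like TraML, accepts an optional pre-built CV."""

    def __init__(self, source, cv=None):
        self.source = source
        # Use the caller's CV if given; otherwise load it (here: from the cache).
        self.cv = cv if cv is not None else load_cv()


shared_cv = load_cv()  # the only real load
readers = [Reader(path, cv=shared_cv) for path in ("a.traml", "b.traml")]
fallback = Reader("c.traml")  # no cv passed: served from the lru_cache
```

Either way, the costly step runs once no matter how many readers are created.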
Handling Time Units and Other Qualified Quantities
TraML files may express times and other quantities in a variety of different units. See Unit Handling for more information.
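As a rough illustration of what handling a qualified quantity involves, the sketch below pairs each value with a unit name and normalizes times to seconds. The unit table and to_seconds are hypothetical; TraML files identify units through CV terms, and pyteomics has its own representation, described in Unit Handling.

```python
# Hypothetical unit table; real TraML files identify units through CV terms.
SECONDS_PER_UNIT = {"second": 1.0, "minute": 60.0, "hour": 3600.0}


def to_seconds(value, unit):
    """Normalize a (value, unit) pair to seconds, rejecting unknown units."""
    try:
        return float(value) * SECONDS_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unsupported time unit: {unit!r}") from None


retention_times = [(2.5, "minute"), (95, "second")]
print([to_seconds(v, u) for v, u in retention_times])  # [150.0, 95.0]
```

Raising on an unknown unit, rather than silently passing the value through, keeps mixed-unit data from being compared by accident.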
Deprecated functions
version_info() - get version information about the TraML file. You can just read the corresponding attribute of the TraML object.
iterfind() - iterate over elements in a TraML file. You can just call the corresponding method of the TraML object.
Dependencies
This module requires lxml.
- pyteomics.traml.chain(*sources, **kwargs)
Chain TraML for several sources into a single iterable. Positional arguments should be sources like file names or file objects. Keyword arguments are passed to the TraML class.
- Parameters:
sources (Iterable) – Sources for creating new sequences from, such as paths or file-like objects
kwargs (Mapping) – Additional arguments used to instantiate each sequence
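Conceptually, chain() concatenates the readers created for each source. A minimal sketch with itertools, where read_one is a hypothetical placeholder for constructing TraML(source, **kwargs):

```python
import itertools


def read_one(source, limit=2):
    """Hypothetical per-file reader standing in for TraML(source, **kwargs)."""
    for i in range(limit):
        yield (source, i)


def _chain_from_iterable(sources, **kwargs):
    # Each reader is created lazily, only when iteration reaches its source.
    return itertools.chain.from_iterable(read_one(s, **kwargs) for s in sources)


def chain(*sources, **kwargs):
    """Iterate over several sources as one stream; kwargs go to every reader."""
    return _chain_from_iterable(sources, **kwargs)


chain.from_iterable = _chain_from_iterable  # mirrors the chain.from_iterable() form

print(list(chain("a.traml", "b.traml")))
```

Because the generator expression is consumed lazily, a later source is not touched until all entries of the earlier ones have been yielded.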
- class pyteomics.traml.TraML(*args, **kwargs)
Bases: CVParamParser, MultiProcessingXML, IndexSavingXML
Parser class for TraML files.
- __init__(*args, **kwargs)
Create an indexed XML parser object.
- Parameters:
source (str or file) – File name or file-like object corresponding to an XML file.
read_schema (bool, optional) – Defines whether the schema file referenced in the file header should be used to extract information about value conversion. Default is False.
iterative (bool, optional) – Defines whether an ElementTree object should be constructed and stored on the instance or if iterative parsing should be used instead. Iterative parsing keeps the memory usage low for large XML files. Default is True.
use_index (bool, optional) – Defines whether an index of byte offsets needs to be created for elements listed in indexed_tags. This is useful for random access to spectra in mzML or elements of mzIdentML files, or for iterative parsing of mzIdentML with retrieve_refs=True. If True, build_id_cache is ignored. If False, the object acts exactly like XML. Default is True.
indexed_tags (container of bytes, optional) – If use_index is True, elements listed in this parameter will be indexed. Empty set by default.
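The memory benefit of iterative parsing comes from handling elements as they are parsed instead of keeping a full tree. A standard-library sketch of that idea using xml.etree.ElementTree.iterparse (pyteomics itself uses lxml; iter_ids and the sample data are hypothetical):

```python
import io
import xml.etree.ElementTree as ET

DATA = b"""<TraML xmlns="http://psi.hupo.org/ms/traml">
  <TransitionList>
    <Transition id="t1"/><Transition id="t2"/><Transition id="t3"/>
  </TransitionList>
</TraML>"""


def iter_ids(stream, local_name="Transition"):
    """Stream element IDs without building the whole tree in memory."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        # Tags arrive as '{namespace}Local'; compare the local part only.
        if elem.tag.rsplit("}", 1)[-1] == local_name:
            yield elem.get("id")
            elem.clear()  # free the element's attributes, text and children


print(list(iter_ids(io.BytesIO(DATA))))  # ['t1', 't2', 't3']
```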
- build_byte_index()
Build the byte offset index by either reading these offsets from the file at _byte_offset_filename, or falling back to the method used by IndexedXML or IndexedTextReader if this operation fails due to an IOError.
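The load-or-rebuild logic described above can be sketched as follows. The regex scan stands in for the IndexedXML fallback. The JSON sidecar name, scan_offsets and load_or_build_offsets are assumptions for illustration, not the format pyteomics writes.

```python
import json
import os
import re
import tempfile

# Matches an opening Transition tag and captures its id (no namespace prefixes).
TRANSITION_RE = re.compile(rb'<Transition\b[^>]*?\bid="([^"]+)"')


def scan_offsets(xml_bytes):
    """Map each Transition id to the byte offset of its opening '<'."""
    return {m.group(1).decode(): m.start() for m in TRANSITION_RE.finditer(xml_bytes)}


def load_or_build_offsets(xml_path, index_path):
    """Read a saved offset index, or scan the XML and save one if that fails."""
    try:
        with open(index_path) as fh:
            return json.load(fh)
    except OSError:  # IOError is an alias of OSError in Python 3
        with open(xml_path, "rb") as fh:
            offsets = scan_offsets(fh.read())
        with open(index_path, "w") as fh:  # reuse on the next run
            json.dump(offsets, fh)
        return offsets


# Usage: build the index once, then load it from the sidecar file.
with tempfile.TemporaryDirectory() as tmp:
    xml_path = os.path.join(tmp, "example.traml")
    index_path = xml_path + ".offsets.json"  # hypothetical sidecar name
    data = b'<TraML><TransitionList><Transition id="t1"/><Transition id="t2"/></TransitionList></TraML>'
    with open(xml_path, "wb") as fh:
        fh.write(data)
    built = load_or_build_offsets(xml_path, index_path)   # scans the XML
    loaded = load_or_build_offsets(xml_path, index_path)  # reads the JSON
    sidecar_written = os.path.exists(index_path)
```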
- build_id_cache()
Construct a cache for each element in the document, indexed by id attribute.
- build_tree()
Build and store the ElementTree instance for the underlying file.
- clear_id_cache()
Clear the element ID cache.
- clear_tree()
Remove the saved ElementTree.
- get_by_id(elem_id, id_key=None, element_type=None, **kwargs)
Retrieve the requested entity by its id. If the entity is a spectrum described in the offset index, it will be retrieved by immediately seeking to the starting position of the entry, otherwise falling back to parsing from the start of the file.
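A sketch of the seek-or-scan behaviour described above, under simplifying assumptions: elements are un-namespaced Transition tags, the offset index is a plain dict, and regular expressions stand in for real XML parsing. The function below is hypothetical and only mirrors the idea.

```python
import io
import re
import xml.etree.ElementTree as ET

# Rest of a Transition element: either self-closing or up to its closing tag.
# Assumes no namespace prefixes and no nested Transition elements.
_ELEMENT_TAIL = rb'[^>]*?(?:/>|>.*?</Transition>)'


def get_by_id(stream, elem_id, offsets):
    """Return the attributes of the Transition with this id, or None."""
    if elem_id in offsets:
        # Fast path: jump straight to the recorded start of the element
        # (reads to the end of the stream for simplicity).
        stream.seek(offsets[elem_id])
        match = re.match(rb'<Transition\b' + _ELEMENT_TAIL, stream.read(), re.DOTALL)
    else:
        # Slow path: not indexed, so search the data from the beginning.
        stream.seek(0)
        pattern = rb'<Transition\b[^>]*?\bid="' + re.escape(elem_id.encode()) + rb'"' + _ELEMENT_TAIL
        match = re.search(pattern, stream.read(), re.DOTALL)
    return dict(ET.fromstring(match.group(0)).attrib) if match else None


DATA = (b'<TraML><TransitionList>'
        b'<Transition id="t1" peptideRef="A"/>'
        b'<Transition id="t2" peptideRef="B"><Precursor/></Transition>'
        b'</TransitionList></TraML>')
offsets = {"t1": DATA.index(b'<Transition id="t1"')}  # t2 deliberately not indexed
stream = io.BytesIO(DATA)
print(get_by_id(stream, "t1", offsets))  # found via the offset index
print(get_by_id(stream, "t2", offsets))  # found via the fallback scan
```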
- iterfind(path, **kwargs)
Parse the XML and yield info on elements with specified local name or by specified “XPath”.
- Parameters:
path (str) – Element name or XPath-like expression. The path is very close to full XPath syntax, but local names should be used for all elements in the path. They will be substituted with local-name() checks, up to the (first) predicate. The path can be absolute or “free”. Please don’t specify namespaces.
**kwargs – Passed to self._get_info_smart().
- Returns:
out
- Return type:
iterator
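ElementTree offers a similar namespace-agnostic lookup through its {*} wildcard (Python 3.8+). The hypothetical helper below approximates the "local names only" rule with ElementTree rather than the local-name() substitution described above, and with one simplification noted in the code: it treats every path as free.

```python
import xml.etree.ElementTree as ET


def find_local(root, path):
    """Find elements below root by a slash-separated path of local names."""
    # Simplification: leading slashes are dropped, so every path is "free"
    # (matched anywhere below root) rather than anchored at the document root.
    steps = [step for step in path.split("/") if step]
    # '{*}' selects a local name in any (or no) namespace (Python 3.8+).
    return root.findall(".//" + "/".join("{*}" + step for step in steps))


root = ET.fromstring(
    '<TraML xmlns="http://psi.hupo.org/ms/traml">'
    '<TransitionList><Transition id="t1"/><Transition id="t2"/></TransitionList>'
    "</TraML>"
)
ids = [e.get("id") for e in find_local(root, "TransitionList/Transition")]
print(ids)  # ['t1', 't2']
```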
- map(target=None, workers=None, args=None, kwargs=None, method='mp', **_kwargs)
Execute the target function over entries of this object in parallel. The type of parallelism is determined by the method parameter. Results will be returned out of order.
- Parameters:
target (Callable, optional) – The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs.
workers (int, optional) – The number of worker threads or processes to use. The default depends on the method parameter.
args (Sequence, optional) – Additional positional arguments to be passed to the target function.
kwargs (Mapping, optional) – Additional keyword arguments to be passed to the target function.
method (str, optional) – The type of parallelism to use. Can be one of the following:
**_kwargs – Additional keyword arguments to be passed to the target function.
- Yields:
object – The work item returned by the target function.
- pmap(target=None, workers=None, args=None, kwargs=None, **_kwargs)
Execute the target function over entries of this object across up to workers processes. Results will be returned out of order.
- Parameters:
target (Callable, optional) – The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs.
workers (int or None, optional) – The number of worker processes to use. If not a positive integer, defaults to the number of available CPUs. This parameter can also be set at reader creation.
args (Sequence, optional) – Additional positional arguments to be passed to the target function.
kwargs (Mapping, optional) – Additional keyword arguments to be passed to the target function.
**_kwargs – Additional keyword arguments to be passed to the target function.
- Yields:
object – The work item returned by the target function.
- classmethod prebuild_byte_offset_file(path)
Construct a new XML reader, build its byte offset index and write it to file.
- Parameters:
path (str) – The path to the file to parse
- reset()
Resets the iterator to its initial state.
- tmap(target=None, workers=None, args=None, kwargs=None, chunk_size=None, **_kwargs)
Execute the target function over entries of this object across up to workers threads. Results will be returned out of order.
- Parameters:
target (Callable, optional) – The function to execute over each entry. It will be given a single object yielded by the wrapped iterator as well as all of the values in args and kwargs.
Warning: target must be thread-safe. The target function cannot interact with the underlying file object directly.
workers (int or None, optional) – The number of worker threads to use. If not a positive integer, defaults to the number of available CPUs.
args (Sequence, optional) – Additional positional arguments to be passed to the target function.
kwargs (Mapping, optional) – Additional keyword arguments to be passed to the target function.
chunk_size (int, optional) – The number of work items to hand out to each worker thread at a time. If not specified, defaults to the chunk_size attribute of this object.
**_kwargs – Additional keyword arguments to be passed to the target function.
- Yields:
object – The work item returned by the target function.
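The out-of-order behaviour shared by map(), pmap() and tmap() can be sketched with concurrent.futures. thread_map is a hypothetical helper, not part of pyteomics; it does not model chunk_size, and like tmap() it expects target to be thread-safe and not to touch any shared file object.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def thread_map(items, target, workers=None, args=(), kwargs=None):
    """Run target(item, *args, **kwargs) on a thread pool, yielding results as they complete."""
    kwargs = kwargs or {}
    # Mirror the documented default: a missing or non-positive value means CPU count.
    if not isinstance(workers, int) or workers < 1:
        workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(target, item, *args, **kwargs) for item in items]
        # as_completed yields in completion order, so output order can differ
        # from input order.
        for future in as_completed(futures):
            yield future.result()


squares = sorted(thread_map(range(5), pow, args=(2,)))
print(squares)  # [0, 1, 4, 9, 16]
```

Sorting (or carrying an ID inside each result) is how a caller restores a stable order when it matters.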
- write_byte_offsets()
Write the byte offsets in _offset_index to the file at _byte_offset_filename.
- pyteomics.traml.iterfind(source, path, **kwargs)
Parse source and yield info on elements with specified local name or by specified “XPath”.
Note: This function is provided for backward compatibility only. If you do multiple iterfind() calls on one file, you should create a TraML object and use its iterfind() method.
- Parameters:
path (str) – Element name or XPath-like expression. Only local names separated with slashes are accepted. An asterisk (*) means any element. You can specify a single condition at the end, such as: "/path/to/element[some_value>1.5]". Note: you can do much more powerful filtering using plain Python. The path can be absolute or “free”. Please don’t specify namespaces.
recursive (bool, optional) – If False, subelements will not be processed when extracting info from elements. Default is True.
iterative (bool, optional) – Specifies whether iterative XML parsing should be used. Iterative parsing significantly reduces memory usage and may be just a little slower. When retrieve_refs is True, however, it is highly recommended to disable iterative parsing if possible. Default value is True.
read_schema (bool, optional) – If True, attempt to extract information from the XML schema mentioned in the TraML header. Otherwise, use default parameters. Not recommended without Internet connection or if you don’t like to get the related warnings.
- Returns:
out
- Return type:
iterator
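For the kind of filtering the note above recommends doing in plain Python, here is a sketch over hypothetical records (the some_value key is borrowed from the path example; real entries will have different keys):

```python
# Hypothetical records shaped like the dicts an iterfind()-style parser yields.
records = [
    {"id": "t1", "some_value": 0.9},
    {"id": "t2", "some_value": 2.4},
    {"id": "t3", "some_value": 1.7},
]

# Same selection as the path predicate [some_value>1.5].
above = [r["id"] for r in filter(lambda r: r["some_value"] > 1.5, records)]

# A two-sided range check, which goes beyond a single trailing condition.
in_range = [r["id"] for r in filter(lambda r: 1.5 < r["some_value"] < 2.0, records)]

print(above, in_range)  # ['t2', 't3'] ['t3']
```

Since filter() is lazy, the same pattern works directly on an iterator over a large file without materializing all entries first.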
- pyteomics.traml.read(source, retrieve_refs=True, read_schema=False, iterative=True, use_index=False, huge_tree=False)
Parse source and iterate through transitions.
- Parameters:
source (str or file) – A path to a target TraML file or the file object itself.
retrieve_refs (bool, optional) – If True, additional information from references will be automatically added to the results. The file processing time will increase. Default is True.
read_schema (bool, optional) – If True, attempt to extract information from the XML schema mentioned in the TraML header. Otherwise, use default parameters. Not recommended without Internet connection or if you don’t like to get the related warnings.
iterative (bool, optional) – Defines whether iterative parsing should be used. It helps reduce memory usage at almost the same parsing speed. Default is True.
use_index (bool, optional) – Defines whether an index of byte offsets needs to be created for spectrum elements. Default is False.
huge_tree (bool, optional) – This option is passed to the lxml parser and defines whether security checks for XML tree depth and node size should be disabled. Default is False. Enable this option for trusted files to avoid XMLSyntaxError exceptions (e.g. XMLSyntaxError: xmlSAX2Characters: huge text node).
- Returns:
out – A TraML object, suitable for iteration and possibly random access.
- Return type: