trafoxml - reader for trafoXML files
Summary
trafoXML is a format specified in the OpenMS project. It defines a transformation, which is a result of retention time alignment.
This module provides a minimalistic way to extract information from trafoXML files. You can use the old functional interface (read()) or the new object-oriented interface (TrafoXML) to iterate over entries in <Pair> elements.
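To make the structure concrete, below is a small hand-written trafoXML document and a standard-library sketch that reads the transformation name, its parameters and the <Pair> entries. The sample values are made up, and the layout is modeled on typical OpenMS output rather than taken from the schema. pyteomics parses these files with lxml and applies type conversion, whereas this sketch keeps every value as a string; summarize() is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: a linear RT transformation with three anchor pairs.
SAMPLE_TRAFOXML = """<TrafoXML version="1.0">
  <Transformation name="linear">
    <Param type="float" name="slope" value="1.01"/>
    <Param type="float" name="intercept" value="-2.5"/>
    <Pairs count="3">
      <Pair from="120.0" to="118.7"/>
      <Pair from="480.5" to="482.1"/>
      <Pair from="903.2" to="910.0"/>
    </Pairs>
  </Transformation>
</TrafoXML>"""


def summarize(xml_text):
    """Return the model name, its parameters and the raw <Pair> attribute dicts."""
    root = ET.fromstring(xml_text)
    transformation = root.find("Transformation")
    params = {p.get("name"): p.get("value") for p in transformation.iter("Param")}
    pairs = [dict(p.attrib) for p in transformation.iter("Pair")]
    return transformation.get("name"), params, pairs


name, params, pairs = summarize(SAMPLE_TRAFOXML)
print(name, params)  # linear {'slope': '1.01', 'intercept': '-2.5'}
print(pairs[0])      # {'from': '120.0', 'to': '118.7'}
```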
Data access
TrafoXML - a class representing a single trafoXML file. Other data access functions use this class internally.
read() - iterate through pairs in a trafoXML file. Data from a single pair are converted to a human-readable dict.
chain() - read multiple trafoXML files at once.
chain.from_iterable() - read multiple files at once, using an iterable of files.
Dependencies
This module requires lxml.
- pyteomics.openms.trafoxml.chain(*args, **kwargs)
Chain read() for several files. Positional arguments should be file names or file objects. Keyword arguments are passed to the read() function.
- chain.from_iterable(files, **kwargs)
Chain read() for several files. Keyword arguments are passed to the read() function.
- Parameters:
files – Iterable of file names or file objects.
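Conceptually, chain() and chain.from_iterable() concatenate the read() results of several files, much like itertools.chain. The standard-library sketch below illustrates that idea with a hypothetical read_pairs() standing in for read() and in-memory byte streams standing in for files; it is not the pyteomics implementation, and the real functions also accept file names. With pyteomics installed, the documented calls would look like trafoxml.chain('a.trafoXML', 'b.trafoXML') and trafoxml.chain.from_iterable(files).

```python
import io
import itertools
import xml.etree.ElementTree as ET


def read_pairs(source, convert=float):
    """Hypothetical stand-in for read(): yield one dict per <Pair> element."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Pair":
            yield {name: convert(value) for name, value in elem.attrib.items()}


def chain_from_iterable(sources, **kwargs):
    """Read each source lazily, in order, passing kwargs on to read_pairs()."""
    return itertools.chain.from_iterable(read_pairs(s, **kwargs) for s in sources)


def chain(*sources, **kwargs):
    """Positional-argument form: same as chain_from_iterable(sources, **kwargs)."""
    return chain_from_iterable(sources, **kwargs)


def make_file(pairs):
    """Build an in-memory trafoXML-like file object from (from, to) tuples."""
    body = "".join(f'<Pair from="{a}" to="{b}"/>' for a, b in pairs)
    text = (f'<TrafoXML><Transformation name="linear"><Pairs count="{len(pairs)}">'
            f"{body}</Pairs></Transformation></TrafoXML>")
    return io.BytesIO(text.encode("utf-8"))


combined = list(chain(make_file([(10.0, 11.0), (20.0, 19.5)]), make_file([(30.0, 31.25)])))
print(len(combined))  # 3
```

Because the generator expression creates each reader only when the previous one is exhausted, files are processed one after another rather than all at once.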
- class pyteomics.openms.trafoxml.TrafoXML(source, read_schema=None, iterative=None, build_id_cache=False, **kwargs)
Bases: XML
Parser class for trafoXML files.
- __init__(source, read_schema=None, iterative=None, build_id_cache=False, **kwargs)
Create an XML parser object.
- Parameters:
source (str or file) – File name or file-like object corresponding to an XML file.
read_schema (bool, optional) – Defines whether the schema file referenced in the file header should be used to extract information about value conversion. Default is False.
iterative (bool, optional) – Defines whether iterative parsing should be used instead of constructing an ElementTree object and storing it on the instance. Iterative parsing keeps the memory usage low for large XML files. Default is True.
build_id_cache (bool, optional) – Defines whether a dictionary mapping IDs to XML tree elements should be built and stored on the instance. It is used in XML.get_by_id(), e.g. when using pyteomics.mzid.MzIdentML with retrieve_refs=True.
huge_tree (bool, optional) – This option is passed to the lxml parser and defines whether security checks for XML tree depth and node size should be disabled. Default is False. Enable this option for trusted files to avoid XMLSyntaxError exceptions (e.g. XMLSyntaxError: xmlSAX2Characters: huge text node).
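The iterative flag chooses between streaming the file and building a full ElementTree, which build_tree() and clear_tree() manage. The two strategies can be sketched with the standard library as follows; this illustrates the trade-off only and is not pyteomics's lxml-based code, and both function names are hypothetical. The streaming version clears each processed <Pair> so its attributes can be freed, while the tree version keeps every element in memory until the tree is discarded.

```python
import io
import xml.etree.ElementTree as ET


def iter_pairs_streaming(source):
    """Iterative parsing: handle each <Pair> as soon as its end tag is read."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Pair":
            yield dict(elem.attrib)  # copy the attributes before clearing
            elem.clear()             # drop attributes and children; the now-empty
                                     # element itself stays attached to its parent


def iter_pairs_tree(source):
    """Tree mode: build the whole ElementTree first, then walk it."""
    tree = ET.parse(source)
    for elem in tree.iter("Pair"):
        yield dict(elem.attrib)


# Hypothetical document with 1000 pairs; values stay strings in both modes.
doc = ('<TrafoXML><Transformation name="linear"><Pairs count="1000">'
       + "".join(f'<Pair from="{i}" to="{i + 0.5}"/>' for i in range(1000))
       + "</Pairs></Transformation></TrafoXML>").encode("utf-8")

streamed = list(iter_pairs_streaming(io.BytesIO(doc)))
in_memory = list(iter_pairs_tree(io.BytesIO(doc)))
print(streamed == in_memory, len(streamed))  # True 1000
```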
- build_id_cache()
Construct a cache of all elements in the document, indexed by their id attribute.
- build_tree()
Build and store the ElementTree instance for the underlying file.
- clear_id_cache()
Clear the element ID cache.
- clear_tree()
Remove the saved ElementTree.
- get_by_id(elem_id, **kwargs)
Parse the file and return the element with id attribute equal to elem_id. Returns None if no such element is found.
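The ID cache trades memory for lookup speed: without it, each get_by_id() call has to search the document (the documented method parses the file); with it, a lookup becomes a dictionary access. The sketch below illustrates that trade-off with the standard library on a small made-up document whose element names and ids are hypothetical. IdIndexedDocument is a hypothetical helper that walks an in-memory tree, so it is an analogy to build_id_cache(), get_by_id() and clear_id_cache(), not their implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with id attributes and one cross-reference.
DOC = """<root>
  <Peptide id="PEP_1"><Seq>PEPTIDE</Seq></Peptide>
  <Peptide id="PEP_2"><Seq>SAMPLER</Seq></Peptide>
  <Evidence id="EV_1" peptide_ref="PEP_2"/>
</root>"""


class IdIndexedDocument:
    """Hypothetical helper mirroring build_id_cache()/get_by_id()/clear_id_cache()."""

    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)
        self._id_cache = None

    def build_id_cache(self):
        # A single pass over the tree maps every id attribute to its element.
        self._id_cache = {el.get("id"): el for el in self.root.iter() if el.get("id") is not None}

    def clear_id_cache(self):
        self._id_cache = None

    def get_by_id(self, elem_id):
        if self._id_cache is not None:
            return self._id_cache.get(elem_id)  # dict lookup; None if absent
        # No cache: scan the whole tree on every call.
        for el in self.root.iter():
            if el.get("id") == elem_id:
                return el
        return None


doc = IdIndexedDocument(DOC)
evidence = doc.get_by_id("EV_1")                      # linear scan
doc.build_id_cache()
peptide = doc.get_by_id(evidence.get("peptide_ref"))  # cached lookup
print(peptide.find("Seq").text)  # SAMPLER
```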
- iterfind(path, **kwargs)
Parse the XML and yield info on elements with the specified local name or matching the specified “XPath”.
- Parameters:
path (str) – Element name or XPath-like expression. The path is very close to full XPath syntax, but local names should be used for all elements in the path. They will be substituted with local-name() checks, up to the (first) predicate. The path can be absolute or “free”. Please don’t specify namespaces.
**kwargs – passed to self._get_info_smart().
- Returns:
out
- Return type:
iterator
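iterfind() lets you write paths with local names and still match namespaced elements; the documentation describes this as substituting local-name() checks. The standard-library sketch below gets a similar effect with ElementTree's {*} namespace wildcard (available since Python 3.8) instead of lxml XPath. local_path() is a hypothetical helper, the namespace URI is made up, and predicates that themselves contain "/" are not handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaced document; the namespace URI is made up.
NS_DOC = """<TrafoXML xmlns="http://example.org/hypothetical-ns">
  <Transformation name="linear">
    <Pairs count="2">
      <Pair from="1.0" to="1.2"/>
      <Pair from="2.0" to="2.3"/>
    </Pairs>
  </Transformation>
</TrafoXML>"""


def local_path(path):
    """Prefix each plain step with {*} so it matches the local name in any namespace."""
    steps = []
    for step in path.split("/"):
        # Leave empty steps (from '//'), '.', '..', '*' and already-qualified names alone.
        if step in ("", ".", "..", "*") or step.startswith("{"):
            steps.append(step)
        else:
            steps.append("{*}" + step)
    return "/".join(steps)


root = ET.fromstring(NS_DOC)
print(root.find(".//Pair"))  # None: the plain name does not match the namespaced tag
print([p.get("to") for p in root.iterfind(local_path(".//Pair"))])  # ['1.2', '2.3']
```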
- reset()
Resets the iterator to its initial state.
- pyteomics.openms.trafoxml.read(source, read_schema=True, iterative=True)
Parse source and iterate through pairs.
- Parameters:
source (str or file) – A path to a target trafoXML file or the file object itself.
read_schema (bool, optional) – If True, attempt to extract information from the XML schema mentioned in the file header (default). Otherwise, use default parameters. Disable this to avoid waiting on slow network connections or if you don’t want the related warnings.
iterative (bool, optional) – Defines whether iterative parsing should be used. It helps reduce memory usage at almost the same parsing speed. Default is True.
- Returns:
out – An iterator over dicts, one per <Pair> element.
- Return type:
iterator
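The dicts yielded by read() have their values converted using type information from the schema, or from default parameters when read_schema is False. The sketch below imitates that idea with a hypothetical rule table keyed by (element, attribute); it is not pyteomics's actual table, and read_pairs() is a hypothetical name. With pyteomics installed, the documented call on a local file would be trafoxml.read(path, read_schema=False), which also avoids fetching a schema over the network.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical conversion rules keyed by (element, attribute). They stand in for
# the type information read() takes from the schema or its defaults.
RULES = {("Pair", "from"): float, ("Pair", "to"): float}


def read_pairs(source, rules=RULES):
    """Stream <Pair> elements as dicts, converting attributes that have a rule."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Pair":
            # Attributes without a rule are kept as strings.
            yield {name: rules.get((elem.tag, name), str)(value)
                   for name, value in elem.attrib.items()}
            elem.clear()


# Hypothetical input: two anchor pairs (retention times in arbitrary units).
DATA = (b'<TrafoXML><Transformation name="linear"><Pairs count="2">'
        b'<Pair from="100.0" to="102.0"/><Pair from="200.0" to="203.0"/>'
        b'</Pairs></Transformation></TrafoXML>')

pairs = list(read_pairs(io.BytesIO(DATA)))
shifts = [p["to"] - p["from"] for p in pairs]  # per-pair retention time shift
print(shifts)  # [2.0, 3.0]
```

Passing an empty rule table (rules={}) leaves every value as a string, which is roughly what you get when no type information is available.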