mztab - mzTab file reader
Summary
mzTab is one of the standards developed by the Proteomics Informatics working group of the HUPO Proteomics Standards Initiative.
This module provides a way to read mzTab files into a collection of pandas.DataFrame instances in memory, along with a mapping of the file-level metadata. mzTab specifications 1.0 and 2.0 are supported.
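As a rough illustration of what such a reader has to do, the sketch below splits mzTab text into a metadata mapping and table rows using only the standard library. It relies on the line prefixes used by the mzTab format (MTD for metadata, PRH and PRT for the protein header and rows, and so on). The function name split_sections and the sample text are hypothetical, only the four mzTab 1.0 table sections are covered, and this is not how pyteomics.mztab itself is implemented.

```python
from collections import OrderedDict

# Hypothetical excerpt; real files carry many more columns and metadata keys.
SAMPLE = (
    "MTD\tmzTab-version\t1.0.0\n"
    "MTD\tms_run[1]-location\tfile://...\n"
    "COM\tan example comment line\n"
    "PRH\taccession\tdescription\n"
    "PRT\tP12345\texample protein\n"
)

# Row prefix -> header prefix for the four mzTab 1.0 table sections.
HEADER_FOR = {"PRT": "PRH", "PEP": "PEH", "PSM": "PSH", "SML": "SMH"}


def split_sections(text):
    """Split mzTab text into (metadata, rows keyed by row prefix)."""
    metadata = OrderedDict()
    headers = {}  # header prefix -> list of column names
    rows = {}     # row prefix -> list of {column: value} dicts
    for line in text.splitlines():
        fields = line.split("\t")
        if not line.strip() or fields[0] == "COM":
            continue  # skip blank lines and comments
        tag, rest = fields[0], fields[1:]
        if tag == "MTD":
            metadata[rest[0]] = rest[1]
        elif tag in HEADER_FOR.values():
            headers[tag] = rest
        elif tag in HEADER_FOR:
            columns = headers[HEADER_FOR[tag]]
            rows.setdefault(tag, []).append(dict(zip(columns, rest)))
    return metadata, rows


metadata, rows = split_sections(SAMPLE)
```

The row dictionaries collected here are the kind of data that the module turns into pandas.DataFrame instances, while the metadata mapping keeps the flat keys exactly as they appear in the file.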
Internals
_MzTabTable - a single table in an mzTab file.
Property Management
mztab uses metaprogramming to generate its metadata accessors. The descriptor and metaclass types documented below work in concert to produce them.
-
class pyteomics.mztab._MzTabTable(name, header=None, rows=None)
Bases: pyteomics.mztab._MzTabParserBase
An internal class for accumulating information about a single table represented in an mzTab file.
-
__init__(name, header=None, rows=None)
Initialize self. See help(type(self)) for accurate signature.
-
as_df(index=None)
Convert the table to a DataFrame in memory.
Return type: pd.DataFrame
-
collapse_properties(proplist)
Collapse a flat property list into a hierarchical structure.
This is intended to operate on Mapping objects, including dict, pandas.Series and pandas.DataFrame. For example, it converts
{ "ms_run[1]-format": "Andromeda:apl file format", "ms_run[1]-location": "file://...", "ms_run[1]-id_format": "scan number only nativeID format" }
to
{ "ms_run": [ { "format": "Andromeda:apl file format", "location": "file://...", "id_format": "scan number only nativeID format" } ] }
Parameters: proplist (Mapping) – Key-value pairs to collapse
Returns: The collapsed property list
Return type: OrderedDict
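The transformation in the example above can be sketched in a few lines. This is a minimal illustration, assuming keys of the form "<name>[<index>]-<attribute>". The function name collapse and the regular expression are hypothetical and do not reproduce the library's actual algorithm; keys that do not match the pattern are simply passed through unchanged.

```python
import re
from collections import OrderedDict

# Matches keys such as "ms_run[1]-format": collection name, index, attribute.
KEY_PATTERN = re.compile(r"^(?P<name>[^\[\]]+)\[(?P<index>\d+)\]-(?P<attr>.+)$")


def collapse(flat):
    """Group '<name>[<index>]-<attr>' keys by name and index (illustrative only)."""
    result = OrderedDict()
    collections = set()  # names whose values are index -> attributes mappings
    for key, value in flat.items():
        match = KEY_PATTERN.match(key)
        if match is None:
            result[key] = value  # plain keys pass through unchanged
            continue
        name = match.group("name")
        collections.add(name)
        by_index = result.setdefault(name, {})
        by_index.setdefault(int(match.group("index")), {})[match.group("attr")] = value
    # Turn each index -> attributes mapping into a list ordered by index.
    for name in collections:
        by_index = result[name]
        result[name] = [by_index[i] for i in sorted(by_index)]
    return result


flat = {
    "ms_run[1]-format": "Andromeda:apl file format",
    "ms_run[1]-location": "file://...",
    "ms_run[1]-id_format": "scan number only nativeID format",
}
collapsed = collapse(flat)
```

Here collapsed["ms_run"] is a one-element list whose only entry holds the format, location and id_format attributes of the first run.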
-
gather(mapping)
Collapse property lists using collapse_properties() and then gather collections of entities into lists.
Parameters: mapping (dict) – The flattened hierarchy of properties to reconstruct
Returns: A Group of all entities and collections of entities
Return type: Group
-
-
class pyteomics.mztab.Group
Bases: collections.OrderedDict
A type for holding collections of arbitrarily nested keys from rows and metadata mappings.
Implemented as an autovivifying OrderedDict variant. As such, it implements the Mapping interface.
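Autovivification means that looking up a missing key creates a nested container on the fly instead of raising KeyError. A minimal sketch of the idea, using a hypothetical AutoDict class rather than the library's own Group, could look like this:

```python
from collections import OrderedDict


class AutoDict(OrderedDict):
    """Hypothetical stand-in for an autovivifying ordered mapping."""

    def __missing__(self, key):
        # Called by item lookup when key is absent: create and store a child.
        child = self[key] = type(self)()
        return child


tree = AutoDict()
# Intermediate levels are created automatically by the nested lookups.
tree["ms_run"][1]["format"] = "Andromeda:apl file format"
```

Only item access through square brackets triggers creation. Membership tests and get() leave the mapping untouched, which is why the Mapping interface still behaves as expected.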
__init__
Initialize self. See help(type(self)) for accurate signature.
-
clear() → None
Remove all items from od.
-
copy() → a shallow copy of od
-
fromkeys(iterable, value=None)
Create a new ordered dictionary with keys from iterable and values set to value.
-
get(key, default=None)
Return the value for key if key is in the dictionary, else default.
-
get_path(path, default=None)
As get(), but over a path key parsed with extract_path().
-
items() → a set-like object providing a view on D's items
-
keys() → a set-like object providing a view on D's keys
-
move_to_end(key, last=True)
Move an existing element to the end (or beginning if last is false).
Raise KeyError if the element does not exist.
-
pop(k[, d]) → v
Remove the specified key and return the corresponding value. If the key is not found, d is returned if given; otherwise KeyError is raised.
-
popitem(last=True)
Remove and return a (key, value) pair from the dictionary.
Pairs are returned in LIFO order if last is true or FIFO order if false.
-
setdefault(key, default=None)
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
-
update([E, ]**F) → None
Update D from dict/iterable E and F. If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].
-
values() → an object providing a view on D's values
-
-
class pyteomics.mztab.MetadataBackedProperty(name, variant_required=None)
Bases: object
A descriptor type that uses the instance’s metadata attribute to carry its values.
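The descriptor pattern described here can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the class names MetadataProperty and Record, the variant attribute on the instance, and the choice to raise AttributeError for a missing required key are all hypothetical.

```python
class MetadataProperty:
    """Hypothetical descriptor that reads one key from `instance.metadata`."""

    def __init__(self, name, variant_required=None):
        self.name = name
        self.variant_required = variant_required or ()

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        try:
            return instance.metadata[self.name]
        except KeyError:
            # Assumed behaviour: complain only when the variant requires the key.
            if instance.variant in self.variant_required:
                raise AttributeError(
                    f"{self.name!r} is required for variant {instance.variant!r}"
                ) from None
            return None


class Record:
    """Hypothetical owner class exposing two metadata keys as attributes."""

    title = MetadataProperty("title")
    mode = MetadataProperty("mzTab-mode", variant_required=("P",))

    def __init__(self, metadata, variant):
        self.metadata = metadata
        self.variant = variant


record = Record({"title": "demo"}, variant="P")
```

Because the value lives in the metadata mapping rather than on the descriptor, every instance shares the same descriptor object while still seeing its own data.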
-
class pyteomics.mztab.MetadataPropertyAnnotator
Bases: type
A simple metaclass to do some class-creation time introspection and descriptor binding.
Uses a list of strings or 3-tuples from __metadata_properties__ to bind MetadataBackedProperty or MetadataBackedCollection onto the class during its creation.
The specification for a property is a tuple of three values:
- The metadata key to fetch
- The property name to expose on the object
- The variant(s) which require this metadata key be present
For example, ("mzTab-version", "version", ("M", "P")) would be interpreted as: expose a property “version” on instances which serves the key “mzTab-version” from the instance’s metadata, and raise an error if it is absent in the “M” or “P” variants.
Alternatively, a specification may be a single string, which will be interpreted as the metadata key. The property name is generated from it by replacing every ‘-’ with ‘_’, and the key is assumed to be optional in all variants.
If a metadata key ends with “[]”, the property is assumed to be a collection. mzTab makes heavy use of “<collection_name>[<index>]…” keys to define groups of homogeneous object types, often with per-element attributes.
A specification ("variable_mod[]", "variable_mods", ()) would create a property that returns:
>>> instance.variable_mods
Group([(1, {'name': 'CHEMMOD:15.9949146221', 'position': 'Anywhere', 'site': 'M'}), (2, {'name': 'CHEMMOD:42.0105646863', 'position': 'Protein N-term', 'site': 'N-term'})])
For a precise description of the property collection algorithm, see collapse_properties() and gather().
If any base classes have a __metadata_properties__ attribute, it will also be included unless __inherit_metadata_properties__ is set to False. Any names explicitly set by the current class override this automatic property generation.
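The binding step can be sketched with a small metaclass. This is a simplified illustration: the names Annotator, Document and _make_property are hypothetical, plain property objects stand in for the library's descriptor types, and collection keys ending in "[]", variant checks and inheritance of specifications are deliberately left out.

```python
def _make_property(key):
    """Build a read-only property serving `key` from `self.metadata`."""
    return property(lambda self: self.metadata.get(key))


class Annotator(type):
    """Hypothetical metaclass binding properties from __metadata_properties__."""

    def __new__(mcls, name, bases, namespace):
        for spec in namespace.get("__metadata_properties__", ()):
            if isinstance(spec, str):
                # Bare string: derive the attribute name from the key.
                key, attr = spec, spec.replace("-", "_")
            else:
                key, attr, _variants = spec  # variants are ignored in this sketch
            # Names set explicitly in the class body take precedence.
            namespace.setdefault(attr, _make_property(key))
        return super().__new__(mcls, name, bases, namespace)


class Document(metaclass=Annotator):
    __metadata_properties__ = [
        ("mzTab-version", "version", ("M", "P")),
        "small_molecule-quantification_unit",
    ]

    def __init__(self, metadata):
        self.metadata = metadata


doc = Document({"mzTab-version": "2.0.0-M"})
```

With this in place, doc.version returns the stored version string, and the bare-string specification becomes the attribute small_molecule_quantification_unit, which yields None here because the key is absent.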
__init__
Initialize self. See help(type(self)) for accurate signature.
-
mro()
Return a type’s method resolution order.
-
class pyteomics.mztab.MzTab(path, encoding='utf8', table_format='df')
Bases: pyteomics.mztab._MzTabParserBase
Parser for mzTab format files.
-
file
A file stream wrapper for the file to be read.
Type: _file_obj
-
metadata
A mapping of the metadata entries read from the file.
Type: OrderedDict
-
peptide_table
The table of peptides. Not commonly used.
Type: _MzTabTable or pd.DataFrame
-
protein_table
The table of protein identifications.
Type: _MzTabTable or pd.DataFrame
-
small_molecule_table
The table of small molecule identifications.
Type: _MzTabTable or pd.DataFrame
-
spectrum_match_table
The table of spectrum-to-peptide match identifications.
Type: _MzTabTable or pd.DataFrame
-
table_format
The structure type to replace each table with. The string ‘df’ will use pd.DataFrame instances. ‘dict’ will create a dictionary of dictionaries for each table. A callable will be called on each raw _MzTabTable object.
Type: ‘df’, ‘dict’, or callable
-
Additional components of metadata are exposed as properties, returning single values or aggregated collections of objects.
-
__init__(path, encoding='utf8', table_format='df')
Initialize self. See help(type(self)) for accurate signature.
-
assays
Accesses the ‘assay’ key group gathered in the metadata mapping attached to this object. This group is dynamically generated on each access and may be expensive for repeated use.
Return type: Group
-
collapse_properties(proplist)
Collapse a flat property list into a hierarchical structure.
This is intended to operate on Mapping objects, including dict, pandas.Series and pandas.DataFrame. For example, it converts
{ "ms_run[1]-format": "Andromeda:apl file format", "ms_run[1]-location": "file://...", "ms_run[1]-id_format": "scan number only nativeID format" }
to
{ "ms_run": [ { "format": "Andromeda:apl file format", "location": "file://...", "id_format": "scan number only nativeID format" } ] }
Parameters: proplist (Mapping) – Key-value pairs to collapse
Returns: The collapsed property list
Return type: OrderedDict
-
colunit_peptide
Accesses the ‘colunit_peptide’ key in the metadata mapping attached to this object.
Return type: object
-
colunit_protein
Accesses the ‘colunit_protein’ key in the metadata mapping attached to this object.
Return type: object
-
colunit_psm
Accesses the ‘colunit_psm’ key in the metadata mapping attached to this object.
Return type: object
-
colunit_small_molecule
Accesses the ‘colunit_small_molecule’ key in the metadata mapping attached to this object.
Return type: object
-
colunit_small_molecule_evidence
Accesses the ‘colunit-small_molecule_evidence’ key in the metadata mapping attached to this object.
Return type: object
-
colunit_small_molecule_feature
Accesses the ‘colunit-small_molecule_feature’ key in the metadata mapping attached to this object.
Return type: object
-
contacts
Accesses the ‘contact’ key group gathered in the metadata mapping attached to this object. This group is dynamically generated on each access and may be expensive for repeated use.
Return type: Group
-
custom
Accesses the ‘custom’ key group gathered in the metadata mapping attached to this object. This group is dynamically generated on each access and may be expensive for repeated use.
Return type: Group
-
cvs
Accesses the ‘cv’ key group gathered in the metadata mapping attached to this object. This group is dynamically generated on each access and may be expensive for repeated use.
This key must be present when the file is of the -M variant.
Return type: Group
-
databases
Accesses the ‘database’ key group gathered in the metadata mapping attached to this object. This group is dynamically generated on each access and may be expensive for repeated use.
This key must be present when the file is of the -M variant.
Return type: Group
-
derivatization_agents
Accesses the ‘derivatization_agent’ key group gathered in the metadata mapping attached to this object. This group is dynamically generated on each access and may be expensive for repeated use.
Return type: Group
-
description
Accesses the ‘description’ key in the metadata mapping attached to this object.
Return type: object
-
external_study_uris
Accesses the ‘external_study_uri’ key group gathered in the metadata mapping attached to this object. This group is dynamically generated on each access and may be expensive for repeated use.
Return type: Group
-
false_discovery_rate
Accesses the ‘false_discovery_rate’ key in the metadata mapping attached to this object.
Return type: object
-
fixed_mods
Accesses the ‘fixed_mod’ key group gathered in the metadata mapping attached to this object. This group is dynamically generated on each access and may be expensive for repeated use.
This key must be present when the file is of the -P variant.
Return type: Group
-
gather(mapping)
Collapse property lists using collapse_properties() and then gather collections of entities into lists.
Parameters: mapping (dict) – The flattened hierarchy of properties to reconstruct
Returns: A Group of all entities and collections of entities
Return type: Group
-
id
Accesses the ‘mzTab-ID’ key in the metadata mapping attached to this object. This key must be present when the file is of the -M variant.
Return type: object
-
id_confidence_measures
Accesses the ‘id_confidence_measure’ key group gathered in the metadata mapping attached to this object. This group is dynamically generated on each access and may be expensive for repeated use.
This key must be present when the file is of the -M variant.
Return type: Group
-
instruments
Accesses the ‘instrument’ key group gathered in the metadata mapping attached to this object. This group is dynamically generated on each access and may be expensive for repeated use.
Return type: Group
-
mode
Accesses the ‘mzTab-mode’ key in the metadata mapping attached to this object. This key must be present when the file is of the -P variant.
Return type: object
-
ms_runs
Accesses the ‘ms_run’ key group gathered in the metadata mapping attached to this object. This group is dynamically generated on each access and may be expensive for repeated use.
This key must be present when the file is of the -M or -P variant.
Return type: Group
-
protein_search_engine_scores
Accesses the ‘protein_search_engine_score’ key group gathered in the metadata mapping attached to this object. This group is dynamically generated on each access and may be expensive for repeated use.
Return type: Group
-
psm_search_engine_scores
Accesses the ‘psm_search_engine_score’ key group gathered in the metadata mapping attached to this object. This group is dynamically generated on each access and may be expensive for repeated use.
Return type: Group
-
publications
Accesses the ‘publication’ key group gathered in the metadata mapping attached to this object. This group is dynamically generated on each access and may be expensive for repeated use.
Return type: Group
-
quantification_method
Accesses the ‘quantification_method’ key in the metadata mapping attached to this object. This key must be present when the file is of the -M variant.
Return type: object
-
sample_processing
Accesses the ‘sample_processing’ key group gathered in the metadata mapping attached to this object. This group is dynamically generated on each access and may be expensive for repeated use.
Return type: Group
-
samples
Accesses the ‘sample’ key group gathered in the metadata mapping attached to this object. This group is dynamically generated on each access and may be expensive for repeated use.
Return type: Group
-
small_molecule_feature_quantification_unit
Accesses the ‘small_molecule_feature-quantification_unit’ key in the metadata mapping attached to this object. This key must be present when the file is of the -M variant.
Return type: object
-
small_molecule_identification_reliability
Accesses the ‘small_molecule-identification_reliability’ key in the metadata mapping attached to this object.
Return type: object
-
small_molecule_quantification_unit
Accesses the ‘small_molecule-quantification_unit’ key in the metadata mapping attached to this object. This key must be present when the file is of the -M variant.
Return type: object
-
software
Accesses the ‘software’ key group gathered in the metadata mapping attached to this object. This group is dynamically generated on each access and may be expensive for repeated use.
Return type: Group
-
study_variables
Accesses the ‘study_variable’ key group gathered in the metadata mapping attached to this object. This group is dynamically generated on each access and may be expensive for repeated use.
This key must be present when the file is of the -M variant.
Return type: Group
-
title
Accesses the ‘title’ key in the metadata mapping attached to this object.
Return type: object
-
type
Accesses the ‘mzTab-type’ key in the metadata mapping attached to this object. This key must be present when the file is of the -P variant.
Return type: object
-
uris
Accesses the ‘uri’ key group gathered in the metadata mapping attached to this object. This group is dynamically generated on each access and may be expensive for repeated use.
Return type: Group
-