Pyteomics documentation v4.6.1a1

mztab - mzTab file reader

Summary

mzTab is one of the standards developed by the Proteomics Informatics working group of the HUPO Proteomics Standards Initiative.

This module provides a way to read mzTab files into a collection of pandas.DataFrame instances in memory, along with a mapping of the file-level metadata. The mzTab 1.0 and 2.0 specifications are supported.

Data access

MzTab - a class representing a single mzTab file.

Helpers

Group - a collection of metadata relating to one entity.

Internals

_MzTabTable - a single table in an mzTab file.

Property Management

mztab uses metaprogramming to generate its metadata accessors. The classes below work in concert to produce them.


class pyteomics.mztab._MzTabTable(name, header=None, rows=None)[source]

Bases: pyteomics.mztab._MzTabParserBase

An internal class for accumulating information about a single table represented in an mzTab file.

header

The column names for the table

Type:list
name

The table’s name, human readable

Type:str
rows

An accumulator of table rows

Type:list
__init__(name, header=None, rows=None)[source]

Initialize self. See help(type(self)) for accurate signature.

as_df(index=None)[source]

Convert the table to a DataFrame in memory.

Returns:
Return type:pd.DataFrame
collapse_properties(proplist)

Collapse a flat property list into a hierarchical structure.

This is intended to operate on Mapping objects, including dict, pandas.Series and pandas.DataFrame.

{
  "ms_run[1]-format": "Andromeda:apl file format",
  "ms_run[1]-location": "file://...",
  "ms_run[1]-id_format": "scan number only nativeID format"
}

to

{
  "ms_run": [
    {
      "format": "Andromeda:apl file format",
      "location": "file://...",
      "id_format": "scan number only nativeID format"
    }
  ]
}
Parameters:proplist (Mapping) – Key-Value pairs to collapse
Returns:The collapsed property list
Return type:OrderedDict
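
The transformation shown above can be sketched in plain Python. This is an independent illustration of the documented input and output shapes, not the pyteomics implementation; the function name collapse_flat_properties and the regular expression are assumptions made for the example.

```python
import re
from collections import OrderedDict

# Hypothetical pattern (not taken from pyteomics): splits "ms_run[1]-format"
# into the entity "ms_run", the index "1" and the property "format".
_INDEXED_KEY = re.compile(r"^([^\[\]]+)\[(\d+)\]-(.+)$")


def collapse_flat_properties(proplist):
    """Nest "<entity>[<index>]-<property>" keys under their entity name."""
    collapsed = OrderedDict()
    indexed = {}  # entity name -> {index: OrderedDict of properties}
    for key, value in proplist.items():
        match = _INDEXED_KEY.match(key)
        if match is None:
            # Keys without an index and property suffix are copied unchanged.
            collapsed[key] = value
            continue
        entity, index, prop = match.groups()
        if entity not in indexed:
            indexed[entity] = {}
            collapsed[entity] = None  # reserve the position, filled in below
        indexed[entity].setdefault(int(index), OrderedDict())[prop] = value
    for entity, elements in indexed.items():
        # Emit each entity's elements as a list ordered by their index.
        collapsed[entity] = [elements[i] for i in sorted(elements)]
    return collapsed


flat = {
    "ms_run[1]-format": "Andromeda:apl file format",
    "ms_run[1]-location": "file://...",
    "ms_run[1]-id_format": "scan number only nativeID format",
}
nested = collapse_flat_properties(flat)
```

Here nested holds a single "ms_run" entry whose value is a one-element list, matching the second mapping in the example above.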
gather(mapping)

Collapse property lists using collapse_properties() and then gather collections of entities into lists.

Parameters:mapping (dict) – The flattened hierarchy of properties to re-construct
Returns:A Group of all entities and collections of entities
Return type:Group
class pyteomics.mztab.MetadataBackedCollection(name, variant_required=None)[source]

Bases: object

__init__(name, variant_required=None)[source]

Initialize self. See help(type(self)) for accurate signature.

class pyteomics.mztab.Group[source]

Bases: collections.OrderedDict

A type for holding collections of arbitrarily nested keys from rows and metadata mappings.

Implemented as an autovivifying OrderedDict variant. As such implements the Mapping interface.
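
"Autovivifying" means that looking up a missing key creates a nested mapping on the fly, so deep assignments need no intermediate setup. A minimal sketch of that general technique follows, assuming the standard __missing__ hook; the class name AutoVivifyingDict is hypothetical and this is not the Group source.

```python
from collections import OrderedDict


class AutoVivifyingDict(OrderedDict):
    """Hypothetical stand-in for illustration; not the pyteomics Group class."""

    def __missing__(self, key):
        # Invoked by item lookup (d[key]) when the key is absent: create,
        # store and return a new nested mapping of the same type.
        child = type(self)()
        self[key] = child
        return child


tree = AutoVivifyingDict()
# The intermediate levels "ms_run" and 1 are created automatically.
tree["ms_run"][1]["format"] = "Andromeda:apl file format"

# get() and the "in" operator do not trigger __missing__, so they can be
# used to probe for a key without creating it.
probe = tree.get("instrument")
```

One consequence of this design is that a plain subscript lookup such as tree["typo"] silently inserts an empty entry, which is why a non-creating accessor is useful for reads.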

__init__

Initialize self. See help(type(self)) for accurate signature.

clear() → None. Remove all items from od.
copy() → a shallow copy of od
fromkeys()

Create a new ordered dictionary with keys from iterable and values set to value.

get()

Return the value for key if key is in the dictionary, else default.

get_path(path, default=None)[source]

As get() but over a path key parsed with extract_path().

Parameters:
  • path (str) – The path to search down
  • default (object, optional) – The return value when the path is missing
Returns:
Return type:object

items() → a set-like object providing a view on D's items
keys() → a set-like object providing a view on D's keys
move_to_end()

Move an existing element to the end (or beginning if last is false).

Raise KeyError if the element does not exist.

pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem()

Remove and return a (key, value) pair from the dictionary.

Pairs are returned in LIFO order if last is true or FIFO order if false.

setdefault()

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) → None. Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]

values() → an object providing a view on D's values
class pyteomics.mztab.MetadataBackedProperty(name, variant_required=None)[source]

Bases: object

Our descriptor type which uses the instance’s metadata attribute to carry its values

__init__(name, variant_required=None)[source]

Initialize self. See help(type(self)) for accurate signature.
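
To illustrate the idea of a descriptor that reads its value from the owning instance's metadata attribute, here is a minimal sketch. The names MetadataValue and Document, and the variant attribute used to decide whether a missing key is an error, are assumptions made for the example rather than details taken from pyteomics.

```python
class MetadataValue:
    """Hypothetical descriptor for illustration; not the pyteomics class."""

    def __init__(self, key, variant_required=()):
        self.key = key
        self.variant_required = tuple(variant_required)

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself: return the descriptor object.
            return self
        if self.key in instance.metadata:
            return instance.metadata[self.key]
        if instance.variant in self.variant_required:
            raise KeyError(
                "%r is required for variant %r" % (self.key, instance.variant))
        return None


class Document:
    """Hypothetical owner class; `variant` is an assumed attribute."""

    version = MetadataValue("mzTab-version", variant_required=("M", "P"))
    title = MetadataValue("title")

    def __init__(self, metadata, variant):
        self.metadata = metadata
        self.variant = variant


doc = Document({"mzTab-version": "1.0.0"}, variant="P")
found_version = doc.version   # read from doc.metadata
missing_title = doc.title     # optional key that is absent
```

Because the value lives only in the metadata mapping, updating doc.metadata is immediately reflected by the attribute without any extra bookkeeping.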

class pyteomics.mztab.MetadataPropertyAnnotator[source]

Bases: type

A simple metaclass to do some class-creation time introspection and descriptor binding.

Uses a list of strings or 3-tuples from __metadata_properties__ to bind MetadataBackedProperty or MetadataBackedCollection onto the class during its creation.

The specification for a property is a tuple of three values:
  1. The metadata key to fetch
  2. The property name to expose on the object
  3. The variant(s) which require this metadata key be present

("mzTab-version", "version", ("M", "P")) would be interpreted as: expose a property “version” on instances which serves the key “mzTab-version” from the instance’s metadata, and raise an error if it is absent in the “M” or “P” variants.

Alternatively, a specification may be a single string, which is interpreted as the metadata key. The property name is generated from it by replacing every ‘-’ with ‘_’, and the key is assumed to be optional in all variants.

If a metadata key ends with “[]” the property is assumed to be a collection. mzTab makes heavy use of “<collection_name>[<index>]…” keys to define groups of homogeneous object types, often with per-element attributes.

A specification ("variable_mod[]", "variable_mods", ()) would create a property that returns:

>>> instance.variable_mods
Group([(1,
            {'name': 'CHEMMOD:15.9949146221',
             'position': 'Anywhere',
             'site': 'M'}),
        (2,
            {'name': 'CHEMMOD:42.0105646863',
             'position': 'Protein N-term',
             'site': 'N-term'})])

For precise description of the property collection algorithm, see collapse_properties() and gather().

If any base classes have a __metadata_properties__ attribute, it will also be included unless __inherit_metadata_properties__ is set to False. Any names explicitly set by the current class override this automatic property generation.
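
The binding step described above can be sketched with a small metaclass. This is a simplified illustration under stated assumptions, not the pyteomics code: the names PropertyAnnotator, _make_property and Document are hypothetical, the variant attribute is assumed, and collection keys ending in “[]” and inheritance from base classes are deliberately left out.

```python
def _make_property(key, required):
    """Build a read-only property serving `key` from `self.metadata`."""
    def getter(self):
        if key not in self.metadata and self.variant in required:
            raise KeyError("%r is required for variant %r" % (key, self.variant))
        return self.metadata.get(key)
    return property(getter)


class PropertyAnnotator(type):
    """Hypothetical metaclass for illustration; not the pyteomics one."""

    def __new__(mcls, name, bases, namespace):
        for spec in namespace.get("__metadata_properties__", ()):
            if isinstance(spec, str):
                # Single string: derive the attribute name, optional everywhere.
                key, attr, required = spec, spec.replace("-", "_"), ()
            else:
                key, attr, required = spec
            # Names set explicitly in the class body take precedence.
            namespace.setdefault(attr, _make_property(key, required))
        return super().__new__(mcls, name, bases, namespace)


class Document(metaclass=PropertyAnnotator):
    __metadata_properties__ = [
        ("mzTab-version", "version", ("M", "P")),
        "small_molecule-quantification_unit",
    ]

    def __init__(self, metadata, variant):
        self.metadata = metadata
        self.variant = variant  # assumed attribute naming the file variant


doc = Document({"mzTab-version": "2.0.0-M"}, variant="M")
version = doc.version
unit = doc.small_molecule_quantification_unit
```

Doing this work in the metaclass means the accessors exist as soon as the class is defined, so they show up in dir() and help() like hand-written properties.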

__init__

Initialize self. See help(type(self)) for accurate signature.

mro()

Return a type’s method resolution order.

class pyteomics.mztab.MzTab(path, encoding='utf8', table_format='df')[source]

Bases: pyteomics.mztab._MzTabParserBase

Parser for mzTab format files.

comments

A list of comments across the file

Type:list
file

A file stream wrapper for the file to be read

Type:_file_obj
metadata

A mapping of the file-level metadata entries.

Type:OrderedDict
peptide_table

The table of peptides. Not commonly used.

Type:_MzTabTable or pd.DataFrame
protein_table

The table of protein identifications.

Type:_MzTabTable or pd.DataFrame
small_molecule_table

The table of small molecule identifications.

Type:_MzTabTable or pd.DataFrame
spectrum_match_table

The table of spectrum-to-peptide match identifications.

Type:_MzTabTable or pd.DataFrame
table_format

The structure type to replace each table with. The string ‘df’ will use pd.DataFrame instances. ‘dict’ will create a dictionary of dictionaries for each table. A callable will be called on each raw _MzTabTable object.

Type:‘df’, ‘dict’, or callable
Additional components of metadata are exposed as properties, returning
single values or aggregated collections of objects.
__init__(path, encoding='utf8', table_format='df')[source]

Initialize self. See help(type(self)) for accurate signature.

assays

Accesses the ‘assay’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Returns:
Return type:Group
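
Since a group like this is rebuilt on every access, it can help to bind the result to a local name when it is needed more than once. The sketch below shows the effect with a hypothetical stand-in class (Report, with a builds counter added purely for demonstration); it does not use or reproduce the MzTab class.

```python
class Report:
    """Hypothetical stand-in for an object with a rebuilt-on-access property."""

    def __init__(self, metadata):
        self.metadata = metadata
        self.builds = 0  # counts rebuilds, for demonstration only

    @property
    def assays(self):
        # Rebuilt from the flat metadata on every access.
        self.builds += 1
        return {k: v for k, v in self.metadata.items()
                if k.startswith("assay[")}


report = Report({"assay[1]-ms_run_ref": "ms_run[1]", "title": "demo"})

# Repeated attribute access rebuilds the group each time.
for _ in range(3):
    report.assays
rebuilds_without_caching = report.builds

# Binding the result to a local name builds it only once.
assays = report.assays
for _ in range(3):
    len(assays)
rebuilds_with_caching = report.builds - rebuilds_without_caching
```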
collapse_properties(proplist)

Collapse a flat property list into a hierarchical structure.

This is intended to operate on Mapping objects, including dict, pandas.Series and pandas.DataFrame.

{
  "ms_run[1]-format": "Andromeda:apl file format",
  "ms_run[1]-location": "file://...",
  "ms_run[1]-id_format": "scan number only nativeID format"
}

to

{
  "ms_run": [
    {
      "format": "Andromeda:apl file format",
      "location": "file://...",
      "id_format": "scan number only nativeID format"
    }
  ]
}
Parameters:proplist (Mapping) – Key-Value pairs to collapse
Returns:The collapsed property list
Return type:OrderedDict
colunit_peptide

Accesses the ‘colunit_peptide’ key in the metadata mapping attached to this object.

Returns:
Return type:object
colunit_protein

Accesses the ‘colunit_protein’ key in the metadata mapping attached to this object.

Returns:
Return type:object
colunit_psm

Accesses the ‘colunit_psm’ key in the metadata mapping attached to this object.

Returns:
Return type:object
colunit_small_molecule

Accesses the ‘colunit_small_molecule’ key in the metadata mapping attached to this object.

Returns:
Return type:object
colunit_small_molecule_evidence

Accesses the ‘colunit-small_molecule_evidence’ key in the metadata mapping attached to this object.

Returns:
Return type:object
colunit_small_molecule_feature

Accesses the ‘colunit-small_molecule_feature’ key in the metadata mapping attached to this object.

Returns:
Return type:object
contacts

Accesses the ‘contact’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Returns:
Return type:Group
custom

Accesses the ‘custom’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Returns:
Return type:Group
cvs

Accesses the ‘cv’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

This key must be present when the file is of -M variant.

Returns:
Return type:Group
databases

Accesses the ‘database’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

This key must be present when the file is of -M variant.

Returns:
Return type:Group
derivatization_agents

Accesses the ‘derivatization_agent’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Returns:
Return type:Group
description

Accesses the ‘description’ key in the metadata mapping attached to this object.

Returns:
Return type:object
external_study_uris

Accesses the ‘external_study_uri’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Returns:
Return type:Group
false_discovery_rate

Accesses the ‘false_discovery_rate’ key in the metadata mapping attached to this object.

Returns:
Return type:object
fixed_mods

Accesses the ‘fixed_mod’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

This key must be present when the file is of -P variant.

Returns:
Return type:Group
gather(mapping)

Collapse property lists using collapse_properties() and then gather collections of entities into lists.

Parameters:mapping (dict) – The flattened hierarchy of properties to re-construct
Returns:A Group of all entities and collections of entities
Return type:Group
id

Accesses the ‘mzTab-ID’ key in the metadata mapping attached to this object.

This key must be present when the file is of -M variant.

Returns:
Return type:object
id_confidence_measures

Accesses the ‘id_confidence_measure’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

This key must be present when the file is of -M variant.

Returns:
Return type:Group
instruments

Accesses the ‘instrument’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Returns:
Return type:Group
mode

Accesses the ‘mzTab-mode’ key in the metadata mapping attached to this object.

This key must be present when the file is of -P variant.

Returns:
Return type:object
ms_runs

Accesses the ‘ms_run’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

This key must be present when the file is of -M or -P variants.

Returns:
Return type:Group
protein_search_engine_scores

Accesses the ‘protein_search_engine_score’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Returns:
Return type:Group
psm_search_engine_scores

Accesses the ‘psm_search_engine_score’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Returns:
Return type:Group
publications

Accesses the ‘publication’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Returns:
Return type:Group
quantification_method

Accesses the ‘quantification_method’ key in the metadata mapping attached to this object.

This key must be present when the file is of -M variant.

Returns:
Return type:object
sample_processing

Accesses the ‘sample_processing’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Returns:
Return type:Group
samples

Accesses the ‘sample’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Returns:
Return type:Group
small_molecule_feature_quantification_unit

Accesses the ‘small_molecule_feature-quantification_unit’ key in the metadata mapping attached to this object.

This key must be present when the file is of -M variant.

Returns:
Return type:object
small_molecule_identification_reliability

Accesses the ‘small_molecule-identification_reliability’ key in the metadata mapping attached to this object.

Returns:
Return type:object
small_molecule_quantification_unit

Accesses the ‘small_molecule-quantification_unit’ key in the metadata mapping attached to this object.

This key must be present when the file is of -M variant.

Returns:
Return type:object
software

Accesses the ‘software’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Returns:
Return type:Group
study_variables

Accesses the ‘study_variable’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

This key must be present when the file is of -M variant.

Returns:
Return type:Group
title

Accesses the ‘title’ key in the metadata mapping attached to this object.

Returns:
Return type:object
type

Accesses the ‘mzTab-type’ key in the metadata mapping attached to this object.

This key must be present when the file is of -P variant.

Returns:
Return type:object
uris

Accesses the ‘uri’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Returns:
Return type:Group
variable_mods

Accesses the ‘variable_mod’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

This key must be present when the file is of -P variant.

Returns:
Return type:Group
version

Accesses the ‘mzTab-version’ key in the metadata mapping attached to this object.

Returns:
Return type:object
pyteomics.mztab.extract_path(path)[source]

Parse key[index]_next_key[next_index]… sequences into lists of (key, index) pairs.

Parameters:path (str) – The path key to parse
Returns:
Return type:list
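
As an illustration of the path format described above, the following sketch parses such a string into (key, index) pairs and walks a nested mapping with them. It is an assumption-laden example rather than the library's implementation: the names parse_indexed_path and walk and the regular expression are hypothetical, and the real function may treat edge cases differently.

```python
import re

# Hypothetical pattern: an optional "_" separator, then a key made of letters
# and underscores, followed by a bracketed integer index.
_SEGMENT = re.compile(r"_?([A-Za-z][A-Za-z_]*?)\[(\d+)\]")


def parse_indexed_path(path):
    """Split "key[index]_next_key[next_index]" into (key, index) pairs."""
    return [(key, int(index)) for key, index in _SEGMENT.findall(path)]


def walk(mapping, pairs, default=None):
    """Follow (key, index) pairs through nested mappings without creating keys."""
    layer = mapping
    for key, index in pairs:
        try:
            layer = layer[key][index]
        except (KeyError, IndexError, TypeError):
            return default
    return layer


pairs = parse_indexed_path("study_variable[1]_assay[2]")

data = {"study_variable": {1: {"assay": {2: "assay two"}}}}
hit = walk(data, pairs)
miss = walk(data, parse_indexed_path("study_variable[9]_assay[2]"), default="n/a")
```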
