Pyteomics documentation v4.7.1

mztab - mzTab file reader

Summary

mzTab is one of the standards developed by the Proteomics Informatics working group of the HUPO Proteomics Standards Initiative.

This module provides a way to read mzTab files into a collection of pandas.DataFrame instances in memory, along with a mapping of the file-level metadata. MzTab specifications 1.0 and 2.0 are supported.
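To make the "metadata mapping plus tables" layout concrete, here is a minimal standard-library sketch of how mzTab 1.0 text divides by line prefix (MTD for metadata, COM for comments, PRH/PRT, PEH/PEP, PSH/PSM and SMH/SML for table headers and rows). The function name split_mztab and the returned structure are assumptions made for illustration; this is not the parser pyteomics uses, and it builds plain lists rather than pandas.DataFrame instances.

```python
from collections import OrderedDict

# Header prefix -> row prefix, as defined by the mzTab 1.0 format.
TABLE_PREFIXES = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}


def split_mztab(text):
    """Split mzTab text into (metadata, tables, comments).

    Hypothetical helper for illustration only.
    """
    metadata = OrderedDict()
    tables = {row: {"header": None, "rows": []} for row in TABLE_PREFIXES.values()}
    comments = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate sections
        fields = line.split("\t")
        prefix, rest = fields[0], fields[1:]
        if prefix == "MTD" and len(rest) >= 2:
            metadata[rest[0]] = rest[1]
        elif prefix == "COM":
            comments.append("\t".join(rest))
        elif prefix in TABLE_PREFIXES:
            tables[TABLE_PREFIXES[prefix]]["header"] = rest
        elif prefix in tables:
            tables[prefix]["rows"].append(rest)
    return metadata, tables, comments


example = (
    "COM\tExample file\n"
    "MTD\tmzTab-version\t1.0.0\n"
    "MTD\tms_run[1]-location\tfile://...\n"
    "\n"
    "PSH\tsequence\tPSM_ID\n"
    "PSM\tPEPTIDE\t1\n"
)
metadata, tables, comments = split_mztab(example)
```

With pyteomics installed, the MzTab class documented on this page performs the real parsing and exposes the tables as attributes such as spectrum_match_table.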

Data access

MzTab - a class representing a single mzTab file.

Helpers

Group - a collection of metadata relating to one entity.

Internals

_MzTabTable - a single table in an mzTab file.

Property Management

mztab uses metaprogramming to generate its metadata accessors. The classes MetadataBackedProperty, MetadataBackedCollection and MetadataPropertyAnnotator work in concert to do so.


class pyteomics.mztab._MzTabTable(name, header=None, rows=None)[source]

Bases: _MzTabParserBase

An internal class for accumulating information about a single table represented in an mzTab file.

header

The column names for the table

Type:

list

name

The table’s name, human readable

Type:

str

rows

An accumulator of table rows

Type:

list

__init__(name, header=None, rows=None)[source]
as_df(index=None)[source]

Convert the table to a DataFrame in memory.

Return type:

pd.DataFrame

collapse_properties(proplist)

Collapse a flat property list into a hierarchical structure.

This is intended to operate on Mapping objects, including dict, pandas.Series and pandas.DataFrame.

{
  "ms_run[1]-format": "Andromeda:apl file format",
  "ms_run[1]-location": "file://...",
  "ms_run[1]-id_format": "scan number only nativeID format"
}

to

{
  "ms_run": [
    {
      "format": "Andromeda:apl file format",
      "location": "file://...",
      "id_format": "scan number only nativeID format"
    }
  ]
}
Parameters:

proplist (Mapping) – Key-Value pairs to collapse

Returns:

The collapsed property list

Return type:

OrderedDict
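The transformation shown above can be sketched in a few lines. This is an illustration of the documented input and output only, under the assumption that keys follow the "<entity>[<index>]-<property>" pattern; the name collapse_sketch is hypothetical and the library's own implementation may differ in detail.

```python
import re
from collections import OrderedDict

# Matches keys of the form "<entity>[<index>]-<property>".
KEY_PATTERN = re.compile(r"^(?P<entity>\w+)\[(?P<index>\d+)\]-(?P<prop>.+)$")


def collapse_sketch(proplist):
    """Nest "<entity>[<index>]-<property>" keys; pass other keys through."""
    by_entity = OrderedDict()
    result = OrderedDict()
    for key, value in proplist.items():
        match = KEY_PATTERN.match(key)
        if match is None:
            result[key] = value  # not an indexed property, keep as-is
            continue
        entity = match.group("entity")
        index = int(match.group("index"))
        if entity not in by_entity:
            by_entity[entity] = OrderedDict()
            result[entity] = []  # placeholder preserves first-seen order
        by_entity[entity].setdefault(index, OrderedDict())[match.group("prop")] = value
    # Replace each placeholder with a list ordered by index.
    for entity, items in by_entity.items():
        result[entity] = [dict(items[i]) for i in sorted(items)]
    return result


flat = {
    "ms_run[1]-format": "Andromeda:apl file format",
    "ms_run[1]-location": "file://...",
    "ms_run[1]-id_format": "scan number only nativeID format",
}
nested = collapse_sketch(flat)
```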

gather(mapping)

Collapse property lists using collapse_properties() and then gather collections of entities into lists.

Parameters:

mapping (dict) – The flattened hierarchy of properties to re-construct

Returns:

A Group of all entities and collections of entities

Return type:

Group
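As a rough illustration of the gathering step, the sketch below groups indexed keys by collection name and index, storing the value of a bare "<collection>[<index>]" key under 'name'. The function name gather_sketch and the use of plain OrderedDict instead of Group are assumptions; the real method may treat edge cases differently.

```python
import re
from collections import OrderedDict

# "<collection>[<index>]" optionally followed by "-<property>".
INDEXED_KEY = re.compile(r"^(\w+)\[(\d+)\](?:-(.+))?$")


def gather_sketch(mapping):
    """Group indexed keys into {collection: {index: {property: value}}}."""
    gathered = OrderedDict()
    for key, value in mapping.items():
        match = INDEXED_KEY.match(key)
        if match is None:
            gathered[key] = value  # plain key, not part of a collection
            continue
        collection, index, prop = match.groups()
        members = gathered.setdefault(collection, OrderedDict())
        member = members.setdefault(int(index), {})
        # A bare "<collection>[<index>]" key carries the element's name.
        member[prop if prop is not None else "name"] = value
    return gathered


flat = OrderedDict([
    ("variable_mod[1]", "CHEMMOD:15.9949146221"),
    ("variable_mod[1]-site", "M"),
    ("variable_mod[1]-position", "Anywhere"),
    ("variable_mod[2]", "CHEMMOD:42.0105646863"),
    ("variable_mod[2]-site", "N-term"),
    ("variable_mod[2]-position", "Protein N-term"),
])
mods = gather_sketch(flat)["variable_mod"]
```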

class pyteomics.mztab.MetadataBackedCollection(name, variant_required=None)[source]

Bases: object

__init__(name, variant_required=None)[source]
class pyteomics.mztab.Group[source]

Bases: OrderedDict

A type for holding collections of arbitrarily nested keys from rows and metadata mappings.

Implemented as an autovivifying OrderedDict variant. As such, it implements the Mapping interface.
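"Autovivifying" means that looking up a missing key creates a new nested mapping in place, so deep assignments need no setup. A minimal sketch of the idea using the standard __missing__ hook follows; the class name AutoDict is hypothetical and this is not the Group source.

```python
from collections import OrderedDict


class AutoDict(OrderedDict):
    """Hypothetical autovivifying mapping built on OrderedDict."""

    def __missing__(self, key):
        # Called by item lookup when the key is absent: create, store and
        # return a new nested instance of the same type.
        value = self[key] = type(self)()
        return value


tree = AutoDict()
tree["ms_run"][1]["format"] = "Andromeda:apl file format"  # no KeyError

# get() does not trigger __missing__, so it never creates entries.
missing = tree.get("instrument")
```

One consequence of this design is that a plain subscript read such as tree["typo"] silently adds an empty entry, which is why get() is the safer choice for probing.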

__init__(*args, **kwargs)
clear() -> None.  Remove all items from od.
copy() -> a shallow copy of od
fromkeys(iterable, value=None)

Create a new ordered dictionary with keys from iterable and values set to value.

get(key, default=None, /)

Return the value for key if key is in the dictionary, else default.

get_path(path, default=None)[source]

Like get(), but over a path key parsed with extract_path().

Parameters:
  • path (str) – The path to search down

  • default (object, optional) – The return value when the path is missing

Return type:

object

items() -> a set-like object providing a view on D's items
keys() -> a set-like object providing a view on D's keys
move_to_end(key, last=True)

Move an existing element to the end (or beginning if last is false).

Raise KeyError if the element does not exist.

pop(key[, default]) -> v, remove specified key and return the corresponding value.

If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem(last=True)

Remove and return a (key, value) pair from the dictionary.

Pairs are returned in LIFO order if last is true or FIFO order if false.

setdefault(key, default=None)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) -> None.  Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]

If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v

In either case, this is followed by: for k in F: D[k] = F[k]

values() -> an object providing a view on D's values
class pyteomics.mztab.MetadataBackedProperty(name, variant_required=None)[source]

Bases: object

A descriptor type that uses the instance’s metadata attribute to carry its values.

__init__(name, variant_required=None)[source]
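The descriptor idea can be sketched as follows. The class names, the variant attribute on the owning instance, and the choice to return None for an absent optional key are all assumptions made for this illustration; only the (name, variant_required) constructor shape comes from the signature above.

```python
class MetadataValue:
    """Hypothetical descriptor that reads a key from ``instance.metadata``."""

    def __init__(self, name, variant_required=None):
        self.name = name
        self.variant_required = tuple(variant_required or ())

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class rather than an instance
        try:
            return instance.metadata[self.name]
        except KeyError:
            # Assumed policy: fail only when the variant requires the key.
            if instance.variant in self.variant_required:
                raise KeyError(
                    f"{self.name!r} is required for variant {instance.variant!r}"
                ) from None
            return None


class Document:
    version = MetadataValue("mzTab-version", variant_required=("M", "P"))
    title = MetadataValue("title")

    def __init__(self, metadata, variant="P"):
        self.metadata = metadata  # the descriptor stores nothing itself
        self.variant = variant


doc = Document({"mzTab-version": "1.0.0"})
```

Because the values live in the metadata mapping rather than on the descriptor, every property stays in sync with the mapping without any extra bookkeeping.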
class pyteomics.mztab.MetadataPropertyAnnotator(name, bases, attrs)[source]

Bases: type

A simple metaclass to do some class-creation time introspection and descriptor binding.

Uses a list of strings or 3-tuples from __metadata_properties__ to bind MetadataBackedProperty or MetadataBackedCollection onto the class during its creation.

The specification for a property is a tuple of three values:
  1. The metadata key to fetch

  2. The property name to expose on the object

  3. The variant(s) which require this metadata key be present

For example, ("mzTab-version", "version", ("M", "P")) would be interpreted as: expose a property “version” on instances which serves the key “mzTab-version” from the instance’s metadata, and raise an error if it is absent in the “M” or “P” variants.

Alternatively, a specification may be a single string, which is interpreted as the metadata key. The property name is generated from it by replacing every ‘-’ with ‘_’, and the key is assumed to be optional in all variants.

If a metadata key ends with “[]” the property is assumed to be a collection. mzTab makes heavy use of “<collection_name>[<index>]…” keys to define groups of homogeneous object types, often with per-element attributes.

variable_mod[1]    CHEMMOD:15.9949146221
variable_mod[1]-site  M
variable_mod[1]-position    Anywhere
variable_mod[2]    CHEMMOD:42.0105646863
variable_mod[2]-site  N-term
variable_mod[2]-position Protein N-term

A specification ("variable_mod[]", "variable_mods", ()) would create a property that returns:

>>> instance.variable_mods
Group([(1,
            {'name': 'CHEMMOD:15.9949146221',
             'position': 'Anywhere',
             'site': 'M'}),
        (2,
            {'name': 'CHEMMOD:42.0105646863',
             'position': 'Protein N-term',
             'site': 'N-term'})])

For a precise description of the property collection algorithm, see collapse_properties() and gather().

If any base classes have a __metadata_properties__ attribute, it will also be included unless __inherit_metadata_properties__ is set to False. Any names explicitly set by the current class override this automatic property generation.
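The binding rules described above (string or 3-tuple specifications, inheritance from base classes, and explicit names taking precedence) can be sketched with a small metaclass. PropertyBinder and MetadataValue are hypothetical names; collection keys ending in “[]” and variant checks are deliberately left out, so this is a simplified illustration rather than the library's metaclass.

```python
class MetadataValue:
    """Minimal hypothetical descriptor reading from ``instance.metadata``."""

    def __init__(self, key):
        self.key = key

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.metadata.get(self.key)


class PropertyBinder(type):
    """Hypothetical metaclass binding descriptors at class-creation time."""

    def __new__(mcls, name, bases, attrs):
        specs = list(attrs.get("__metadata_properties__", ()))
        if attrs.get("__inherit_metadata_properties__", True):
            for base in bases:
                specs.extend(getattr(base, "__metadata_properties__", ()))
        for spec in specs:
            if isinstance(spec, str):
                # A bare string is the key; derive the property name from it.
                key, prop_name = spec, spec.replace("-", "_")
            else:
                key, prop_name, _variants = spec
            # Names explicitly set by the class take precedence.
            attrs.setdefault(prop_name, MetadataValue(key))
        return super().__new__(mcls, name, bases, attrs)


class Base(metaclass=PropertyBinder):
    __metadata_properties__ = [("mzTab-version", "version", ("M", "P")), "mzTab-ID"]

    def __init__(self, metadata):
        self.metadata = metadata


class Child(Base):
    __metadata_properties__ = ["title"]
    mzTab_ID = "explicit"  # overrides the automatically generated property


child = Child({"title": "demo", "mzTab-version": "1.0.0", "mzTab-ID": "X1"})
```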

__init__(*args, **kwargs)
mro()

Return a type’s method resolution order.

class pyteomics.mztab.MzTab(path, encoding='utf8', table_format='df')[source]

Bases: _MzTabParserBase

Parser for mzTab format files.

comments

A list of comments across the file

Type:

list

file

A file stream wrapper for the file to be read

Type:

_file_obj

metadata

A mapping of the file-level metadata.

Type:

OrderedDict

peptide_table

The table of peptides. Not commonly used.

Type:

_MzTabTable or pd.DataFrame

protein_table

The table of protein identifications.

Type:

_MzTabTable or pd.DataFrame

small_molecule_table

The table of small molecule identifications.

Type:

_MzTabTable or pd.DataFrame

spectrum_match_table

The table of spectrum-to-peptide match identifications.

Type:

_MzTabTable or pd.DataFrame

table_format

The structure type to replace each table with. The string ‘df’ will use pd.DataFrame instances. ‘dict’ will create a dictionary of dictionaries for each table. A callable will be called on each raw _MzTabTable object.

Type:

‘df’, ‘dict’, or callable

Additional components of metadata are exposed as properties, returning single values or aggregated collections of objects.
__init__(path, encoding='utf8', table_format='df')[source]
assays

Accesses the ‘assay’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Return type:

Group
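Because group properties such as this one are rebuilt on every access, it can help to bind the result to a local name before using it repeatedly. The sketch below demonstrates the pattern with a hypothetical stand-in class (RebuildsOnAccess) that counts how often its property is regenerated; it does not use or reproduce the MzTab class itself.

```python
class RebuildsOnAccess:
    """Hypothetical stand-in whose property is rebuilt on every access."""

    def __init__(self, metadata):
        self.metadata = metadata
        self.build_count = 0

    @property
    def assays(self):
        self.build_count += 1  # count how often the group is regenerated
        return {k: v for k, v in self.metadata.items() if k.startswith("assay[")}


reader = RebuildsOnAccess({"assay[1]-ms_run_ref": "ms_run[1]", "title": "demo"})

assays = reader.assays  # generated once and kept in a local name
names = [key for key in assays]
count = len(assays)
```

Reading reader.assays inside a loop instead would regenerate the group on each iteration.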

collapse_properties(proplist)

Collapse a flat property list into a hierarchical structure.

This is intended to operate on Mapping objects, including dict, pandas.Series and pandas.DataFrame.

{
  "ms_run[1]-format": "Andromeda:apl file format",
  "ms_run[1]-location": "file://...",
  "ms_run[1]-id_format": "scan number only nativeID format"
}

to

{
  "ms_run": [
    {
      "format": "Andromeda:apl file format",
      "location": "file://...",
      "id_format": "scan number only nativeID format"
    }
  ]
}
Parameters:

proplist (Mapping) – Key-Value pairs to collapse

Returns:

The collapsed property list

Return type:

OrderedDict

colunit_peptide

Accesses the ‘colunit_peptide’ key in the metadata mapping attached to this object.

Return type:

object

colunit_protein

Accesses the ‘colunit_protein’ key in the metadata mapping attached to this object.

Return type:

object

colunit_psm

Accesses the ‘colunit_psm’ key in the metadata mapping attached to this object.

Return type:

object

colunit_small_molecule

Accesses the ‘colunit_small_molecule’ key in the metadata mapping attached to this object.

Return type:

object

colunit_small_molecule_evidence

Accesses the ‘colunit-small_molecule_evidence’ key in the metadata mapping attached to this object.

Return type:

object

colunit_small_molecule_feature

Accesses the ‘colunit-small_molecule_feature’ key in the metadata mapping attached to this object.

Return type:

object

contacts

Accesses the ‘contact’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Return type:

Group

custom

Accesses the ‘custom’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Return type:

Group

cvs

Accesses the ‘cv’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

This key must be present when the file is of the -M variant.

Return type:

Group

databases

Accesses the ‘database’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

This key must be present when the file is of the -M variant.

Return type:

Group

derivatization_agents

Accesses the ‘derivatization_agent’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Return type:

Group

description

Accesses the ‘description’ key in the metadata mapping attached to this object.

Return type:

object

external_study_uris

Accesses the ‘external_study_uri’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Return type:

Group

false_discovery_rate

Accesses the ‘false_discovery_rate’ key in the metadata mapping attached to this object.

Return type:

object

fixed_mods

Accesses the ‘fixed_mod’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

This key must be present when the file is of the -P variant.

Return type:

Group

gather(mapping)

Collapse property lists using collapse_properties() and then gather collections of entities into lists.

Parameters:

mapping (dict) – The flattened hierarchy of properties to re-construct

Returns:

A Group of all entities and collections of entities

Return type:

Group

id

Accesses the ‘mzTab-ID’ key in the metadata mapping attached to this object.

This key must be present when the file is of the -M variant.

Return type:

object

id_confidence_measures

Accesses the ‘id_confidence_measure’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

This key must be present when the file is of the -M variant.

Return type:

Group

instruments

Accesses the ‘instrument’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Return type:

Group

mode

Accesses the ‘mzTab-mode’ key in the metadata mapping attached to this object.

This key must be present when the file is of the -P variant.

Return type:

object

ms_runs

Accesses the ‘ms_run’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

This key must be present when the file is of the -M or -P variant.

Return type:

Group

protein_search_engine_scores

Accesses the ‘protein_search_engine_score’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Return type:

Group

psm_search_engine_scores

Accesses the ‘psm_search_engine_score’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Return type:

Group

publications

Accesses the ‘publication’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Return type:

Group

quantification_method

Accesses the ‘quantification_method’ key in the metadata mapping attached to this object.

This key must be present when the file is of the -M variant.

Return type:

object

sample_processing

Accesses the ‘sample_processing’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Return type:

Group

samples

Accesses the ‘sample’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Return type:

Group

small_molecule_feature_quantification_unit

Accesses the ‘small_molecule_feature-quantification_unit’ key in the metadata mapping attached to this object.

This key must be present when the file is of the -M variant.

Return type:

object

small_molecule_identification_reliability

Accesses the ‘small_molecule-identification_reliability’ key in the metadata mapping attached to this object.

Return type:

object

small_molecule_quantification_unit

Accesses the ‘small_molecule-quantification_unit’ key in the metadata mapping attached to this object.

This key must be present when the file is of the -M variant.

Return type:

object

software

Accesses the ‘software’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Return type:

Group

study_variables

Accesses the ‘study_variable’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

This key must be present when the file is of the -M variant.

Return type:

Group

title

Accesses the ‘title’ key in the metadata mapping attached to this object.

Return type:

object

type

Accesses the ‘mzTab-type’ key in the metadata mapping attached to this object.

This key must be present when the file is of the -P variant.

Return type:

object

uris

Accesses the ‘uri’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

Return type:

Group

variable_mods

Accesses the ‘variable_mod’ key group gathered in the metadata mapping attached to this object.

This group is dynamically generated on each access and may be expensive for repeated use.

This key must be present when the file is of the -P variant.

Return type:

Group

version

Accesses the ‘mzTab-version’ key in the metadata mapping attached to this object.

Return type:

object

pyteomics.mztab.extract_path(path)[source]

Parse key[index]_next_key[next_index]… sequences into lists of (key, index) pairs.

Parameters:

path (str) – The path key to parse

Return type:

list
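A rough sketch of this kind of path parsing, together with a get_path-style lookup over nested mappings, is shown below. The names parse_path and lookup, the conversion of indices to int, and the nested {key: {index: value}} layout are assumptions for illustration; the actual return values of extract_path() may differ.

```python
import re

# One "<key>[<index>]" segment; later segments carry a leading "_" separator.
SEGMENT = re.compile(r"([^\[\]]+)\[(\d+)\]")


def parse_path(path):
    """Parse "key[index]_next_key[next_index]" into (key, index) pairs."""
    return [(key.lstrip("_"), int(index)) for key, index in SEGMENT.findall(path)]


def lookup(tree, path, default=None):
    """Walk nested {key: {index: value}} mappings along a parsed path."""
    node = tree
    for key, index in parse_path(path):
        try:
            node = node[key][index]
        except (KeyError, IndexError, TypeError):
            return default  # any missing step yields the default
    return node


pairs = parse_path("study_variable[1]_assay[2]")
tree = {"ms_run": {1: {"format": "Andromeda:apl file format"}}}
found = lookup(tree, "ms_run[1]")
fallback = lookup(tree, "ms_run[2]", default="missing")
```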
