History of changes¶

5.0¶

Update the standard ion compositions to be more consistent with the adopted ion type notation. The “z+1” and “c+1” ions now represent compositions described as “z-ion plus a hydrogen” and “c-ion plus a hydrogen”, respectively. The “z-dot” and “c-dot” notations are retained for partial backward compatibility; they are now equivalent to “z+1” and “c+1”.

Warning

Make sure to check what ion types you are using in mass and composition calculations! In short, z+1, z+2, z+3 have all been reduced by one hydrogen.

Support ProForma 2.1 (#183 by Joshua Klein). You can calculate compositions for ProForma objects using pyteomics.proforma.ProForma.composition() and get m/z with annotated or user-provided charge state using pyteomics.proforma.ProForma.mz(). ChargeState was added to include adduct information, while still being reducible to int wherever possible. ChargeState also includes localized charged modifications (#203). ProForma functionality is likely to evolve in the future.

You can also iterate through possible peptidoforms when a ProForma sequence is annotated with some ambiguity using pyteomics.proforma.ProForma.proteoforms() and apply additional modification specifications to any ProForma sequence using pyteomics.proforma.proteoforms() (#196 by Joshua Klein).

Implement thread-based parallelism. Following the introduction of official free-threading Python implementations, users are now able to use threads through the map() interface of indexing parsers. The default behavior of map() is still to use multiprocessing, but the user can pass method='t' to use worker threads. Also, methods pmap() and tmap() can be used to directly invoke process-based or thread-based mapping, respectively.
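As a rough illustration of what thread-based mapping means, here is a minimal standard-library sketch. It is not the Pyteomics implementation: threaded_map, count_peaks and the toy spectra are hypothetical names used only for this example. With a real indexing parser, the corresponding call would be map() with method='t' or tmap(), as described above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def count_peaks(spectrum):
    """Toy per-entry function: return the entry ID and its number of peaks."""
    return spectrum["id"], len(spectrum["mz"])


def threaded_map(func, entries, workers=4):
    """Apply func to every entry in worker threads (hypothetical helper).

    Results are collected in completion order, not input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(func, entry) for entry in entries]
        return [future.result() for future in as_completed(futures)]


# Five fake spectra with 1..5 peaks each.
spectra = [{"id": i, "mz": [100.0 + j for j in range(i + 1)]} for i in range(5)]

# Key the results by ID because the output order is not guaranteed.
results = dict(threaded_map(count_peaks, spectra))
```

Since results come back in completion order, the sketch keys them by entry ID rather than relying on position.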

Implement type checking using psims when parsing cvParam elements in mzML, mzIdentML and TraML files (#165). psims is now required to parse these formats. Every parser instance now uses a copy of the PSI-MS controlled vocabulary. A pre-created object can be passed during parser creation, or it will be constructed on the fly.

Note

Consider enabling and configuring OBO cache to save time when creating mzML, mzIdentML and TraML parser objects.

Change the default subprocess start method used by the map() method of indexing parsers on Linux-like platforms and add a new function, auxiliary.set_start_method(), to configure it (#172).

Read more about the new behavior and the rationale in the docs.

Drop Python 2 support. See #167 for the announcement.

Pyteomics now uses the implicit namespace mechanism.

Fix compatibility with lxml 5.4.0 and newer (#170).

Ensure URL prefix when creating a mass.Unimod object (#169).

Improve spectrum_utils integration.

pyteomics.pylab_aux.mirror() can now annotate spectra with two different sequences.

Retain StPeter output in pyteomics.protxml.DataFrame().

Fix #163 (#164 by Dikshant jha).

In pyteomics.achrom, return regular Python types instead of NumPy scalars.

Fix SQLAlchemy warnings in unimod (#171) and require sqlalchemy >= 1.4.

New function pyteomics.mass.fragment_series() to calculate m/z for series of fragment ions.

The default backend in pyteomics.pylab_aux.annotate_spectrum() now supports ProForma sequences.

Add an absolute parameter (True by default) for pyteomics.mass.fast_mass() and pyteomics.mass.fast_mass2().

Allow fragment charges formatted as float in MGF (issue #185).

New functions pyteomics.parser.coverage_mask() and pyteomics.parser.strip().

Add support for explicit XML namespaces (#200).

Fix #47 (Postgres compatibility for pyteomics.mass.unimod) in #201. This changes certain column types from VARCHAR to Unicode text.

4.7.5¶

Fix #159.

4.7.4¶

Fix call signature for pepxml.read() and make it a full alias to pepxml.PepXML.
Disable indexing of pepXML in pepxml.DataFrame(), which fixes creation of dataframes from files with non-unique spectrum IDs.
Allow iteration over search hits instead of spectrum queries with pyteomics.pepxml.PepXML.search_hits(). pyteomics.pepxml.DataFrame() has a new argument by, which accepts values “spectrum_query” (default) and “search_hit” (new).
Fix #156.
Fix #157 (#158 by Joshua Klein).

4.7.3¶

Add compatibility with NumPy 2.0.

Fix #153. MGF parser now recognizes precursor charge specified on the PEPMASS line. If CHARGE is also specified, it is ignored.
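The precedence rule can be sketched in plain Python. This is only an illustration, not the parser's code: precursor_info and parse_charge are hypothetical helpers, and the sketch assumes that the charge appears as the third whitespace-separated token on the PEPMASS line (after m/z and intensity).

```python
import re


def parse_charge(token):
    """Convert a charge token such as '2+' or '3-' to a signed int."""
    match = re.fullmatch(r"(\d+)([+-]?)", token.strip())
    if match is None:
        raise ValueError(f"Unrecognized charge token: {token!r}")
    value = int(match.group(1))
    return -value if match.group(2) == "-" else value


def precursor_info(params):
    """Return (mz, intensity, charge) from raw MGF header strings.

    Assumption: a third token on the PEPMASS line is the precursor charge.
    When it is present, a separate CHARGE value is ignored.
    """
    tokens = params["PEPMASS"].split()
    mz = float(tokens[0])
    intensity = float(tokens[1]) if len(tokens) > 1 else None
    if len(tokens) > 2:
        charge = parse_charge(tokens[2])
    elif "CHARGE" in params:
        charge = parse_charge(params["CHARGE"])
    else:
        charge = None
    return mz, intensity, charge


both = precursor_info({"PEPMASS": "983.6 1000.0 2+", "CHARGE": "3+"})
only_charge_line = precursor_info({"PEPMASS": "983.6", "CHARGE": "3+"})
```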

4.7.2¶

Fix pickling of resolved ProForma modifications (#144 by Joshua Klein).
Fix a deprecation warning in pyteomics.mass.unimod (#126 by Ralf Gabriels).
Add caching for modifications resolvers in pyteomics.proforma (#148 by Joshua Klein).
Add support for constant terminal modifications in pyteomics.proforma (#148 by Joshua Klein).
Fix an exception in pyteomics.ms1 when an information string has only one token (#149).

4.7.1¶

Fix issue with calculate_mass() with a composition keyword argument.

4.7¶

Make proforma.MassModification objects hashable (#130 by Joshua Klein).

Fix #132 (#133 by Joshua Klein).

Fix thermolysin cleavage rule (#135).

Fix #136.

pyteomics.mass.mass.calculate_mass() now supports ProForma. A sequence or a proforma.ProForma object can be passed with the proforma keyword argument (#137).

Fix: restored the ability of IndexedTextReader parsers (pyteomics.mgf.IndexedMGF, pyteomics.fasta.IndexedFASTA, etc.) to load the byte offset index from a previously saved byte offset file (created with cls.prebuild_byte_offset_file() or reader.write_byte_offsets()) (#142).

API change: undocumented method _build_index() of indexing XML parsers renamed to build_byte_index() (#142).

Add a warning when creating an IndexedTextReader instance with an empty offset index. This warning can be disabled by passing warn_if_empty=False (#138).

4.6.3¶

Fix #122.

Fix #124 (in #125 by Seth Just).

Fix #128 (in #129 by Joshua Klein).

4.6.2¶

pyteomics.fasta.write() can now write entries with parsed sequences (#120 by Vladimir Gorshkov, Joshua Klein and Lev Levitsky).

Fix #119.

Fix import issue with pyteomics.pylab_aux.

4.6.1¶

Make pyteomics.mgf.write() work with a regular list of ints as “charge” param.

Add mean absolute error (MAE) regression in pyteomics.achrom (#117 by Mark Ivanov).

Fix #115 and #118.

Remove auxiliary.Version. pyteomics.version.VersionInfo can be used instead.

For target-decoy calculations, pandas is assumed to be 0.17 or newer.

4.6¶

When passing an existing file (by name) to pyteomics.mgf.write() or pyteomics.fasta.write() and other writing functions, the file will be opened for writing by default. Previously, it would be opened for appending with a warning about the upcoming change. Please be aware that existing files will be overwritten if passed by name. The rationale for this is better reproducibility if the same code is run multiple times. You can use the file_mode argument of the writing functions to override this behaviour, or pass your own file objects.
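The practical difference between the two modes can be seen with ordinary file handling. The sketch below uses only the standard library; write_lines is a hypothetical stand-in for a writing function with a file_mode argument and is not part of Pyteomics.

```python
import os
import tempfile


def write_lines(path, lines, file_mode="w"):
    """Hypothetical writer: open the file by name in the given mode."""
    with open(path, file_mode) as out:
        out.writelines(line + "\n" for line in lines)


entry = ["BEGIN IONS", "TITLE=example", "END IONS"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.mgf")

    # Running the same "script" twice with the default mode overwrites the file.
    write_lines(path, entry)
    write_lines(path, entry)
    with open(path) as f:
        entries_after_overwrite = f.read().count("BEGIN IONS")

    # Explicitly asking for append mode keeps the existing content.
    write_lines(path, entry, file_mode="a")
    with open(path) as f:
        entries_after_append = f.read().count("BEGIN IONS")
```

With the default mode, repeated runs always leave one copy of the entry in the file; with file_mode="a" the file grows on each run.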

Add a special warning when trying to write a single spectrum with pyteomics.mgf.write(). See also: Writing one or more MGF spectra to a file.

In pyteomics.mass.mass.calculate_mass(), the absolute parameter is now True by default. When calculating m/z for negative charges, the returned value will be positive by default.
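A small worked example of the arithmetic, as a sketch only: mz_sketch is a hypothetical function, the proton is assumed to be the charge carrier, and its mass is rounded. It is not the library's implementation.

```python
PROTON_MASS = 1.007276  # Da, approximate; assumed charge carrier (H+)


def mz_sketch(neutral_mass, charge, absolute=True):
    """m/z for an ion with `charge` protons added (negative: protons removed)."""
    value = (neutral_mass + charge * PROTON_MASS) / charge
    # Dividing by a negative charge gives a negative number; absolute=True
    # reports the magnitude instead.
    return abs(value) if absolute else value


positive = mz_sketch(1000.0, 2)                           # about 501.007
negative_abs = mz_sketch(1000.0, -2)                      # about 498.993
negative_signed = mz_sketch(1000.0, -2, absolute=False)   # about -498.993
```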

Fix issue #98 (#99 and #101 by Joshua Klein).

Fix issue #91 (#92 by Joshua Klein).

Fix issue #96.

Update the UniProt header pattern (fix rare parsing errors with pyteomics.fasta.UniProt and pyteomics.fasta.IndexedUniProt) in #93.

Update the UniRef header pattern (fix parsing errors with pyteomics.fasta.UniRef and pyteomics.fasta.IndexedUniRef) in #102. Some keys are removed from the output.

Fix pickling issues with pyteomics.mgf.IndexedMGF, pyteomics.ms1.IndexedMS1, pyteomics.ms2.IndexedMS2 (#108).

Add “charge array” and “resolution array” to the output of MS2 parsers (#108). Add new arguments read_charges and read_resolutions to disable parsing, and convert_arrays to govern the creation of NumPy arrays (and masked arrays).

4.5.6¶

New function pyteomics.proforma.set_unimod_path() allowing the ProForma parsing machinery to work with a local Unimod copy (#85 by Joshua Klein). See documentation for a usage example.

New method pyteomics.proforma.ProForma.fragments() to generate m/z for an ion series (#85 by Joshua Klein).

New function pyteomics.parser.to_proforma() helps convert modX sequences to ProForma.

Fix: prevent pyteomics.mass.mass.fast_mass2() from changing aa_mass.

Update pyteomics.pylab_aux.annotate_spectrum() for compatibility with latest spectrum_utils. Pyteomics is now compatible with spectrum_utils 0.4.0 and newer.

4.5.5¶

Fix issue #77.

4.5.4¶

Fix issue #74.

In pyteomics.auxiliary.fdr(), raise PyteomicsError instead of ZeroDivisionError when using formula 1 on input without any target PSMs.

Provide more accurate amino acid masses in mass.std_aa_mass.

Fix SyntaxError in pyteomics.pylab_aux on Python 2.7.

4.5.3¶

Fix ThreadPool shutdown and add new parameter ephemeral_pool in pyteomics.usi.PROXIAggregator (#67 by Joshua Klein).

Bugfix in pyteomics.proforma.GenericModificationResolver (#68 by Joshua Klein).

New helper function pyteomics.fasta.decoy_entries().

New arguments charge_carrier, absolute in mass.calculate_mass() and mass.Composition.mass() (#61). Charge is now only handled in Composition.mass() and not Composition.__init__().

Bugfix in pyteomics.tandem (#71 by @superrino130).

4.5.2¶

Support Python 3.10.

4.5.1¶

Add max_length parameter in pyteomics.parser.cleave().

Bugfix in pyteomics.parser.cleave() for semi=True.

Add regex parameter in pyteomics.parser.cleave() and warn for possible typos in cleavage rule names.

Add functions parser.icleave() (generator) and parser.xcleave() (list) to produce peptide sequences with indices and possible repetitions.
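To show what "with indices and possible repetitions" can look like, here is a simplified sketch in plain Python. icleave_sketch is a hypothetical function, the trypsin rule is a reduced version (cleave after K or R unless followed by P), and the exact output format of the real functions may differ.

```python
import re

# Simplified trypsin-like rule: K or R not followed by P (an assumption;
# the rule shipped with the library is more detailed).
TRYPSIN_SIMPLE = r"[KR](?=[^P])"


def icleave_sketch(sequence, rule=TRYPSIN_SIMPLE, missed_cleavages=0):
    """Yield (start_index, peptide) pairs, keeping repeated peptides."""
    # Cut points: sequence start, the position after every site, sequence end.
    cuts = [0] + [m.end() for m in re.finditer(rule, sequence)] + [len(sequence)]
    for i in range(len(cuts) - 1):
        last = min(i + 2 + missed_cleavages, len(cuts))
        for j in range(i + 1, last):
            yield cuts[i], sequence[cuts[i]:cuts[j]]


# "AK" occurs twice; a set of peptides would lose that information.
with_repeats = list(icleave_sketch("AKAKA"))
with_missed = list(icleave_sketch("AKRPGKAA", missed_cleavages=1))
```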

Bugfixes (#63 and #64 by Joshua Klein).

4.5¶

Add support for mzMLb (#35 and #38 by Joshua Klein) with new module pyteomics.mzmlb.

Add ProteomeExchange backend for PROXI requests and implement an aggregator for responses from all backends (#36, #45, and #55 by Joshua Klein) in pyteomics.usi.

Add support for ProForma (#37 by Joshua Klein) in new module pyteomics.proforma.

New arguments keep_nterm_M and fix_aa in pyteomics.fasta.shuffle() (#54 by Vladimir Gorshkov).

Fix for unwanted warnings in pyteomics.auxiliary.file_helpers._check_use_index() when use_index is explicitly passed (#52).

Update the default XML schema for featureXML and fix issues with incorrectly specified data types (#53).

Add a new backend for spectrum annotation and plotting. pyteomics.pylab_aux.plot_spectrum() and pyteomics.pylab_aux.annotate_spectrum() can now use spectrum_utils under the hood (#43).

See new Example 4 for demonstration.

New function pyteomics.pylab_aux.mirror() for making a spectrum_utils mirror plot.

pyteomics.pylab_aux.plot_spectrum() and pyteomics.pylab_aux.annotate_spectrum() now always return matplotlib.pyplot.Axes.

Add a warning when passing an existing file by name in writing functions. The default mode for output files will change from ‘a’ to ‘w’ in a future version.

4.4.2¶

Add cleavage rules from MS ontology as pyteomics.parser.psims_rules. pyteomics.parser.cleave() now understands keys and accessions from psims_rules as rules.

Improve mzIdentML parser performance (and possibly others in some cases) by relying more on offset indexes (#34 by Joshua Klein).

Extend the pyteomics.mztab.MzTab parser with auto-generated properties. Almost all metadata entities are now exposed as properties on the parser object (#23 by Joshua Klein).

Fix the version parsing in pyteomics.mztab to support shorter vMzTab version strings (#24 by Donavan See).

Tweak the pyteomics.pepxml.PepXML parser to present some values that were previously reported as None.

Fix compatibility with SQLAlchemy 1.4 (#32 by Joshua Klein).

4.4.1¶

Further tweaked behavior of pyteomics.auxiliary.file_helpers._check_use_index(), which is responsible for handling of use_index in read() functions in parser modules.

Fix indexing when element identifiers contain XML-escaped characters (#20 by Joshua Klein).

Add support for MzTab 2.0 (#22 by @annalefarova).

Also, check out the Pyteomics Discussions page! You can use it to share your thoughts, ask questions, discuss coding practices, etc.

4.4¶

New module pyteomics.usi implements a minimal Universal Spectrum Identifier parser and PROXI client (#11 by Joshua Klein).

Support peak annotations in MGF (#12 by Julian Müller).

Provide version information in pyteomics.version (#14).

Make the order of isoforms reproducible in pyteomics.parser.isoforms() (#15).

Rename types keyword argument to ion_types in pyteomics.pylab_aux.annotate_spectrum().

Fix #16, a bug introduced in 4.3.3.

4.3.3¶

Add pyteomics.electrochem.gravy() (#9 by Vladimir Gorshkov).

Fixes and improvements in pyteomics.pepxml.roc_curve() (#10 by Andrey Rozenberg).

Changes in guessing behavior of read() functions.

In modules that implement indexing parsers for non-XML formats (MGF, FASTA, PEFF, ms1/ms2), when a parser is instantiated using read(), the parser class to instantiate is guessed based on the mode of the file object passed to read() (text or binary).

With some file-like objects, mode cannot be easily deduced without consuming some of the data. You will now see more warnings in case use_index is not explicitly passed to read() and reading mode is not obvious. There will also be warnings if use_index is specified but the file is opened in the wrong mode. To avoid all of this, you are encouraged to instantiate parser classes directly, or explicitly specify use_index to read() in all corner cases.
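The kind of guess described here can be approximated as follows. This is a sketch under assumptions, not the actual logic in Pyteomics: guess_use_index is a hypothetical helper that returns None when the mode is not obvious, which is the situation where a warning would be appropriate.

```python
import io


def guess_use_index(source):
    """Guess whether an indexing (binary-mode) parser fits this file object.

    Returns True for binary mode, False for text mode, None if unclear.
    """
    mode = getattr(source, "mode", None)
    if isinstance(mode, str):
        return "b" in mode
    if isinstance(source, io.TextIOBase):
        return False
    if isinstance(source, (io.RawIOBase, io.BufferedIOBase)):
        return True
    # Arbitrary file-like object: the mode cannot be deduced without reading.
    return None


binary_guess = guess_use_index(io.BytesIO(b"BEGIN IONS\n"))
text_guess = guess_use_index(io.StringIO("BEGIN IONS\n"))
unknown_guess = guess_use_index(object())
```

Passing use_index explicitly, or instantiating the parser class directly, avoids relying on any such guess.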

4.3.2¶

Fix #7.

4.3.1¶

Technical release.

4.3¶

First release after the move to GitHub. Issue and PR numbers from now on refer to the GitHub repo. Archive of the Bitbucket issues and PRs is stored here.

Changes in this release:

New module pyteomics.openms.idxml.

Fix #3, #5, and some issues in tandem.

4.2¶

Changes in XML XPath implementation. For standard XML parser classes, this only means a minor change in performance (should be a slight improvement, most noticeable for TandemXML).

For custom classes: the implementation of XPath evaluation in pyteomics.xml.XML.iterfind() has changed. Pseudo-conditions are no longer supported. Instead, an attempt is made to support full XPath. The main difference is that the XPath is evaluated on XML elements, whereas pseudo-conditions used to be evaluated for complete Python dictionaries. To reproduce the old behavior, you can just write an explicit if statement at an appropriate place. The new implementation allows actually skipping the elements that do not satisfy the XPath predicate. When writing classes which by default iterate over elements based on a complex XPath, set _default_iter_path instead of _default_iter_tag.
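The distinction between a predicate evaluated on XML elements and a condition evaluated on converted dictionaries can be illustrated with the standard library alone. The document, tag names and to_dict helper below are made up for the example; xml.etree.ElementTree supports only a subset of XPath and stands in here for the real parser machinery.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<results>"
    "<hit id='a' rank='1'><score>0.9</score></hit>"
    "<hit id='b' rank='2'><score>0.4</score></hit>"
    "<hit id='c' rank='1'><score>0.7</score></hit>"
    "</results>"
)

# Predicate on XML elements: non-matching elements are skipped before
# any conversion to Python objects takes place.
top_ranked = [hit.get("id") for hit in doc.findall("hit[@rank='1']")]


def to_dict(element):
    """Hypothetical conversion of an element to a dictionary."""
    return {**element.attrib, "score": float(element.findtext("score"))}


# Condition on the converted dictionaries: every element is converted first,
# then filtered with an explicit if.
high_scoring = [d["id"] for d in map(to_dict, doc.iter("hit")) if d["score"] > 0.8]
```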

Warning

Beware that if _default_iter_path differs from _default_iter_tag and you use indexing, all elements corresponding to _default_iter_tag will be indexed. This is a limitation of the index building procedure. This discrepancy will lead to confusing behavior (length checks, membership tests and other things based on index will not correspond to items returned by iteration). map() calls will also operate on the full index.

New keyword arguments queue_size, queue_timeout and processes for indexed parsers with support for map().

New method mass.Unimod.by_id(). Also, mass.Unimod now supports dict-like queries with record IDs.

Reduce memory footprint for unit primitives (PR #35 by Joshua Klein).

New functions pyteomics.auxiliary.sigma_T() and pyteomics.auxiliary.sigma_fdr().

Fix issues #44, #46, #47, #48.

4.1.2¶

Bugfix: fix the standard mass value for pyrrolysine (issue #42).

4.1.1¶

Add numpress support for mzML and mzXML files. To read files compressed with Numpress, install pynumpress (PyPI, GitHub).

Bugfixes.

API changes¶

In ms1.read() and ms2.read(), the default value for use_index is now False. Using the indexed parsers may result in incorrect behavior if the “first” scan number in S-lines is not unique.

4.1¶

New module pyteomics.mztab provides a parser for mzTab files.
New module pyteomics.ms2 provides a parser for ms2 files. This is in fact an alias to ms1, which handles both formats.
Added index saving functionality for pyteomics.mgf.IndexedMGF.
New helper functions pyteomics.pylab_aux.plot_spectrum() and pyteomics.pylab_aux.annotate_spectrum().
The rule and exception arguments in pyteomics.parser.cleave() can be keys from expasy_rules.
Fixes.

4.0.1¶

Fix issue #35 (incorrect order of deserialized offset indexes on older Python versions).

4.0¶

See also

Pyteomics 4.0: five years of development of a Python proteomics framework

Add parameters semi and exception in pyteomics.parser.cleave().

Add new parameter encoding in file writers.

Add new parameters write_charges and use_numpy in pyteomics.mgf.write(). Speed up the writing when numpy is available.

Indexing text parsers. This release introduces a family of parser classes for text files. These parsers create byte offsets of indexed entries to allow random access by unique key or by positional index, “rich” access by slices and, in case of MGF/mzML/mzXML, by retention time range. All indexing parsers, text- or XML-based, now have a unified interface.

New class pyteomics.mgf.IndexedMGF is now the recommended way to parse MGF files. It supports fast access by spectrum titles by using an index of byte offsets. The old, sequential parser is preserved under its name, pyteomics.mgf.MGF. The function pyteomics.mgf.read() now returns an instance of one of the two classes, based on the use_index argument and the type of source. The common ancestor class, pyteomics.mgf.MGFBase, can be used for type checking.
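The idea of a byte offset index can be sketched with an in-memory file. This is a simplified illustration and not how IndexedMGF is written: build_offset_index and read_entry are hypothetical helpers, and the toy data contain only titles and one peak per spectrum.

```python
import io


def build_offset_index(stream, start_marker=b"BEGIN IONS", key_prefix=b"TITLE="):
    """Map each entry title to the byte offset where its block starts."""
    index = {}
    block_start = None
    while True:
        position = stream.tell()
        line = stream.readline()
        if not line:
            break
        stripped = line.strip()
        if stripped == start_marker:
            block_start = position
        elif stripped.startswith(key_prefix) and block_start is not None:
            index[stripped[len(key_prefix):].decode()] = block_start
    return index


def read_entry(stream, offset, end_marker=b"END IONS"):
    """Jump straight to an offset and read one block, line by line."""
    stream.seek(offset)
    lines = []
    for line in iter(stream.readline, b""):
        lines.append(line.decode().rstrip("\n"))
        if line.strip() == end_marker:
            break
    return lines


data = (
    b"BEGIN IONS\nTITLE=scan 1\n100.0 10\nEND IONS\n"
    b"BEGIN IONS\nTITLE=scan 2\n200.0 20\nEND IONS\n"
)
stream = io.BytesIO(data)  # binary mode, so offsets are exact byte positions
index = build_offset_index(stream)
entry = read_entry(stream, index["scan 2"])
```

The file is read once to build the index; afterwards any entry can be fetched by title without scanning the preceding ones.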

New FASTA parsing classes:

pyteomics.fasta.FASTABase - common ancestor, suitable for type checking;

pyteomics.fasta.FASTA - text-mode, sequential parser; does what the old fasta.read() was doing. Additionally, the following subclasses perform format-specific parsing of FASTA headers:

pyteomics.fasta.UniProt;

pyteomics.fasta.UniParc;

pyteomics.fasta.UniRef;

pyteomics.fasta.UniMes;

pyteomics.fasta.SPD;

pyteomics.fasta.NCBI;

pyteomics.fasta.IndexedFASTA - binary-mode, indexing parser. Supports direct indexing by header string;

pyteomics.fasta.TwoLayerIndexedFASTA - additionally supports indexing by extracted header fields. Format-specific second indexes are available in subclasses:

pyteomics.fasta.IndexedUniProt;

pyteomics.fasta.IndexedUniParc;

pyteomics.fasta.IndexedUniRef;

pyteomics.fasta.IndexedUniMes;

pyteomics.fasta.IndexedSPD;

pyteomics.fasta.IndexedNCBI.

pyteomics.fasta.read() now returns an instance of one of these classes, depending on the arguments use_index and flavor.

pyteomics.ms1.IndexedMS1 and pyteomics.ms1.MS1 are available for ms1 format.

(In collaboration with J. Klein)

Multiprocessing support: all indexed XML and text file parsers now expose a map() method. This method can map a user-supplied function to each file entry in separate processes (or simply parallelize the parsing itself). Additionally, objects returned by chain() functions and iterfind() methods also expose the map() interface to allow parallelizing the work over multiple files and when iterating over non-default XML tree elements. The order of entries is not preserved in the output. (In collaboration with J. Klein)

New module pyteomics.peff implements the IndexedPEFF parser for protein databases in the new PSI standard format, PEFF. (Contributed by J. Klein)

New module pyteomics.traml implements the TraML parser for the PSI standard format for SRM data, TraML. (In collaboration with J. Klein)

pyteomics.protxml.ProtXML now also supports indexing and multiprocessing.

Removed parameter skip_empty_cvparam_values in XML parsers. In cvParam elements, missing “value” attribute is now always equivalent to the case when it is equal to an empty string. This affects the structure of items produced by MzML and MzIdentML parsers.

Multiple fixes and improvements.

3.5.1¶

Technical release to update the package metadata on PyPI. Project documentation on pythonhosted.org has been deleted. Latest documentation is available at: https://pyteomics.readthedocs.io/.

3.5¶

Preserve accession information on cvParam elements in mzML parser. Dictionaries produced by the parser can now be queried by accession using pyteomics.auxiliary.cvquery(). (Contributed by J. Klein)

Add optional decode_binary argument in pyteomics.mzml.MzML and pyteomics.mzxml.MzXML. When set to False, the parsers provide binary records suitable for decoding on demand. (Contributed by J. Klein)

Add method write_byte_offsets() in pyteomics.mzml.MzML, pyteomics.mzxml.MzXML and pyteomics.mzid.MzIdentML. Byte offsets can be loaded later to speed up random access. (Contributed by J. Klein)

Random access to MGF spectrum entries.

Add function pyteomics.mgf.get_spectrum().

Add class pyteomics.mgf.MGF. mgf.read() is now an alias to the class. The class can be used for indexing using spectrum titles.

This functionality will be changed in upcoming versions.

New module pyteomics.protxml for parsing of ProteinProphet output files.

Add PeptideProphet and iProphet analysis information to the output of pyteomics.pepxml.DataFrame().

New parameter huge_tree in XML parser constructors and read() functions. It is passed to the underlying lxml calls. Default value is False. Set to True to overcome errors such as: XMLSyntaxError: xmlSAX2Characters: huge text node.

New parameter skip_empty_cvparam_values in XML parser constructors. It instructs the parser to treat the empty “value” attributes in cvParam elements as if they were not there. This is helpful in cases when such empty “values” are present in one vendor’s file and absent in another: enabling the parameter will result in more unified output. Default value is False.

Change the default value for read_schema to False in XML parsing modules.

Change the default value for retrieve_refs to True in MzIdentML constructor.

Implement retrieve_refs for pyteomics.mzml.MzML. (Contributed by J. Klein)

New parameter keep_cterm in decoy generation functions in pyteomics.fasta.

New parameters decoy_prefix and decoy_suffix in all format-specific FDR filtering functions. If the standard is_decoy() function works for your files, you can use these parameters to specify either the prefix or the suffix appended to the protein names in decoy entries.

New ion types in pyteomics.mass.std_ion_comp.

Bugfixes.

3.4.2¶

New module pyteomics.ms1 for parsing of MS1 files.

mass.Composition constructor now accepts ion_type and charge parameters.

New functions pyteomics.mzid.DataFrame() and pyteomics.mzid.filter_df(). Their behavior may be refined later on.

Changes in behavior of pyteomics.auxiliary.filter() and pyteomics.auxiliary.qvalues():

both functions now always return DataFrames with pandas.DataFrame input and full_output=True;

string values of key, is_decoy and pep are substituted with simple itemgetter functions for non-pandas, non-numpy input;

additional parameters score_label, decoy_label, pep_label, and q_label for output control.

Performance optimizations in XML parsing code.

3.4.1¶

Add selenocysteine (“U”) and pyrrolysine (“O”) to pyteomics.mass.std_aa_mass and pyteomics.mass.std_aa_comp.

An optional parameter encoding is now accepted by text file readers (pyteomics.mgf.read() and pyteomics.fasta.read()). This can be useful for MGF files with non-ASCII spectrum titles or comments.

New function pyteomics.mass.mass.isotopologues().

Performance improvements in pyteomics.electrochem.pI().

Fix the issue in pyteomics.xml which resulted in very long processing times for indexed XML files with a byte ordering mark (BOM).

Support all standard and non-standard data array names in pyteomics.mzml.

Change default value of retrieve_refs in pyteomics.mzid.read() to True.

Preserve unit information extracted from cvParam tags in PSI XML files.

Fix in pyteomics.mzxml, other minor fixes.

3.4¶

New module pyteomics.mzxml for parsing of MzXML files.

New parameter dtype in pyteomics.mgf.read(), pyteomics.mzml.read() and pyteomics.mzxml.read() allows changing the dtype of arrays yielded by the parsers.

pyteomics.featurexml moved into a subpackage pyteomics.openms.

New module pyteomics.openms.trafoxml for OpenMS transformation files.

Bugfix in XML indexing code to make it work on Python 3.x versions prior to 3.5.

Bugfix in pyteomics.pylab_aux.scatter_trend() (support for lists and other non-ndarrays).

Performance improvements in pyteomics.achrom calibration functions.

3.3.1¶

New submodule pyteomics.featurexml with a parser for OpenMS featureXML files.

3.3¶

mzML and mzIdentML parsers can now create an index of element offsets. This allows quick random access to elements by unique ID.

mzML parsers now come in two flavors: pyteomics.mzml.MzML and pyteomics.mzml.PreIndexedMzML. The latter uses the byte offsets listed at the end of the file.

New parameters convert_arrays and read_charges in mgf.read() allow using it without numpy and possibly improve performance. The default behavior is retained.

Performance optimizations in mgf.read() and parser.cleave().

New decoy generation mode called “fused decoy”, described in the paper accepted to JASMS.

API changes¶

pyteomics.parser.cleave() no longer accepts the labels argument. It is emphasized that the input sequences are expected to be in plain one-letter notation, but no checks are performed.

DataFrame() functions in pepxml and tandem now extract more protein-related information. The list-like protein-related values can be reported as lists or packed into strings, depending on the optional parameter sep. Some column names have changed as a result.

Call signatures of pyteomics.fasta.decoy_sequence() and the functions using it are slightly changed. Standard modes are now also exposed as individual functions.

3.2¶

New submodule pyteomics.mass.unimod contains rewritten machinery for handling of Unimod relational databases (contributed by Joshua Klein). This is a substitution and extension for the old mass.Unimod class. pyteomics.mass.unimod requires SQLAlchemy.

Other changes:

New function pyteomics.auxiliary.linear_regression_perpendicular() provides a linear fit minimizing distances from data points to the fit line (as opposed to pyteomics.auxiliary.linear_regression(), which minimizes vertical distances).
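The difference between the two fits can be shown with closed-form expressions in plain Python. The sketch below is not the library code: fit_vertical and fit_perpendicular are hypothetical names, and the perpendicular fit uses the standard orthogonal-regression slope formula, which assumes a nonzero covariance between x and y.

```python
import math


def _centered_sums(points):
    """Return means and centered sums of squares and products."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return mean_x, mean_y, sxx, syy, sxy


def fit_vertical(points):
    """Ordinary least squares: minimize vertical distances."""
    mean_x, mean_y, sxx, _, sxy = _centered_sums(points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_perpendicular(points):
    """Orthogonal regression: minimize perpendicular distances (sxy != 0)."""
    mean_x, mean_y, sxx, syy, sxy = _centered_sums(points)
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, mean_y - slope * mean_x


points = [(0, 0), (1, 2), (2, 1), (3, 3)]
vertical = fit_vertical(points)            # slope 0.8, intercept about 0.3
perpendicular = fit_perpendicular(points)  # slope 1.0, intercept 0.0
```

For these four points the ordinary fit is flatter than the perpendicular one because it attributes all of the scatter to y.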

Both new and old linear regression functions now accept a single array of shape (N, 2).

pyteomics.pylab_aux.scatter_trend() now has an optional parameter regression which can be a callable performing the regression. Also, the regression equation is now the label of the regression line, not the scatter plot.

Another two new parameters for pyteomics.pylab_aux.scatter_trend() are sigma_kwargs and sigma_values.

pyteomics.pylab_aux functions plot_line() and scatter_trend() now return the objects they create.

Writer functions (pyteomics.mgf.write(), pyteomics.fasta.write(), pyteomics.fasta.write_decoy_db()) now accept a file_mode argument that overrides the mode in which the file is opened.

In pyteomics.mgf.write() one can now override the format spec for fragment m/z, intensity and charge values using the optional fragment_format argument. Key order and key-value parameter formatters are now also handled via optional arguments.

pyteomics.fasta.decoy_db() now supports ignore_comments and parser arguments.

3.1.1¶

Bugfix in pyteomics.auxiliary.

New parameter show_legend in pyteomics.pylab_aux.scatter_trend().

Performance improvements in pyteomics.parser.

3.1¶

This release offers integration with the great pandas library. Working with qvalues() and filter() functions is now much easier if you have your PSMs in a DataFrame. Many search engines use CSV as their output format, allowing direct creation of DataFrame objects. New functions pyteomics.tandem.DataFrame() and pyteomics.pepxml.DataFrame() facilitate creation of DataFrames from corresponding formats.

Also, qvalues(), filter() and fdr() functions can now use posterior error probabilities (PEPs) instead of using decoys for q-value calculation.

In qvalues() and filter() functions, key and is_decoy can now be array-like objects or strings (as well as functions and iterators). If a string is given, it is used as a field name in the PSM array or DataFrame. fdr() functions also support strings and iterables as arguments.

New parameter pep in qvalues(), filter() and fdr() functions. It can be callable, array-like, or iterator. Conflicts with decoy-related parameters. Compatible with key, but makes it optional.

Fixed the behavior of filter.chain() functions. They now treat the full_output argument the same way as filter() functions.

Fixed the issue that caused exceptions when calling fasta.decoy_db() and fasta.write_decoy_db() with explicitly given mode (signature for creation of pyteomics.auxiliary.FileReader objects slightly changed).

Pyteomics now uses setuptools and is a namespace package.

Minor fixes.

API changes¶

Default value of remove_decoy in qvalues() is now False.

3.0.1¶

Added legend_kwargs as a keyword argument to pyteomics.pylab_aux.scatter_trend().

Minor fixes.

3.0.0¶

XML parsers are now implemented as objects, each format has its own class. Those classes can be instantiated using the same arguments as read() functions accepted, and support direct iteration and the with syntax. The read() functions are now simple aliases to the corresponding constructors.

As a result, the functions iterfind(), version_info() and get_by_id() are now deprecated in favor of methods iterfind() and get_by_id() and attribute version_info of corresponding instances.

In pyteomics.mgf.write(), the order of keys and the format of values are now controlled via module-level variables.

In pyteomics.electrochem, correction for pK of terminal groups depending on the terminal residue is implemented; example set of pK and corrected pK added.

Imports of external dependencies are delayed where possible, so that unnecessary ImportErrors do not occur.

local_fdr() renamed to qvalues() in pepxml, mzid, tandem and auxiliary. The name local_fdr() did not reflect the semantics of the function. The algorithm has also been corrected so that the array of q-values is always sorted (as it should be by definition).
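Why q-values are sorted by definition can be seen from a minimal target-decoy sketch. It is not the algorithm used in the library: qvalues_sketch is a hypothetical function, FDR is estimated as the plain ratio of decoys to targets, lower scores are assumed to be better, and ties are not treated specially.

```python
def qvalues_sketch(psms, reverse=False):
    """Return (score, is_decoy, q) tuples sorted from best to worst score.

    psms is an iterable of (score, is_decoy) pairs.
    """
    ordered = sorted(psms, key=lambda psm: psm[0], reverse=reverse)

    # FDR estimate at each score threshold: decoys / targets accepted so far.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ordered:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this threshold or any more permissive one.
    # Taking a running minimum from the end makes the result non-decreasing.
    qvalues = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ordered, qvalues)]


psms = [(0.001, False), (0.002, False), (0.01, True),
        (0.02, False), (0.05, True), (0.1, False)]
result = qvalues_sketch(psms)
q_only = [round(q, 3) for _, _, q in result]
```

The raw FDR estimates in this example go 0, 0, 0.5, 0.333, 0.667, 0.5, which is not monotonic; the running minimum turns them into sorted q-values.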

qvalues() now also accepts a parameter full_output which keeps the PSMs alongside their scores and associated q-values.

All fdr(), qvalues(), and filter() functions now accept a new parameter correction. It is used for more accurate estimation of the number of false positives using TDA (paper with explanation).

filter() functions now support both iterator protocol and context manager protocol. They now also accept the full_output parameter, which has the following meaning: if True (default), then an array of PSMs is directly returned by the function. Otherwise, an iterator is returned, as before. The array takes some memory, but this way is usually around 2x faster.

New function pyteomics.pylab_aux.plot_qvalue_curve().

pyteomics.mass.Composition objects now have a mass() method (equivalent to pyteomics.mass.calculate_mass()).

Also, Composition and objects returned by pyteomics.parser.amino_acid_composition() now inherit from collections.defaultdict and collections.Counter.

Decoy-related functions in pyteomics.fasta now accept a new parameter keep_nterm that preserves the N-terminal residue in the generated decoy sequences.

Minor fixes.

API changes¶

In pyteomics.pylab_aux.scatter_trend(), keyword arguments for pylab.scatter() and pylab.plot() are now accepted as dicts scatter_kwargs and plot_kwargs. Keyword argument alpha is now not accepted and should be put in the appropriate dict.

In pyteomics.pylab_aux.plot_function_3d() and pyteomics.pylab_aux.plot_function_contour(), arbitrary kwargs can now also be passed to the plotting function.

filter() functions do not support context manager protocol by default. To keep using them as iterators / context managers, specify full_output=False (see above for details).

2.5.5¶

Fix for a memory leak in pyteomics.mzid.get_by_id(), which affects pyteomics.mzid.read() with retrieve_refs=True.

2.5.4¶

New functions local_fdr() in pepxml, mzid, and tandem. The function returns a NumPy array with PSM scores and corresponding values of local FDR.

New parameter iterative in read() functions of XML parsing modules. Parsing of mzIdentML files with retrieve_refs=True got significantly faster.

2.5.3¶

Universally applicable modifications are now allowed in pyteomics.parser.isoforms().

It is now also possible to specify non-terminal modifications which are only applicable to terminal residues.

Fix in pyteomics.parser.parse(): if the labels argument is provided, it needs to contain standard terminal groups if they are present in the sequence or if show_unmodified_termini is set to True.

pyteomics.mass.Composition instances are now pickleable.

Performance improvements.

2.5.2¶

New parameter reverse in all filter() functions.

New function pyteomics.mass.fast_mass2(), which is analogous to pyteomics.mass.fast_mass(), but supports full modX notation and is several times slower.

Fix in pyteomics.pepxml.read() for compatibility with files produced with Mascot2XML utility.

Unknown labels are now allowed in pyteomics.electrochem and pyteomics.achrom functions in accordance with the new general policy.

2.5.1¶

Bugfixes in pyteomics.parser.isoforms():

handling of the labels argument is now in accordance with new policy

solved memory problems when using max_mods

pyteomics.parser.cleave() does not require a valid modX sequence by default.

2.5.0¶

pyteomics.parser.amino_acid_composition() now accepts “split” parsed sequences.

Cleavage rules in pyteomics.parser.expasy_rules updated.

Helper function pyteomics.parser.num_sites() counts the number of cleavage sites in a sequence.

Helper function pyteomics.parser.match_modX() does essentially the same as pyteomics.parser.is_modX(), but returns a re.match object or None instead of a bool.

Bugfix in pyteomics.auxiliary.filter(), which didn’t work correctly with iterators.

Added a new parameter max_mods in pyteomics.parser.isoforms().

API changes¶

The boolean overlap parameter in pyteomics.parser.cleave() is replaced with an integer min_length. Since min_length uses pyteomics.parser.length(), the labels keyword argument is now accepted by cleave() and num_sites(), if needed. With carefully designed cleavage rules, all cleavage functions work with modX sequences.

The labels argument in pyteomics.parser.parse() and related functions has changed its meaning. parse() won’t raise an exception for non-standard labels in sequences if the labels keyword argument is not given.

The modX notation specification is now more strict to avoid ambiguity: only zero or two terminal groups can be present in a modX sequence. Sequences with one terminal group specified will be supported where possible, but be advised that sequences such as “H-OH” are intrinsically ambiguous.
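The min_length parameter of cleave() described in the API changes above can be pictured as a length filter applied to the products of a regex-based cleavage. The sketch below is a simplification under stated assumptions: cleave_sketch is a hypothetical helper, the rule is a reduced trypsin-like pattern rather than an entry from expasy_rules, missed cleavages are not considered, and plain len() is used where the library relies on pyteomics.parser.length() to handle modX labels.

```python
import re


def cleave_sketch(sequence, rule, min_length=1):
    """Cut `sequence` after each match of `rule`, keeping long enough peptides."""
    peptides = set()
    start = 0
    for match in re.finditer(rule, sequence):
        end = match.end()
        if end - start >= min_length:
            peptides.add(sequence[start:end])
        start = end
    # The C-terminal peptide that remains after the last cleavage site.
    if len(sequence) - start >= min_length:
        peptides.add(sequence[start:])
    return peptides


# Simplified rule: cut after K or R unless the next residue is P.
simple_rule = r'[KR](?!P)'
all_peptides = cleave_sketch('AKRPGKLE', simple_rule)
long_peptides = cleave_sketch('AKRPGKLE', simple_rule, min_length=3)
```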

2.4.3¶

Added the ratio keyword argument for FDR calculation.

Minor changes in iterfind() functions of file parsers.

Bugfix in pyteomics.mgf.write() (duplication of pepmass key).

Removed non-functional parameter read_schema for pyteomics.tandem.read().

2.4.2¶

Bugfix in pyteomics.mass.most_probable_isotopic_composition(). The bug manifested itself after version 2.4.0, when pyteomics.mass.nist_mass was expanded. Also, the format of the returned value is now in accordance with the documentation.

2.4.1¶

New function pyteomics.auxiliary.filter() for filtering lists of PSMs not coming directly from files in supported formats.

Also, a format-agnostic helper function pyteomics.auxiliary.fdr().

2.4.0¶

New functions for filtering to a certain FDR level based on target-decoy strategy, as well as for FDR estimation, in pyteomics.tandem, pyteomics.pepxml and pyteomics.mzid. The functions are called filter() (beware of shadowing the built-in function) and fdr() (in each of the modules). Chained versions filter.chain() and filter.chain.from_iterable() are also available. See Data Access for more info.

New function pyteomics.parser.coverage() for sequence coverage calculation.

New function pyteomics.fasta.decoy_chain(), a chained version of pyteomics.fasta.decoy_db().

New elements in pyteomics.mass.nist_mass. Pretty much all elements are there now.

Fix in pyteomics.parser.parse() to cover some fancy corner cases.

Bugfix in pyteomics.tandem: modification info is now fully extracted.

pyteomics.mass.isotopic_composition_abundance() is now able to calculate abundances for larger molecules.

Note

Rounding errors may be significant in this case.

2.3.0¶

New parameter “read_schema” in read() functions of XML parsing modules. When set to False, disables the attempts to fetch an auxiliary file and obtain structure information about the file being parsed.

New function chain() in all modules that have a read() function, for convenient chaining of multiple files. chain() only works as a context manager. Use itertools.chain() in other cases. The chain.from_iterable form is also available as a context manager.

New function pyteomics.auxiliary.print_tree() for exploration of complex nested dicts produced by XML parsers.

New sets of retention coefficients in pyteomics.achrom.

Bugfix in pyteomics.pepxml. The bug caused an exception when parsing some pepXML files.

The output of pyteomics.mgf.read() now always contains a masked array of charges.

Other minor fixes.

API change¶

In pyteomics.mgf.read() the precursor charge is now always represented by a list of ints (a ChargeList object).

2.2.2¶

Bugfix in pyteomics.tandem. The info about all proteins is now extracted.

2.2.1¶

Update parsers for FASTA headers.

NamedTuple for FASTA entries is now defined globally, which should solve pickling problems.

2.2.0¶

New module pyteomics.tandem for reading output files of X!Tandem search engine.

2.1.6¶

Fix in pyteomics.pepxml. pepXML files generated by TPP are now processed without errors.

2.1.5¶

Fix in pyteomics.pepxml. ‘modified_peptide’ is now always available.

Fix in pyteomics.mass (issue #2 in the bug tracker).

Improved arithmetic for Composition objects.

2.1.4¶

In fasta, decoy_db() now doesn’t write to file, but returns an iterator over FASTA records. The old decoy_db() is now called write_decoy_db(), which is equivalent to decoy_db() combined with write().

Bugfixes:

In pyteomics.mgf.read(), the charges, if present, are returned as a masked array now. Previously, an exception occurred if charges were missing for some of the fragments.

Values in mass.nist_mass corrected.

Other minor corrections.

2.1.3¶

Adjust the behavior affected by the bug fixed in 2.1.2. name attributes of <cvParam> elements in the absence of value attributes are now collected in a list under the ‘name’ key.

Add support for overlapping matches in parser.cleave().

2.1.2¶

Bugfix in XML parsers. The bug caused the mzML parser to break on some files. The fix can slightly change the format of the output.

2.1.1¶

Rename keys in the dicts returned by mgf.read() to facilitate writing code working with both MGF and mzML.

The items yielded by fasta.read() now have attributes description and sequence.

2.1.0¶

New sets of retention coefficients in achrom.

mass.Composition now only stores non-zero ints.

fasta now has tools for parsing of FASTA headers.

File parsers now implement the context manager protocol. We recommend using with statements to avoid resource leaks.

API changes¶

‘pepmass’ is now a tuple in the output of mgf.read() (to allow reading precursor intensities).

New function fasta.parse() for convenient parsing of FASTA headers.

fasta.std_parsers stores parsers for common UniProt header formats.

New parameter parser in fasta.read() allows header parsing to be applied while reading a FASTA file.

The close parameter has been removed from all functions that do file I/O. The unified behavior is: if a file object is passed, the function will not close it. If a file path is given, the file object will be created and closed inside the corresponding function.
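The unified file handling rule can be sketched with a small context manager. This is an illustration of the rule only, not the code used in the library, and open_source_sketch is a hypothetical name: a file object supplied by the caller is left open, while a path is opened and closed inside the helper.

```python
import contextlib
import io


@contextlib.contextmanager
def open_source_sketch(source, mode='r'):
    """Yield a file object; close it only if it was opened here."""
    if hasattr(source, 'read') or hasattr(source, 'write'):
        # A file object from the caller: use it, but do not close it.
        yield source
    else:
        # A path: open the file and close it when the block exits.
        with open(source, mode) as handle:
            yield handle


buffer = io.StringIO('>sp|P1\nPEPTIDE\n')
with open_source_sketch(buffer) as handle:
    first_line = handle.readline()
still_open = not buffer.closed
```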

2.0.3¶

Added new class pyteomics.mass.Unimod. The interface is experimental and may change.

Improved iterfind() function in XML-reading modules.

pyteomics.mass.Composition objects now support multiplication by int.

Bugfix in auxiliary.linear_regression().

2.0.2¶

Added new function iterfind() in pyteomics.mzid, pyteomics.pepxml and pyteomics.mzml.

2.0.1¶

API changes¶

pyteomics.parser.peptide_length() is renamed to pyteomics.parser.length().

2.0.0¶

Added mzid module for parsing of mzIdentML files.

Fixed bugs, improved tests.

API changes¶

The top-level functions in fasta, mgf, mzml, pepxml, as well as mzid, are now called read().

In parser, parse_sequence() is renamed to parse(). It now accepts an optional parameter allow_unknown_modifications.

mgf.write_mgf() and fasta.write_fasta() are renamed to write().

The output format of all read() functions has changed.

1.2.5¶

Include Apache license version 2.0: http://www.opensource.org/licenses/Apache-2.0

Minor bugfix in pyteomics.fasta.

1.2.4¶

Changes in pyteomics.mass.

API changes¶

Composition objects can be created using positional first argument, which will be treated as a sequence or (upon failure) as a formula. This means that all functions relying on Composition (calculate_mass(), most_probable_isotopic_composition(), isotopic_composition_abundance()) allow that as well. However, it’s of no use for the latter.

Composition entries for modifications can be added to aa_comp and used in composition and mass calculations. This way the specified group will be added to any residue bearing this modification.

As a result, the add_modifications() function is not needed anymore and has been removed.

Addition and subtraction of Composition objects now produces a Composition object, allowing addition/subtraction of multiple objects.

Composition is now a subclass of collections.defaultdict so one can safely retrieve values without checking if a key exists.

1.2.3¶

pyteomics.parser.isoforms() now allows terminal modifications.

Bugfixes in pyteomics.parser.parse_sequence().

New function pyteomics.parser.tostring() converts parsed sequences to strings.

Helper function pyteomics.parser.is_modX() added to check modX labels.

API changes¶

pyteomics.parser.isoforms() now returns a generator object.

1.2.2¶

Bugfix in pyteomics.pepxml: modification info is now extracted.

New optional boolean argument ‘split’ in pyteomics.parser.parse_sequence() makes it possible to generate a list of tuples where modifications are separated from the residues instead of a regular list of labels. The labels argument now accepts not only modX labels, but also separate mod prefixes. Such modifications are assumed to be applicable to any residue.
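The difference between a flat list of labels and the “split” form can be sketched with a regular expression. The helper below is hypothetical and simplified: it treats a lowercase prefix as the modification and an uppercase letter as the residue, ignores terminal groups, and the exact tuple shape (a one-element tuple for an unmodified residue) is an assumption made for illustration.

```python
import re


def split_labels_sketch(sequence):
    """Split a modX-like string into (mod, residue) or (residue,) tuples."""
    result = []
    for mod, residue in re.findall(r'([a-z]*)([A-Z])', sequence):
        result.append((mod, residue) if mod else (residue,))
    return result


flat = re.findall(r'[a-z]*[A-Z]', 'PEPpTIDE')
split = split_labels_sketch('PEPpTIDE')
```

Here flat holds whole labels such as 'pT', while split keeps the prefix 'p' apart from the residue 'T'.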

1.2.1¶

Memory usage significantly decreased when parsing large mzML and pepXML files.

1.2.0¶

Added support for Python 3. Python 2.7 is still supported, Python 2.6 is not.

1.1.1¶

New function called add_modifications() added in pyteomics.mass. It updates aa_comp.

Also, pyteomics.parser.isoforms() is a new function to get all possible modified sequences of a peptide.

1.1.0¶

New module added - pyteomics.mgf. It is intended for reading and writing files in Mascot Generic Format.

1.0.2¶

In pyteomics.pepxml module, now all search hits are read from file (not only the top hit).

API changes¶

pyteomics.pepxml.read(): information specific to search hits is now stored in a list under the 'search_hits' key. The list is sorted by hit rank.

1.0.1¶

Fix compatibility issues in pyteomics.pepxml module.

1.0.0¶

The first public release of Pyteomics.

API changes¶

pyteomics.achrom: rename 'length correction factor' to 'length correction parameter'.

pyteomics.achrom.get_RCs_vary_lcf() was renamed to pyteomics.achrom.get_RCs_vary_lcp().

length_correction_factor keyword argument of pyteomics.achrom.get_RCs() was renamed to lcp.

Pyteomics documentation v5.0