OpenMS
ConsensusID

Computes a consensus from results of multiple peptide identification engines.

Potential predecessor tools: IDPosteriorErrorProbability, IDFilter, IDMapper
Potential successor tools: PeptideIndexer

Reference:

Nahnsen et al.: Probabilistic consensus scoring improves tandem mass spectrometry peptide identification (J. Proteome Res., 2011, PMID: 21644507).

Algorithms:

ConsensusID offers several algorithms that can aggregate results from multiple peptide identification engines ("search engines") into consensus identifications - typically one per MS2 spectrum. This works especially well for search engines that provide more than one peptide hit per spectrum, i.e. that report not just the best hit, but also a list of runner-up candidates with corresponding scores.

The available algorithms are (see also OpenMS::ConsensusIDAlgorithm and its subclasses):

  • PEPMatrix: Scoring based on posterior error probabilities (PEPs) and peptide sequence similarities. This algorithm uses a substitution matrix to score the similarity of sequences not listed by all search engines. It requires PEPs as the scores for all peptide hits.
  • PEPIons: Scoring based on posterior error probabilities (PEPs) and fragment ion similarities ("shared peak count"). This algorithm, too, requires PEPs as scores.
  • best: For each peptide ID, this uses the best score of any search engine as the consensus score. All peptide IDs must have the same score type.
  • worst: For each peptide ID, this uses the worst score of any search engine as the consensus score. All peptide IDs must have the same score type.
  • average: For each peptide ID, this uses the average score of all search engines as the consensus score. Again, all peptide IDs must have the same score type.
  • ranks: Calculates a consensus score based on the ranks of peptide IDs in the results of different search engines. The final score is in the range (0, 1], with 1 being the best score. The input peptide IDs do not need to have the same score type.
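
The score-combining algorithms above can be illustrated with a short Python sketch. This is not the OpenMS implementation: the function names, the `higher_is_better` argument and, in particular, the rank formula are assumptions made for illustration. The rank formula is just one simple aggregate that lands in the range (0, 1]; it uses 0-based ranks and penalizes every search run that did not list the peptide with the maximum rank.

```python
from statistics import mean


def consensus_score(scores, algorithm, higher_is_better=False):
    """Combine the scores that several search engines gave to one peptide.

    scores: scores of the same type, one per engine that listed the peptide.
    algorithm: "best", "worst" or "average".
    higher_is_better: False for scores such as PEPs or E-values.
    """
    if not scores:
        raise ValueError("need at least one score")
    if algorithm not in ("best", "worst", "average"):
        raise ValueError(f"unknown algorithm: {algorithm}")
    if algorithm == "average":
        return mean(scores)
    # "best" picks the maximum if higher is better, otherwise the minimum;
    # "worst" does the opposite.
    pick_max = (algorithm == "best") == higher_is_better
    return max(scores) if pick_max else min(scores)


def rank_score(ranks, n_runs, considered_hits):
    """Hypothetical rank-based consensus score in the range (0, 1].

    ranks: 0-based rank of the peptide in each run that listed it.
    n_runs: total number of search runs.
    considered_hits: number of hits considered per run (maximum penalty).
    """
    missing_runs = n_runs - len(ranks)
    total = sum(ranks) + missing_runs * considered_hits
    return 1.0 - total / (n_runs * considered_hits)
```

For example, with PEPs 0.01, 0.2 and 0.05 from three engines, `consensus_score([0.01, 0.2, 0.05], "best")` returns 0.01 and the "worst" variant returns 0.2. A peptide ranked first by all engines gets `rank_score([0, 0, 0], 3, 5) == 1.0`.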

PEPs for search results can be calculated using the IDPosteriorErrorProbability tool, which supports a variety of search engines.

Note
Important: All protein-level identification results will be lost by applying ConsensusID. (It is unclear how potentially conflicting protein-level results from different search engines should be combined.) If necessary, run the PeptideIndexer tool to add protein references for peptides again.
Peptides with different post-translational modifications (PTMs), or with different site localizations of the same PTMs, are treated as different peptides by all algorithms. One qualification applies to the PEPMatrix algorithm: its similarity scoring can only take unmodified peptide sequences into account, so PTMs are ignored during that step. The PTMs are not removed from the peptides, however, and differently modified peptides still get separate results.
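
The distinction between "peptide identity" and "sequence used for similarity scoring" can be sketched in Python as follows. The sketch assumes modifications are written in parentheses after the modified residue (for example `M(Oxidation)`), with an optional dot marking a terminus; the helper `unmodified` is hypothetical and not part of OpenMS.

```python
import re


def unmodified(sequence):
    """Strip parenthesized modification names and terminal dots (assumed notation)."""
    return re.sub(r"\([^()]*\)", "", sequence).replace(".", "")


hits = [
    "PEPTM(Oxidation)IDEK",   # oxidized methionine
    "PEPTMIDEK",              # same sequence, unmodified
    "PEPTS(Phospho)IDETK",    # phosphorylation on S
    "PEPTSIDET(Phospho)K",    # same PTM, different site
]

# All four are distinct peptides as far as the consensus results are concerned ...
distinct_peptides = set(hits)

# ... but a similarity step that ignores PTMs only sees two bare sequences.
bare_sequences = {unmodified(h) for h in hits}
```

Here `distinct_peptides` has four entries, while `bare_sequences` contains only `PEPTMIDEK` and `PEPTSIDETK`.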

File types:

Different input file types are supported:

  • idXML: A file containing multiple identification runs, typically from different search engines. Use IDMerger to merge individual idXML files from different search runs into one. During the ConsensusID analysis, the identification results are grouped according to their originating MS2 spectra, based on retention time and precursor m/z information (see parameters rt_delta and mz_delta). One consensus identification is generated for each group. With the per_spectrum flag, multiple idXML files can be given as input instead; in that case, one consensus is generated per combination of originating mzML file and spectrum_ref.
  • featureXML or consensusXML: Given (consensus) features annotated with peptide identifications from multiple search runs, one consensus identification is created for every annotated feature. Peptide identifications not assigned to features are not considered and will be removed. See IDMapper for the task of mapping peptide identifications to feature maps or consensus maps.
Note
Currently mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.
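
The grouping of identifications by retention time and precursor m/z, described for idXML input above, can be sketched in Python as follows. This is a simplified greedy illustration, not the OpenMS implementation: the tuple layout, the choice of the first group member as reference, and the tolerance values in the usage note are assumptions.

```python
def group_by_spectrum(ids, rt_delta, mz_delta):
    """Group identifications whose RT and precursor m/z agree within tolerances.

    ids: iterable of (rt, mz, label) tuples, one per peptide identification.
    Each identification joins the first group whose reference (its first
    member) lies within both rt_delta and mz_delta; otherwise it starts a
    new group.
    """
    groups = []
    for rt, mz, label in sorted(ids):
        for group in groups:
            ref_rt, ref_mz, _ = group[0]
            if abs(rt - ref_rt) <= rt_delta and abs(mz - ref_mz) <= mz_delta:
                group.append((rt, mz, label))
                break
        else:
            # No existing group matched: this ID belongs to a new spectrum.
            groups.append([(rt, mz, label)])
    return groups
```

With `rt_delta=0.1` and `mz_delta=0.01`, two identifications at (100.0, 500.25) and (100.05, 500.25) end up in the same group, and one consensus identification would be computed for that group.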

Filtering:

Generally, search results can be filtered according to various criteria using IDFilter before (or after) applying this tool. ConsensusID itself offers only a limited number of filtering options that are especially useful in its context (see the filter parameter section):

  • considered_hits: Limits the number of alternative peptide hits considered per spectrum/feature for each identification run. This helps to reduce runtime, especially for the PEPMatrix and PEPIons algorithms, which involve costly "all vs. all" comparisons of peptide hits.
  • min_support: This allows filtering of peptide hits based on agreement between search engines. Every peptide sequence in the analysis has been identified by at least one search run. This parameter defines which fraction (between 0 and 1) of the remaining search runs must "support" a peptide identification for it to be kept. The meaning of "support" differs slightly between algorithms: For best, worst, average and ranks, each search run supports the peptides that it has also identified among its top considered_hits candidates. So min_support simply gives the fraction of additional search engines that must have identified a peptide. (For example, if there are three search runs, and only peptides identified by at least two of them should be kept, set min_support to 0.5.) For the similarity-based algorithms PEPMatrix and PEPIons, the "support" for a peptide is the average similarity of the most-similar peptide from each (other) search run. (In the context of the JPR publication, this is the average of the similarity scores used in the consensus score calculation for a peptide.)
  • count_empty: Typically not all search engines will provide results for all searched MS2 spectra. This parameter determines whether search runs that provided no results should be counted in the "support" calculation; by default, they are ignored.
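
The "support" calculation for the best, worst, average and ranks algorithms, including the effect of count_empty, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the OpenMS code: the function name and data layout are hypothetical, and returning 0.0 when only one run is counted is an arbitrary choice of this sketch.

```python
def support(peptide, hits_per_run, count_empty=False):
    """Fraction of the other search runs that also list `peptide`.

    hits_per_run: one collection of peptide sequences per search run,
                  already limited to the top considered_hits candidates.
    count_empty: whether runs without any hits count in the denominator.
    """
    runs = [set(hits) for hits in hits_per_run]
    if not count_empty:
        runs = [run for run in runs if run]  # ignore runs with no results
    n_found = sum(peptide in run for run in runs)
    if n_found == 0:
        raise ValueError("peptide was not identified by any search run")
    n_other_runs = len(runs) - 1
    if n_other_runs == 0:
        return 0.0  # assumption: no other run available to give support
    return (n_found - 1) / n_other_runs


def passes_filter(peptide, hits_per_run, min_support, count_empty=False):
    """Keep a peptide only if its support reaches min_support."""
    return support(peptide, hits_per_run, count_empty) >= min_support
```

With three runs `[["PEPTIDEK", "AAAK"], ["PEPTIDEK"], ["CCCK"]]`, the support for `PEPTIDEK` is 0.5 (one of the two other runs agrees), which matches the example given for min_support. If the third run is empty instead, the support is 1.0 when empty runs are ignored and 0.5 when `count_empty=True`.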

The command line parameters of this tool are:

INI file documentation of this tool: