ProteinHistorian provides pre-computed protein ages for many species, based on several external databases of protein families and two ancestral family reconstruction algorithms. Please see the paper and the methods page for more information about the data and algorithms behind ProteinHistorian's predictions. This page answers questions that users of ProteinHistorian frequently ask us.
If your question is not addressed here, please contact us at .
Questions
- What if my protein IDs are not in the right format for ProteinHistorian?
If your protein names are not in one of the supported formats, we recommend the UniProt ID Mapping tool, the DAVID ID Conversion tool, or BioMart. Many organism-specific web databases, such as SGD and MGI, also provide ID mapping resources.
- Why aren't ages found for some of my input proteins?
Unfortunately, protein ID sets are regularly changed and updated. Our age estimates are tied to specific databases of protein families and the IDs that those databases use. We have tried to make the estimates usable with other ID types, but it is very difficult to support every evolving ID set currently in use. To see all proteins that have ages in a given database, download that age database. You can also control how ProteinHistorian handles input proteins without ages under "Additional Options" in Box 2 on the main page. By default, proteins without ages are excluded from the analysis, and a warning is shown on the results page.
- What does a protein's age mean exactly?
The meaning of protein "age" depends on the species tree, family database, and ancestral family reconstruction algorithm used. Most definitions of age rely on proteins sharing recognizable sequence similarity, which very likely indicates a shared evolutionary origin. However, the specifics of this similarity differ among family databases; for example, the Pfam databases consider similarity over individual functional domains, while the PPOD databases consider the entire protein sequence.
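To make the idea of a family-based age more concrete, here is a minimal Python sketch; it is not ProteinHistorian's own code. It assumes a Dollo-parsimony-style rule, in which a family can be gained only once but lost many times, so the family's age is the most recent common ancestor (MRCA) of all species in which a member is found. The toy species tree, the species labels "A" to "E", and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical rooted species tree stored as child -> parent links.
# Leaves are species "A".."E"; internal nodes are named after the clade they root.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "ABCD": "root",
    "E": "root",
    "AB": "ABCD",
    "CD": "ABCD",
    "A": "AB",
    "B": "AB",
    "C": "CD",
    "D": "CD",
}


def path_to_root(node: str) -> List[str]:
    """Return [node, parent, grandparent, ..., root]."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def family_age(species_with_family: Set[str]) -> str:
    """Dollo-style age: the MRCA of every species containing a family member."""
    if not species_with_family:
        raise ValueError("family must be present in at least one species")
    paths = [path_to_root(s) for s in species_with_family]
    # Ancestors shared by every species; walking up any one path,
    # the first shared node is the most recent common ancestor.
    shared = set(paths[0]).intersection(*(set(p) for p in paths[1:]))
    for node in paths[0]:
        if node in shared:
            return node
    raise AssertionError("unreachable: the root is on every path")
```

For example, `family_age({"A", "C"})` returns `"ABCD"`: the family is placed in the ancestor of A through D even though B lacks it, because under this rule its absence from B is explained as a loss. Every protein in the family would then be assigned the family's age.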
In general, the age can be thought of as the branch in the species tree on which a currently recognizable shared sequence element first appeared. The associated divergence time (in millions of years ago) is an estimate for that branch of origin, taken from the TimeTree database. Given the complex evolutionary histories of proteins, different age estimation strategies may produce different ages for the same proteins. The section "The Estimation and Interpretation of Protein Age" in the paper discusses these issues in more detail. Supplementary Figures S1, S2, and S3 directly compare the protein age distributions produced by different databases and ancestral family reconstruction algorithms.
- Can I use my own species tree or protein family database?
Yes! If you have a species tree or database of evolutionary relationships that you would like to use in your protein age analysis, download the command line version of ProteinHistorian and see the README for formatting instructions. If you would like to make your database publicly available as part of ProteinHistorian, please contact us.
- How should I cite ProteinHistorian?
ProteinHistorian is described in the following paper:
Capra JA, Williams AG, and Pollard KS. ProteinHistorian: Tools for the Comparative Analysis of Eukaryote Protein Origin. PLoS Computational Biology, In Press, 2012.