We use the set of manually curated reference rice reactions to electronically infer reactions in several evolutionarily divergent plant species for which high-quality whole-genome sequence data are available, drawn from the Gramene database and a select set of published transcriptomes and non-reference plant genomes, and for which a comprehensive and high-quality set of protein predictions therefore exists. The estimated success rate of our orthology inference strategy can be stated as the percentage of eligible reactions (defined in step 2 below) in the current reference data set for which an equivalent event can be inferred in another species. By this measure, success rates vary from species to species, depending on the quality of the primary annotation and on the genes identified by the genome or transcriptome sequencing project.
Electronic inference proceeds in four steps.
1) Protein homology data were obtained from two sources. (a) Gramene’s Plant Compara. Briefly, this method is based on the construction of gene trees, using the longest protein translation for every Ensembl gene, for all species included in the Compara database. Homologues are deduced from these trees. The method is described in more detail in EnsemblCompara GeneTrees: Analysis of complete, duplication aware phylogenetic trees in vertebrates. Vilella et al., Genome Research, 2008. (b) InParanoid-based homology prediction, applied to a select set of species whose data were published or shared in collaboration with us in the form of transcriptome and/or genome annotations. To infer homologous events in Reactome, we used both the core Plant Compara data set and the InParanoid-based predictions to project the computationally inferred pathway events onto several plant genomes. More information about the analyses and the inclusion of the two types of homology data set can be found here.
2) All reference reactions in the Plant Reactome knowledgebase involving one or more proteins are eligible for electronic inference. Eligible reactions are checked to determine whether each involved protein has at least one homologous protein (HP) in the selected plant genomes. If a reference reaction involves a complex, at least 75% of the accessioned protein components of the complex must have HPs in the selected species.
3) For each reaction that meets these criteria, an equivalent reaction is created for the selected species by replacing each reference protein with its organism-specific HP. If a reference protein corresponds to more than one HP from a species, a DefinedSet called ‘Homologues of …’ is created, with the HPs from that species as members. For reference proteins that lack a species-specific HP but that are included in complexes inferred due to the 75% threshold rule, placeholder entities (called ‘Ghost homologue of…’) are created.
4) If this analysis generates reactions in the selected species corresponding to any of the steps of a reference pathway, then the pathway event is also inferred for the selected species.
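Steps 2 to 4, together with the success-rate measure defined in the opening paragraph, can be illustrated with a minimal Python sketch. This is not the Plant Reactome implementation: the `Reaction` class, the function names, the dictionary-based output, and all protein identifiers (`OsA`, `ZmA1`, and so on) are hypothetical and chosen only for illustration. The sketch assumes that homology data from step 1 have already been reduced to a mapping from each reference protein to its list of HPs in one selected species.

```python
from dataclasses import dataclass, field

COMPLEX_THRESHOLD = 0.75  # step 2: fraction of complex components that need an HP


@dataclass
class Reaction:
    """Hypothetical, simplified reference reaction."""
    name: str
    proteins: list                                  # standalone reference proteins
    complexes: list = field(default_factory=list)   # each complex is a list of proteins


def is_eligible(reaction):
    # Step 2: a reaction is eligible if it involves one or more proteins.
    return bool(reaction.proteins) or any(reaction.complexes)


def project_protein(protein, homologues):
    # Step 3: replace a reference protein with its HP, a set of HPs, or a placeholder.
    hps = homologues.get(protein, [])
    if not hps:
        return f"Ghost homologue of {protein}"
    if len(hps) == 1:
        return hps[0]
    return {"DefinedSet": f"Homologues of {protein}", "members": list(hps)}


def infer_reaction(reaction, homologues):
    """Return the projected reaction, or None if the criteria are not met."""
    if not is_eligible(reaction):
        return None
    # Every standalone protein must have at least one HP.
    if any(not homologues.get(p) for p in reaction.proteins):
        return None
    # At least 75% of the components of each complex must have an HP.
    for components in reaction.complexes:
        if not components:
            continue
        with_hp = sum(1 for p in components if homologues.get(p))
        if with_hp / len(components) < COMPLEX_THRESHOLD:
            return None
    return {
        "name": reaction.name,
        "proteins": [project_protein(p, homologues) for p in reaction.proteins],
        "complexes": [[project_protein(p, homologues) for p in c]
                      for c in reaction.complexes],
    }


def infer_pathway(reactions, homologues):
    # Step 4: the pathway is inferred if any of its reactions is inferred.
    projected = (infer_reaction(r, homologues) for r in reactions)
    return [r for r in projected if r is not None]


def success_rate(reactions, homologues):
    # Percentage of eligible reference reactions that can be projected.
    eligible = [r for r in reactions if is_eligible(r)]
    if not eligible:
        return 0.0
    inferred = sum(1 for r in eligible if infer_reaction(r, homologues) is not None)
    return 100.0 * inferred / len(eligible)


# Hypothetical homology data for one selected species.
homologues = {"OsA": ["ZmA1", "ZmA2"], "OsB": ["ZmB"], "OsC": ["ZmC"], "OsD": ["ZmD"]}
r1 = Reaction("r1", ["OsA"])                              # two HPs: DefinedSet
r2 = Reaction("r2", [], [["OsB", "OsC", "OsD", "OsE"]])   # 3 of 4 components have HPs
r3 = Reaction("r3", ["OsE"])                              # no HP: not inferred
r4 = Reaction("r4", [])                                   # no proteins: not eligible
pathway = [r1, r2, r3, r4]
```

In this toy data set, `r1` and `r2` are projected (the latter with a placeholder for `OsE`), `r3` is not, and `r4` is excluded from the denominator, so `infer_pathway(pathway, homologues)` returns two reactions and the pathway counts as inferred.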
These electronically inferred reactions are predictions based on a number of assumptions. Most basically, we assume that if we can find HPs corresponding to all proteins involved in a reference reaction, then those HPs mediate the same reaction in the projected species. This may not be true. On the other hand, we may miss a truly homologous reaction in a given species, either because it is mediated by structurally divergent proteins that the Compara strategy failed to identify, or because the gene was not identified by the genome annotation project. Similarly, complexes sharing less than 75% homologous subunits between species may nevertheless continue to perform the same function. The electronically inferred reactions presented in Plant Reactome are thus not data, but hypotheses useful to direct the design of confirmatory experiments.
If you are interested in looking at the pathway projection summary, please visit the Database Release Summary page.