Edited by: John Hancock, University of Cambridge, UK
Reviewed by: Rafael D. Mesquita, Universidade Federal do Rio de Janeiro, Brazil; Dong Xu, Idaho State University, USA
*Correspondence: Cathy H. Wu, Center for Bioinformatics and Computational Biology, University of Delaware, 15 Innovation Way, Suite 205, Newark, DE 19711, USA. e-mail:
This article was submitted to Frontiers in Bioinformatics and Computational Biology, a specialty of Frontiers in Genetics.
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.
As a member of the Open Biomedical Ontologies (OBO) foundry, the Protein Ontology (PRO) provides an ontological representation of protein forms and complexes and their relationships. Annotations in PRO can be assigned to individual protein forms and complexes, each distinguishable down to the level of post-translational modification, thereby allowing for a more precise depiction of protein function than is possible with annotations to the gene as a whole. Moreover, PRO is fully interoperable with other OBO ontologies and integrates knowledge from other protein-centric resources such as UniProt and Reactome. Here we demonstrate the value of the PRO framework in the investigation of the spindle checkpoint, a highly conserved biological process that relies extensively on protein modification and protein complex formation. The spindle checkpoint maintains genomic integrity by monitoring the attachment of chromosomes to spindle microtubules and delaying cell cycle progression until the spindle is fully assembled. Using PRO in conjunction with other bioinformatics tools, we explored the cross-species conservation of spindle checkpoint proteins, including phosphorylated forms and complexes; studied the impact of phosphorylation on spindle checkpoint function; and examined the interactions of spindle checkpoint proteins with the kinetochore, the site of checkpoint activation. Our approach can be generalized to any biological process of interest.
Understanding the meaning of data is essential for accurate scientific analysis and interpretation. Ontologies formalize the meaning of terms using a defined vocabulary that facilitates the integration of data and knowledge (Gkoutos et al.,
Within the Foundry, the Protein Ontology (PRO
Protein Ontology terms are labeled with categories to reflect their position in the PRO hierarchy. These categories are: (i) family: protein products of a distinct gene family arising from a common ancestor; (ii) gene: the protein products of a distinct gene; (iii) sequence: protein products that have a distinct sequence upon initial translation; and (iv) modification: protein products derived from a single mRNA species that differ because of some change (or lack thereof) that occurs after the initiation of translation (co- or post-translational; Natale et al.,
To facilitate reliable communication and management of data, PRO is organized under the umbrella of the Basic Formal Ontology (BFO), a top-level formal foundational ontology in the biomedical domain. BFO represents, in consistent fashion, the upper level categories common to ontologies developed in different domains and at different levels of granularity. It adopts a view of reality as comprising (1) continuants: entities that continue or persist through time (objects, qualities, and functions), and (2) occurrents: the events or happenings in which continuants participate
Moreover, PRO interoperates seamlessly with other OBO ontologies by reusing terms whenever the classes needed already exist in other ontologies. This is the case for the protein complex terms found in the Cellular Component branch of the Gene Ontology (GO; Ashburner et al.,
In addition, PRO leverages and cross references data in existing protein-centric informatics resources. For example, UniProtKB (Bult et al.,
The formal definition of protein forms and complexes at various levels of granularity in the PRO framework provides a means to associate annotations with the most appropriate class, as opposed to the traditional gene-level-only association. This is especially useful, for example, in cases where functions are realized by protein complexes rather than their individual components, or by specific isoforms of a protein, or by a modified protein form. Class-specific annotations are stored in PRO using controlled vocabularies and are integrated in the PRO website so they can be searched. Therefore, the PRO framework, along with the annotation and the mapping to relevant bioinformatics resources, helps to answer biologically important questions, such as: (1) What proteins and complexes are involved in a particular process? (2) What proteins and complexes are conserved in a given set of species? and (3) What function(s) is associated with a given protein form or complex?
To be able to answer the questions described in the previous section, PRO has to provide adequate coverage of terms and annotations that pertain to the biological questions being asked. The ultimate goal in PRO is the representation of protein-related terms for the 12 GO Reference Genomes and human protein complexes from Reactome. Release 32.0 contains 35,196 PRO terms, of which about 25,000 are ProEvo terms (family and gene-level classes), 9,500 are ProForm terms (isoforms and modified forms), and 393 are ProComp terms. In terms of annotations, there are 2,941 GO annotations derived from 1,242 publications. The distribution files
In this article we use the features of PRO, including a graphical representation of the PRO hierarchy, to explore the spindle checkpoint. The spindle checkpoint monitors interactions between kinetochores and spindle microtubules during mitosis and meiosis and inhibits the onset of anaphase until all kinetochores have made correct attachments to the spindle (Zich and Hardwick,
The spindle checkpoint represents a rich use case with features to demonstrate the application of all three sub-ontologies of PRO. First, it has been extensively studied in a range of organisms, and the core checkpoint proteins are conserved in eukaryotes from yeast to humans. Thus, using ProEvo as a guide to the evolutionary relationships amongst spindle checkpoint proteins, it is possible to make predictions about checkpoint proteins based on evidence concerning their counterparts in other organisms. The ProEvo representation can also highlight differences between spindle checkpoint proteins that may have implications for checkpoint function. Second, the spindle checkpoint is highly dependent on phosphorylation – of the seven core spindle checkpoint proteins in vertebrates, three (BUB1, AURKB, and TTK) are confirmed protein kinases and all seven are phosphoproteins (Oh et al.,
Information about spindle checkpoint protein forms and their functions was identified through curation of full-length articles that were returned in a PubMed search using the keywords “Bub1,” “BubR1,” and “Mad3” (BubR1 is a commonly used synonym for the checkpoint protein BUB1B and MAD3 is the closest yeast relative of BUB1B). Because of our interest in phosphorylation of checkpoint proteins, we focused our curation efforts on the subset of articles that were flagged by the text mining tool Rule-based LIterature Mining System for Protein Phosphorylation (RLIMS-P) as containing mentions of phosphorylation in the abstract (Yuan et al.,
All information on protein forms was entered into Rapid Annotation interfaCE for PRO (RACE-PRO
The RACE-PRO entries were checked by a PRO editor and converted to PRO terms using a semi-automated process in which standard names and definitions for gene-level and isoform-level terms are generated automatically, as are any missing parent terms needed to complete the PRO hierarchy. Definitions of modified protein forms and PRO terms for complexes and families were handled manually. The end result of the processing pipeline was a set of OBO stanzas containing the term IDs, names, definitions, synonyms, categories, and relationships to other terms. Annotations were included in the PRO Annotation File (PAF). All terms and annotations generated in this study can be found in PRO release 32.
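For readers who want to work with the OBO stanzas programmatically, the following is a minimal sketch of a stanza reader in Python. It assumes the standard OBO flat-file layout (`[Term]` headers followed by `tag: value` lines); the sample stanzas, the `EX:` identifiers, and the function name `parse_obo_stanzas` are hypothetical and are not taken from the PRO distribution files.

```python
from collections import defaultdict

# Hypothetical stanzas for illustration only; the EX: identifiers are placeholders.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:0000001
name: example modified form
def: "Placeholder definition." [EX:curator]

[Term]
id: EX:0000002
name: example human modified form
is_a: EX:0000001 ! example modified form
"""


def parse_obo_stanzas(text):
    """Return one dict per [Term] stanza, mapping each tag to a list of values."""
    stanzas = []
    current = None  # None while reading the header or a non-Term stanza
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = defaultdict(list) if line == "[Term]" else None
            if current is not None:
                stanzas.append(current)
            continue
        if current is None or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        # Drop the trailing "! comment" that OBO uses for human-readable labels.
        value = value.split(" ! ", 1)[0].strip()
        current[tag.strip()].append(value)
    return stanzas


TERMS = parse_obo_stanzas(SAMPLE_OBO)
for term in TERMS:
    print(term["id"][0], term["name"][0], term.get("is_a", []))
```

Storing every tag as a list keeps repeated tags (for example several `is_a` or `synonym` lines) without special cases; a production reader would also need to handle escapes, qualifiers, and `[Typedef]` stanzas, which this sketch ignores.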
Once data was entered into the PRO framework, it was analyzed and visualized using the search and graphical display tools in the PRO website. The search functionality allows all parts of a PRO entry, including definition and annotation, to be searched. Query terms can be words or phrases or unique identifiers from other resources such as Pfam or GO. Searches can be restricted to a particular field of a PRO entry; for example, searching for the term “9606” in the Taxon ID field will retrieve all human protein terms. The search terms “NOT NULL” and “NULL” can be used to identify PRO entries that do or do not contain information in a selected field. Multiple search terms can be joined with the Boolean terms “AND,” “OR,” and “NOT” to carry out more complex searches. In addition, searches can be restricted to particular categories of PRO entries such as modified forms, disease-related forms, or complexes using the “Quick Links” menu provided on the PRO search page. Finally, the search result table can be customized to include/remove information and can be downloaded in tab-delimited format.
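The logic of field-restricted searches, "NULL"/"NOT NULL" tests, and Boolean combinations can be illustrated on a downloaded result table. The sketch below is not the PRO website's implementation; the record layout, the field names (`taxon_id`, `modifier`), the `EX:` identifiers, and all helper functions are assumptions made for illustration.

```python
# Hypothetical records standing in for rows of a downloaded search-result table.
RECORDS = [
    {"id": "EX:1", "name": "example human kinase", "taxon_id": "9606", "modifier": "increased"},
    {"id": "EX:2", "name": "example human protein", "taxon_id": "9606", "modifier": ""},
    {"id": "EX:3", "name": "example yeast kinase", "taxon_id": "4932", "modifier": None},
]


def equals(field, value):
    """Exact match on one field (e.g. Taxon ID 9606)."""
    return lambda rec: (rec.get(field) or "") == value


def contains(field, term):
    """Case-insensitive word or phrase match restricted to one field."""
    term = term.lower()
    return lambda rec: term in (rec.get(field) or "").lower()


def not_null(field):
    """NOT NULL: the field holds some information."""
    return lambda rec: bool(rec.get(field))


def is_null(field):
    """NULL: the field is empty or missing."""
    return lambda rec: not rec.get(field)


def all_of(*preds):
    """Boolean AND."""
    return lambda rec: all(p(rec) for p in preds)


def any_of(*preds):
    """Boolean OR."""
    return lambda rec: any(p(rec) for p in preds)


def negate(pred):
    """Boolean NOT."""
    return lambda rec: not pred(rec)


def search(records, pred):
    """Return the IDs of all records that satisfy the predicate."""
    return [rec["id"] for rec in records if pred(rec)]


# Human terms that carry a modifier annotation.
print(search(RECORDS, all_of(equals("taxon_id", "9606"), not_null("modifier"))))
# Kinases that are not human.
print(search(RECORDS, all_of(contains("name", "kinase"), negate(equals("taxon_id", "9606")))))
```

Composing small predicates mirrors how the search form joins individual field conditions with "AND," "OR," and "NOT," and makes each condition testable on its own.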
The PRO hierarchy can be visualized using a built-in tool based on Cytoscape Web (Lopes et al.,
The kinetochore protein–protein interaction (PPI) network was displayed using locally installed Cytoscape, version 2.8 (Smoot et al.,
Multiple sequence alignments were performed using ClustalW version 2.1 (Larkin et al.,
To get an overview of the extent of spindle checkpoint-related information contained within PRO we performed a search in PRO for terms containing the phrases “spindle checkpoint,” “spindle assembly checkpoint,” or “mitotic checkpoint.” The search returned 112 PRO terms. The PRO search query and the Cytoscape web view of the combined hierarchy of the search result terms are shown in Figure
The spindle checkpoint pathway is highly conserved throughout eukaryotes. Homologs of the core checkpoint proteins are present in organisms from yeast to humans and checkpoint mechanisms, such as MCC inhibition of the APC/C, are also conserved (Zich and Hardwick,
In PRO, ProEvo classes provide insight into the evolutionary relationships among proteins by grouping proteins that share full-length sequence similarity. Importantly, this higher level relationship based on a common domain organization can be searched in PRO, as terms in ProEvo are annotated with domain information from resources such as Pfam. Therefore, we searched PRO for proteins that contained the conserved N-terminal TPR domain found in all of the BUB-like proteins (PFAM:PF08311, MAD3/Bub1 homology domain I). The search returned two results: the MAD3 gene-level term (PR:000035499) and the BUB1/BUB1B family level term (PR:000035665).
To reveal the common and divergent attributes of these protein classes, the result table was customized, via the Display Option functionality, to display the corresponding annotations and allow their direct comparison
The combined Cytoscape web view for BUB1/BUB1B and MAD3 terms is shown in Figure
Phosphorylation is a major mechanism of regulation in the spindle checkpoint pathway and the interplay among the checkpoint-related phosphorylation events is complex (Zich and Hardwick,
To view the phosphorylated BUB1B protein forms in PRO, we searched for “bub1 beta” in the PRO Name field, restricting the search to phosphorylated forms using the Quick Links menu. Eleven search results were returned: four species-independent modification-level terms and seven species-specific terms. The combined Cytoscape web view of the four species-independent terms (PR:000035361, PR:000035427, PR:000035431, and PR:000035434) is shown in Figure
BUB1B/Phos:1 (PR:000035361), defined in PRO as a BUB1B form that has been phosphorylated on a site analogous to Thr-620 of human BUB1B, is found in humans (PR:000035362) and frogs (PR:000035426). The frog form is phosphorylated on Thr-605, which is considered to be analogous to human Thr-620 because it aligns with human Thr-620 in a multiple sequence alignment (Figure
Although BUB1B/Phos:1 has not as yet been characterized in mice, the equivalent phosphorylation site (Thr-613) is conserved in the mouse protein (Figure
BUB1B/Phos:2 (PR:000035427) contains the same CDK1 phosphorylation site (Thr-620 in humans) as BUB1B/Phos:1 and is additionally phosphorylated on several sites by PLK1/PLX1. Because experimental evidence indicates that PLK1 phosphorylation of BUB1B is low in the absence of prior CDK1 phosphorylation, PRO does not have a term for BUB1B phosphorylated by PLK1 alone (Elowe et al.,
The PLK1 phosphorylation sites in BUB1B/Phos:2 are a subject of ongoing investigation. The PRO entry page for the human BUB1B/Phos:2 (PR:000035428) documents two neighboring sites – Ser-676 and Thr-680 – that have been verified
One of the challenging aspects of the curation of PRO phosphorylated forms is determining whether a phosphorylated form that has been defined in one species also exists in other species. This challenge is exemplified by BUB1B/Phos:2. There is evidence that BUB1B/Phos:2 exists in both frogs and mice, although it has not been completely characterized in either organism. All of the human BUB1B/Phos:2 phosphorylation sites that have been confirmed
BUB1B/Phos:3 (PR:000035431) is phosphorylated on Thr-608 in humans (PR:000035432) and on the equivalent site, Thr-593 in frog (PR:000035433) (Figure
Finally, BUB1B/Phos:4 (PR:000035434), which has so far only been observed in humans (PR:000035435), is multiply phosphorylated by CDK1 on sites distinct from those phosphorylated in BUB1B/Phos:1 and BUB1B/Phos:2. Phosphorylation occurs
By combining the PRO representation of phosphorylated forms with multiple sequence alignments, we can predict not just individual phosphorylation sites, but combinations of phosphorylation sites that are likely to occur
In the presence of unattached or incorrectly attached kinetochores, the core spindle checkpoint proteins form multiple protein complexes that contribute to the inhibition of the APC/C and metaphase arrest (Zich and Hardwick,
The MCC is one of the best-characterized spindle checkpoint complexes, and consequently, it has been described in multiple bioinformatics resources, including GO and Reactome. The PRO record for the human MCC (PR:000035511), shown in Figure
The BUB1 protein plays a critical role in checkpoint signal generation. Together with BUB3, it localizes to kinetochores by binding to the kinetochore component CASC5 (KNL1/blinkin) and serves as a platform for the recruitment and activation of other checkpoint proteins, including MAD1 and BUB1B (Lara-Gonzalez et al.,
To view the PRO representation of BUB1-containing complexes, we searched for “BUB1” in any field and restricted the search results to complexes using the Quick Links menu. The search returned 16 results, including 11 BUB1 complexes (the other five complexes contained BUB1B rather than BUB1). The combined Cytoscape web view of these 11 complexes and their components is shown in Figure
BUB1 and BUB3 appear together in three different complexes: BUB1:BUB3 (PR:000035566), BUB1:BUB3:MAD1L1 (PR:000035567), and BUB1:BUB3:APC (PR:000035576) [Note: APC is the short name for the adenomatous polyposis coli protein (APC); it is not the anaphase-promoting complex/cyclosome (APC/C).]. The BUB1:BUB3 complex is highly conserved, occurring in human, fission yeast, and budding yeast (orange squares). The BUB1 proteins from all three organisms have a common parent (the species-independent BUB1 term, PR:000004854), indicating that they are orthologous; similarly, the BUB3 proteins have the species-independent BUB3 term (PR:000004856) as a common parent. Given that orthologous BUB1:BUB3 complexes exist in distantly related organisms (humans and yeast) we expect that more examples of this complex will be added to PRO in the future as more of the spindle checkpoint literature is curated. The BUB1:BUB3:MAD1L1 complex, so far observed only in humans, forms at kinetochores during the process of checkpoint activation (Seeley et al.,
The remaining BUB1-containing complexes in Figure
The kinetochore, a complex, multi-protein structure organized around the centromeric DNA of each sister chromatid pair, is critically important as a staging area for the generation and amplification of spindle checkpoint signals (Lara-Gonzalez et al.,
To create a PPI network of kinetochore-localized proteins, we first identified all human kinetochore-localized protein forms in PRO by searching for terms with Taxon ID 9606 (human) and Ontology ID GO:0000776 (kinetochore). Although the kinetochore and centromere are distinct structures, the terms are sometimes used interchangeably in the literature; therefore, we also retrieved human centromere-localized proteins by searching for human proteins (Taxon ID 9606) annotated with the GO term GO:0000779 (condensed chromosome, centromeric region). The searches returned 34 results, including 28 kinetochore-localized protein forms, 5 centromere-localized forms, and one term, AURKB (PR:000035358), that is annotated with both kinetochore and centromere localization terms. These terms are annotated with PPI data mined from the literature and from several PPI databases. We downloaded the OBO stanzas and PAF for these proteins from PRO and used the information therein to build a network with Cytoscape (Figure
Because functional annotation of PRO terms is an ongoing process, the set of kinetochore/centromere localized proteins we retrieved is not comprehensive nor is the PRO annotation of PPIs for these proteins complete. However, it is representative of the diverse functions of the kinetochore. The core checkpoint proteins BUB1, BUB1B, BUB3, MAD1L1, and MAD2L1 (Figure
The checkpoint proteins are integrated into the larger environment of the kinetochore through interactions with other kinetochore/centromere proteins. AURKB binds to BIRC5, CDCA8, and INCENP (Figure
BUB1 and BUB1B both associate with the outer kinetochore component, CASC5. BUB1B makes other connections to the kinetochore via SGOL1 and CENPE, a protein that assists in the alignment of chromosomes on the metaphase plate [see PRO annotation for CENPE (PR:000035367)]. BUB1B binding to CENPE may stimulate the auto-phosphorylation activity of BUB1B [see BUB1B/Phos:3 (PR:000035432)].
Several spindle checkpoint proteins – BUB1, BUB1B, BUB3, and MAD2 – interact with APC and the checkpoint kinase TTK interacts with DVL2. APC and DVL2, which interact with each other, both participate in spindle assembly [see PRO annotation for APC (PR:000030190) and DVL2 (PR:000035487)]. The significance of these interactions is unclear, but it could reflect a role for APC and DVL2 in checkpoint signaling or a role for the checkpoint proteins in spindle assembly.
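The interactions described in the preceding paragraphs can be written down as an edge list and exported in Cytoscape's simple interaction format (SIF). The sketch below covers only the pairs named in the text above, not the full network, and it does not read the PAF; the variable and function names are illustrative, and the `pp` interaction label is a common SIF convention assumed here rather than something specified in the study.

```python
from collections import Counter

# Binary interactions named in the surrounding text (a subset of the network).
INTERACTIONS = {
    "AURKB": ["BIRC5", "CDCA8", "INCENP"],
    "BUB1": ["CASC5", "APC"],
    "BUB1B": ["CASC5", "SGOL1", "CENPE", "APC"],
    "BUB3": ["APC"],
    "MAD2L1": ["APC"],  # written as MAD2 in the text
    "TTK": ["DVL2"],
    "APC": ["DVL2"],
}


def to_edges(interactions):
    """Flatten the partner lists into a sorted list of unique undirected edges."""
    edges = set()
    for protein, partners in interactions.items():
        for partner in partners:
            edges.add(tuple(sorted((protein, partner))))
    return sorted(edges)


def to_sif(edges, interaction_type="pp"):
    """Render edges as tab-delimited SIF lines: source, interaction type, target."""
    return "\n".join(f"{a}\t{interaction_type}\t{b}" for a, b in edges)


def degrees(edges):
    """Count how many distinct partners each protein has."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


EDGES = to_edges(INTERACTIONS)
print(to_sif(EDGES))
print(degrees(EDGES).most_common(3))
```

Sorting each pair before adding it to the set means an interaction recorded from either partner's annotation yields a single undirected edge, which is usually what is wanted when PPI data come from several sources.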
A protein kinase, PLK1, and a protein phosphatase, PP2A, associate with checkpoint proteins and other kinetochore proteins, positioning them to regulate critical kinetochore substrates. PLK1 interacts with the checkpoint proteins BUB1 and BUB1B as well as SGOL1 and DVL2. PLK1 association with BUB1 depends upon the prior phosphorylation of BUB1 (Qi et al.,
Eight of the kinetochore/centromere localized protein forms in our set are phosphorylated: BUB1B/Phos:2, BUB1B/Phos:3, BUB1B/Phos:4, AURKB/Phos:1, CDC20/Phos:1, ATM/Phos:2, H3T3ph, and HHTA1/Phos:2. Because phosphorylation can have a wide range of effects on proteins, affecting localization, function, and/or the processes in which they participate, we wanted to investigate the impact of phosphorylation on these particular proteins.
In the PRO annotation, localizations, functions, and processes that are affected by protein modification are denoted by adding a modifier (such as increased or decreased) to the corresponding GO term and including a reference form. Thus, we searched for human proteins (Taxon ID 9606) localized to the kinetochore (Ontology ID GO:0000776) with at least one line of functional annotation that included a modifier (Modifier NOT NULL); to limit the results to phosphorylated proteins, we selected “Phosphorylated forms” from the Quick Links menu. For the reasons described above, we repeated the search substituting GO:0000779 (condensed chromosome, centromeric region) in the Ontology ID field. All eight of the kinetochore/centromere localized proteins appeared in our search results, indicating that all of these proteins had at least one attribute that was affected by phosphorylation. We examined the annotation for each protein and summarized the affected attributes in Table
| Protein | Modifier | Function/process | Targets |
|---|---|---|---|
| CDC20/Phos:1 | Decreased | Ubiquitin protein ligase activity | |
| | Increased | Spindle checkpoint | |
| BUB1B/Phos:2 | Increased | Protein binding | BUB1 |
| | Increased | Protein binding | PP2A |
| | Increased | Protein kinase activity | |
| | Increased | Attachment of spindle microtubules to kinetochores | |
| | Increased | Metaphase plate congression | |
| BUB1B/Phos:3 | Increased | Metaphase plate congression | |
| | Increased | Chromosome segregation | |
| | Increased | Spindle checkpoint | |
| | Increased | Protein localization to kinetochore | MAD1L1, MAD2L1 |
| | Decreased | Negative regulation of protein phosphorylation | NDC80 |
| BUB1B/Phos:4 | Increased | Attachment of spindle microtubules to kinetochores | |
| | Increased | Inhibition of mitotic anaphase-promoting complex activity | |
| | Increased | Metaphase plate congression | |
| AURKB/Phos:1 | Increased | Protein kinase activity | |
| | Increased | Chromosome segregation | |
| | Increased | Metaphase plate congression | |
| | Increased | Spindle checkpoint | |
| ATM/Phos:2 | Increased | Protein kinase activity | |
| | Increased | Spindle checkpoint | |
| HHTA1/Phos:2 | Increased | Protein localization to chromosome, centromeric region | SGOL1 |
| H3T3ph | Increased | Protein binding | BIRC5 |
| | Increased | Protein localization to chromosome, centromeric region | AURKB, CDCA8, INCENP, BIRC5 |
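The table can also be queried once its rows are held as structured data. The following sketch transcribes the rows as Python tuples and asks two questions of them; the tuple layout and the helper names `forms_with` and `recruited_targets` are assumptions for illustration and do not reflect the PAF column layout.

```python
# Rows transcribed from the table: (protein form, modifier, function/process, targets).
ROWS = [
    ("CDC20/Phos:1", "Decreased", "Ubiquitin protein ligase activity", ()),
    ("CDC20/Phos:1", "Increased", "Spindle checkpoint", ()),
    ("BUB1B/Phos:2", "Increased", "Protein binding", ("BUB1",)),
    ("BUB1B/Phos:2", "Increased", "Protein binding", ("PP2A",)),
    ("BUB1B/Phos:2", "Increased", "Protein kinase activity", ()),
    ("BUB1B/Phos:2", "Increased", "Attachment of spindle microtubules to kinetochores", ()),
    ("BUB1B/Phos:2", "Increased", "Metaphase plate congression", ()),
    ("BUB1B/Phos:3", "Increased", "Metaphase plate congression", ()),
    ("BUB1B/Phos:3", "Increased", "Chromosome segregation", ()),
    ("BUB1B/Phos:3", "Increased", "Spindle checkpoint", ()),
    ("BUB1B/Phos:3", "Increased", "Protein localization to kinetochore", ("MAD1L1", "MAD2L1")),
    ("BUB1B/Phos:3", "Decreased", "Negative regulation of protein phosphorylation", ("NDC80",)),
    ("BUB1B/Phos:4", "Increased", "Attachment of spindle microtubules to kinetochores", ()),
    ("BUB1B/Phos:4", "Increased", "Inhibition of mitotic anaphase-promoting complex activity", ()),
    ("BUB1B/Phos:4", "Increased", "Metaphase plate congression", ()),
    ("AURKB/Phos:1", "Increased", "Protein kinase activity", ()),
    ("AURKB/Phos:1", "Increased", "Chromosome segregation", ()),
    ("AURKB/Phos:1", "Increased", "Metaphase plate congression", ()),
    ("AURKB/Phos:1", "Increased", "Spindle checkpoint", ()),
    ("ATM/Phos:2", "Increased", "Protein kinase activity", ()),
    ("ATM/Phos:2", "Increased", "Spindle checkpoint", ()),
    ("HHTA1/Phos:2", "Increased", "Protein localization to chromosome, centromeric region", ("SGOL1",)),
    ("H3T3ph", "Increased", "Protein binding", ("BIRC5",)),
    ("H3T3ph", "Increased", "Protein localization to chromosome, centromeric region",
     ("AURKB", "CDCA8", "INCENP", "BIRC5")),
]


def forms_with(rows, modifier, term):
    """Protein forms annotated with the given modifier and function/process term."""
    found = []
    for protein, mod, annotation, _targets in rows:
        if mod == modifier and annotation.lower() == term.lower() and protein not in found:
            found.append(protein)
    return found


def recruited_targets(rows):
    """Targets whose localization is increased by each phosphorylated form."""
    return {
        protein: list(targets)
        for protein, mod, annotation, targets in rows
        if mod == "Increased" and annotation.startswith("Protein localization")
    }


print(forms_with(ROWS, "Increased", "Spindle checkpoint"))
print(recruited_targets(ROWS))
```

The first query lists the forms with increased spindle checkpoint participation, and the second lists the forms that increase recruitment of other proteins to the kinetochore/centromere.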
Even though phosphorylation is often used as a mechanism to regulate protein localization, none of the phosphorylated proteins in this group was annotated to indicate increased or decreased localization to the kinetochore/centromere relative to the unphosphorylated form. In fact, the unphosphorylated forms of several of these proteins – BUB1B, CDC20, and AURKB – have been shown to localize to kinetochores with an affinity similar to that of the phosphorylated forms [see PRO annotation for BUB1B/PhosRes- (PR:000035373), CDC20/PhosRes- (PR:000035369), and AURKB/PhosRes- (PR:000035661)]. Intriguingly, the kinases for CDC20/Phos:1 (kinase is BUB1), BUB1B/Phos:2 (kinase is PLK1), ATM/Phos:2 (kinase is AURKB), and HHTA1/Phos:2 (kinase is BUB1) are themselves kinetochore/centromere localized proteins (see Figure
While phosphorylation did not affect the ability of these proteins to localize to the kinetochore/centromere themselves, three phosphorylated protein forms (phospho-Ser-121-Histone H2A, phospho-Thr-3-Histone H3, and BUB1B/Phos:3) showed an increased ability to recruit other proteins to the kinetochore/centromere relative to their respective unphosphorylated forms. Phosphorylation of Histone H2A on Ser-121 (HHTA1/Phos:2) creates a binding site for SGOL1. Phosphorylation of Histone H3 on Thr-3 (H3T3ph) creates a binding site for BIRC5, which in turn recruits the rest of the CPC (AURKB, CDCA8, and INCENP). Finally, BUB1B/Phos:3 is required for the kinetochore recruitment of MAD1 and MAD2.
Phosphorylation of BUB1B (BUB1B/Phos:2, BUB1B/Phos:3, and BUB1B/Phos:4) and AURKB (AURKB/Phos:1) is important for the ability of these proteins to regulate microtubule/kinetochore attachments as the phosphorylated forms show increased participation in attachment of spindle microtubules to kinetochores, metaphase plate congression, and/or chromosome segregation. Formation of stable, bipolar microtubule-kinetochore attachments requires a balance of kinase and phosphatase activity. AURKB destabilizes incorrect attachments by phosphorylating kinetochore components such as NDC80; the phosphatase PP2A counterbalances AURKB activity by dephosphorylating NDC80, thereby stabilizing attachments (Zich and Hardwick,
Four proteins – CDC20/Phos:1, AURKB/Phos:1, ATM/Phos:2, and BUB1B/Phos:3 – show an increased ability to mediate the spindle checkpoint relative to their unphosphorylated counterparts. CDC20/Phos:1 (phosphorylated by BUB1) shows decreased ubiquitin ligase activity relative to unphosphorylated CDC20, which presumably leads to its increased checkpoint activity. Thus, the spindle checkpoint acts through CDC20 in two independent ways to inhibit the APC/C: through formation of the MCC (BUB1B, BUB3, MAD2, and CDC20), which binds and inhibits the APC/C, and by phosphorylation of CDC20, which inhibits its ubiquitin ligase activity. Both AURKB/Phos:1 and ATM/Phos:2 have increased protein kinase activity relative to the unphosphorylated forms, which may be important in their increased ability to participate in the checkpoint response, although this possibility has not been directly tested. BUB1B/Phos:3 may participate in the checkpoint through its recruitment of MAD1L1 and MAD2L1 to kinetochores.
The structural framework and features of PRO enable the investigation of many aspects of proteins and complexes, particularly analyses of cross-species relationships and relationships between modified protein forms and functions. Our spindle checkpoint use case outlines a number of strategies that can be generalized to other cellular processes or pathways of interest.
In this study we showed how the PRO framework could be used to investigate the role of different protein forms that participate in a biological process of interest. We focused on PTM protein forms, as PTM is a central mechanism for the regulation of protein function in cells. Most PTM resources specialize in a single type of modification (e.g., phosphorylation) and are organized around individual modification sites. However, protein modification
We used PRO to explore the role of protein phosphorylation in the context of the spindle checkpoint. Our examination of the PRO representation of human BUB1B phosphorylated forms and complexes revealed multiple phosphorylated forms of this protein and at least two participating kinases (Figure
A related biological question that can be addressed with PRO concerns the cross-species conservation of modified protein forms. Here we described a small scale study involving the phosphorylation of one protein – BUB1B – in three organisms – human, frog, and mouse. Based on the descriptions of BUB1B phosphorylated forms in PRO and a multiple sequence alignment, we concluded that all four BUB1B phosphorylated forms found in humans could be conserved in mice. Three of the four forms are either known to be conserved in frogs or are likely to be, but one form, BUB1B/Phos:4, is not.
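The site-mapping step behind this kind of conclusion (finding the residue in one species that aligns with a known phosphorylation site in another) can be sketched as follows. The two aligned sequences are short, made-up strings rather than BUB1B sequences, and the function names are hypothetical; treating any aligned Ser, Thr, or Tyr as a candidate site is a simplifying assumption, not the curation rule used for PRO.

```python
def map_site(aligned_a, aligned_b, pos_a):
    """Map a 1-based residue position in sequence A to sequence B via a pairwise alignment.

    Both inputs are rows of the same alignment, with '-' marking gaps.
    Returns (position_in_b, residue_in_b), or None if the site aligns with a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    count_a = count_b = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a != "-":
            count_a += 1
        if res_b != "-":
            count_b += 1
        if res_a != "-" and count_a == pos_a:
            return (count_b, res_b) if res_b != "-" else None
    raise ValueError("position lies beyond the end of sequence A")


def is_candidate_phosphosite(mapped):
    """Simplifying assumption: the aligned residue must be Ser, Thr, or Tyr."""
    return mapped is not None and mapped[1] in "STY"


# Toy alignment (hypothetical sequences, not real proteins).
SEQ_A = "MKT-PLSTPEK"
SEQ_B = "M-TAPL-TPDK"

print(map_site(SEQ_A, SEQ_B, 7))  # Thr-7 of A aligns with Thr-6 of B
print(map_site(SEQ_A, SEQ_B, 6))  # Ser-6 of A aligns with a gap in B
```

Counting ungapped residues separately for each row is what allows the same column to correspond to different residue numbers in the two species, as with the differently numbered but analogous threonines discussed in the text.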
Discovery that a modified form found in one species is not conserved in another species is very interesting because a comparison of the function of that protein in the two organisms can provide insight into the role of the modification. Prediction that a modified protein form is conserved in a species where it has not yet been characterized is also useful because it expands the pool of organisms that can be used to study the modified form. For example, confirmation of the existence of BUB1B phosphorylated forms in mice would allow the study of BUB1B forms in mammalian cells undergoing meiosis. These studies could shed light on a question about the function of BUB1B/Phos:1. Frog BUB1B/Phos:1 has been shown to be required for spindle checkpoint cell cycle arrest; in contrast, human BUB1B/Phos:1 is dispensable for cell cycle arrest under these circumstances (Elowe et al.,
Cross-species analysis of modified protein forms is not limited to a single protein. It can be expanded to include all modified proteins involved in a biological process or present in a particular cellular compartment. It is also not restricted to phosphorylated proteins. The PRO framework can be used to define many kinds of modified protein forms, including those that arise from post-translational modifications such as methylation, acetylation, and ubiquitination and protein isoforms that arise from alternative splicing or from protein cleavage.
Research into the mechanisms of a biological process often proceeds simultaneously in multiple model systems. In many cases, a clear picture of the process emerges only after data generated from disparate lines of experiment are considered as a whole. Merging of data in this way relies on the assumption that the proteins and pathways examined in the different systems are functionally related. The organization of PRO reflects evolutionary relationships among proteins and can be used as a guide in cross-species comparisons of experimental results. In PRO, organism-specific terms that share 1:1 orthology are grouped under a species-independent parent term (gene-level term), and species-independent terms that share a common domain structure are further grouped under a family-level term. In our analysis, we found that human and yeast BUB1 are 1:1 orthologs and thus share the same species-independent parent term. However, human BUB1B lies on a separate branch of the PRO hierarchy from its closest yeast relative, MAD3 (Figure
Often, it is possible to gain insight into the function of proteins in a common pathway by examining their PPIs. PRO facilitates the construction of PPI networks for groups of proteins that are related by some common attribute. Using the built-in PRO search function, it is possible to retrieve all PRO terms that share an attribute (e.g., kinetochore localization). The PAF for these terms, which contains PPI information in machine-readable format, can then be downloaded and used to build a PPI network with Cytoscape. Because PRO annotation can show interactions that are dependent on protein modification, PPI networks constructed with PRO have an added dimension that is absent from other PPI network building resources. For example, our PPI network of kinetochore-localized proteins shows that PP2A-B56-alpha interacts specifically with BUB1B/Phos:2 and that PLK1 fails to interact with the unphosphorylated form of BUB1 (Figure
As we have shown with this use case, PRO is a valuable tool for the study of a complex biological process. Interoperating with other ontologies and resources, PRO provides a structural framework that organizes current knowledge about protein forms, complexes, and cross-species relationships among proteins. While we focused on the spindle checkpoint, the PRO search, display, and analysis strategies we demonstrated here can be applied to any process. PRO-based analysis is particularly valuable for processes where modified protein forms play a prominent role. While PRO coverage is limited for modified forms, we rely on the user community to help in populating the ontology. The web-based RACE-PRO interface provides one means for the user to contribute to PRO. As PRO grows, it will become an increasingly useful resource that can provide insight into biological processes and stimulate the generation of experimentally testable hypotheses.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This work has been supported by the National Science Foundation [ABI-1062520] and the National Institutes of Health [2R01GM080646-06].