BioComp: Identifying Spatial Motifs for Classification of Protein Structure and Function
Principal Investigator: Wei Wang
Funding Agency: The National Science Foundation
Agency Number: CCF-0523875
Abstract
A central tenet of molecular biology is that a protein's function is determined by its structure. The Protein Structure Initiative (http://www.nigms.nih.gov/psi/) and other recent efforts have targeted the accurate determination of the structures of proteins encoded by all genes in sequenced genomes. The result has been a rapid increase in the number of proteins with known 3D structures. This growth has enabled a new computational approach to studying protein structure and function, based on recurring amino-acid packing patterns, or spatial motifs, in a collection of known protein structures. These spatial motifs may correlate with experimental measurements of protein function or with specific protein families. Our preliminary work supports the premise that spatial motifs may be a more suitable starting point for protein function research than sequence-level motifs. In this project, we propose to undertake a comprehensive analysis of protein structures. We will mine the protein structures available in the PDB (Protein Data Bank) for spatial motifs and construct each protein's signature as a combination of these motifs. Similarity measures between signatures can then serve as the basis for predicting structural and functional classifications. We will also look for family-specific motifs (measured by enrichment significance) and for significant associations between occurrences of spatial motifs. This project will integrate novel techniques to link recurring structural patterns in protein families with protein function. The proposed studies will have a significant impact on modern structural biology. Recent studies, including those conducted by our group, indicate that protein spatial motifs can be used for effective protein annotation, comparison, and classification.
The project will advance the frontiers of biology research by providing a novel, high-throughput mechanism for discovering, evaluating, and annotating functionally significant spatial motifs derived from protein classifications, making a fundamental impact on the way biologists propose and test hypotheses that relate protein structure and function. Broader Impacts of this research include interdisciplinary collaboration and training, a range of educational impacts, and outreach to underrepresented minorities in the sciences.
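The abstract does not say how motif signatures are represented, compared, or tested for enrichment. The following is a minimal sketch under explicitly assumed choices, not the project's method: each signature is the set of (hypothetical) motif IDs a protein contains, Jaccard similarity stands in for the unspecified similarity measure, a 1-nearest-neighbor rule stands in for classification, and a one-sided hypergeometric test stands in for "enrichment significance". The names `jaccard`, `enrichment_pvalue`, and `nearest_family`, along with the motif IDs and family labels, are illustrative only.

```python
from math import comb

# ASSUMPTION: a signature is a frozenset of hypothetical motif IDs (e.g. "m1").
# Real motifs would be mined from PDB structures; none are modeled here.

def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity |A & B| / |A | B| (defined as 1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def enrichment_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric tail P(X >= k), an assumed enrichment measure.

    N: total proteins in the collection
    K: proteins in the collection that contain the motif
    n: proteins in the family of interest
    k: family members that contain the motif
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

def nearest_family(query: frozenset, labeled: list) -> str:
    """Return the family label of the most similar labeled signature (1-NN)."""
    best_signature, best_label = max(labeled, key=lambda item: jaccard(query, item[0]))
    return best_label

if __name__ == "__main__":
    labeled = [
        (frozenset({"m1", "m2"}), "family A"),  # hypothetical labels
        (frozenset({"m5", "m6"}), "family B"),
    ]
    query = frozenset({"m1", "m2", "m3"})
    print(nearest_family(query, labeled))
    print(enrichment_pvalue(k=3, n=3, K=3, N=10))
```

Sets keep the sketch simple; a signature that records motif counts or weights, compared with cosine similarity, would be an equally plausible reading of "a combination of such motifs".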

