A preprint of our paper "Almost-Delaunay Simplices : Robust Neighbor Relations for Imprecise Points"
to appear in SODA'04 is available:
PDF (533kB) | PS (1.18MB).
The new C++ implementation in CGAL, described in my CGAL workshop paper, is available from my Software Page.
The Kinemage visualization page has links to three-dimensional animations in the Kinemage format, illustrating hinge detection using AD tetrahedra in TrpRS, and AD assignment of secondary structure, and a comparison table between AD and DSSP assignments are available. (NEW!)
I put up a page about my subgraph mining research.(NEW!)
![]() |
![]() |
![]() |
![]() |
Motivation Delaunay tessellations and Voronoi diagrams capture proximity relationships among sets of points. When applied to points representing protein atoms or residue positions, they are used to compute molecular surfaces and protein volumes, to define cavities and pockets, to analyze and score packing interactions, and to find structural motifs. Since atom and residue coordinates are known imprecisely, we explore the effect of coordinate perturbation on Delaunay-based scoring and motif finding. We have developed two concepts to help us do this : almost-Delaunay tetrahedra and Delaunay probability.
Definitions
We define 4 points in 3D to be an almost-Delaunay tetrahedron with threshold ε, denoted AD(ε), if they can be made to satisfy the Delaunay empty sphere property when all points in the set are perturbed by at most ε. Almost-Delaunay simplices are defined in exactly the same way using n-tuples of points in arbitrary dimensional space. This is illustrated in the diagrams below, for almost-Delaunay triangles in 2D.
![]()
![]()
![]()
![]()
(a) A non-Delaunay triangle (vertices blue) with points within its circumcircle.
(b) Radial perturbation of triangle vertices and other points.
(c) The smallest perturbation can be found from a minimum-width annulus.Delaunay Probability An AD(ε) tetrahedron in some sense captures the results of a worst-case perturbation. The average case can be modeled by assuming that perturbations come from a radial probability density function (pdf) such as a Gaussian, and performing Monte Carlo integration over a large number of instances, of the probability of a perturbed tetrahedron occurring times the probability that its circumsphere is empty. This computation takes a long time for all O(n^4) possible tetrahedra, but we can restrict it to those with a low AD threshold, since AD threshold is inversely correlated to Delaunay probability as shown below.
![]()
![]()
Tetrahedron probability falling off with AD threshold for a protein with perturbations from a Gaussian with average radius a=0.1 and a=0.5 Angstrom.
Implementation The 3D coordinates most often used to represent a protein are of the C-α atoms or the side-chain centroids. Our implementation in MATLAB computes the indices of Almost-Delaunay edges, triangles or tetrahedra along with their corresponding minimum thresholds. We set an edge-length prune to restrict the longest edge of an AD tetrahedron, typically to 10 Angstrom. Since we are only interested in specific ranges of perturbation, we keep only tetrahedra with small thresholds by setting an ε cutoff, typically 2.0 Angstrom.
The entire set of MATLAB source files is now available, along with the new C++ implementation in CGAL, and all the documentation, on my new Software Page. Sample pdb and DSSP files used in the experiments are here.
Results
- Robustness of the Delaunay Tessellation in proteins: We show below how the number of AD tetrahedra seems to decrease as we go from random points to chains to decoys to proteins with similar sequence length and packing density. Also, plotting the number of tetrahedra added at low thresholds in (e) indicates that the DT of proteins changes less than that of random points for a given perturbation.
- Stability of SNAPP scores: We developed variants of the SNAPP scores using almost-Delaunay tetrahedra with positive threshold (AD-SNAPP), Delaunay and AD tetrahedra (D+AD-SNAPP) and using Delaunay probabilities to weight each tetrahedron's score (Delaunay-probability SNAPP). The average scores turn out to be well correlated and equally good as averaged SNAPP scores at distinguishing decoys. The total SNAPP scores are better at making this distinction, and Delaunay-probability SNAPP scores are equally good as SNAPP scores, even better in a few cases in increasing the distinction of the native state from decoys with scores close to it.
![]()
![]()
![]()
![]()
Comparison of average SNAPP scores using Delaunay and almost-Delaunay tetrahedra (top row), and total SNAPP scores against Delaunay-probability SNAPP scores (bottom row), for the protein 2CRO's native state (extreme left) and 10% of its decoys from the 4state_reduced set from Decoys 'R Us, all represented by sidechain-centroids. The graphs in the left column are regular unweighted scores, and in the right column are weighted by tetrahedron type (see this paper for an explanation). - Detection of structural motifs: We are able to use the extra information available from AD thresholds to improve the discrimination of secondary structural elements (α-helices and β-sheets) over previous methods based on the Delaunay tessellation.
![]()
The histogram distribution of AD thresholds in a synthetic α-helix is striking; apart from the Delaunay tetrahedra that follow the patterns i+(1,2,3,4) and i+(1,2,4,5) as known in the literature, there are three characteristic α-helical peaks at ε=0.3, 0.7 and 1.2. The table at left shows the patterns and their corresponding threshold values. These patterns and peaks are seen in the histograms of all α-helical proteins represented by C-αs, and can be used for quantitative estimation of α-helix content as well as to estimate packing (by counting non-pattern tetrahedra). Decoys of α-helical proteins show sharp peaks, but they can be distinguished by the relative lack of non-pattern tetrahedra. Proteins represented by sidechain centroids do not show peaks, since the centroids don't have the regular structure seen in backbone coordinates. Below: Top row: synthetic α-helix, an α-helical protein (PXR) and a β-sheet protein (chymotrypsin). Bottom row: C-αs, sidechain centroids and a decoy of 2cro.
![]()
![]()
![]()
![]()
![]()
![]()
Patterns based on sequence interval (and in fact all the patterns that we have tried) do not show a clear signature of values of the AD threshold, in proteins with β-sheets. The pattern we currently use for detecting β-sheets is that almost-Delaunay tetrahedra that straddle two neighboring β-strands tend to have similar values of the maximum gap in sequence (for parallel strands) or values distributed in an interval (for antiparallel strands). Thus we look for peaks and plateaus in the histogram of the maximum sequence gap, and among the tetrahedra that fall in these areas, we determine the neighbors in adjacent β-strands by a mutual maximum frequency of occurrence search, and cluster parallel and antiparallel sequences of residues to complete the β-sheet. ![]()
![]()
Left: Here we compare the results of α-helix, β-sheet and β-turn determination using our methods with those from DSSP and PROMOTIF for a set of proteins from different structural families in CATH. The individual α-helices are accurate to +/- 2 residues at each end, and the total to within 10%. The β-sheet prediction is less accurate than α-helix, since our method currently does not use the value of the AD threshold as a part of the pattern, so the results are not always specific to the geometry of the β-sheet. See the tabulated comparison between AD and DSSP secondary structure assignments, and kinemage visualizations of AD, DSSP and a visual-geometric assignment.
Project Members
Deepak Bandyopadhyay
Graduate Student
UNC Chapel Hill
debug at cs.unc.eduAlexander Tropsha
Professor and Director, Laboratory of Molecular
Modeling, School of Pharmacy, UNC Chapel Hill
alex_tropsha at unc.eduJack Snoeyink
Professor of Computer Science
UNC Chapel Hill
snoeyink at cs.unc.edu