## From Protein Folding to Efficient Algorithms

#### Origami models of carbon atoms from a module by Y. Momotani.

Professor Jack Snoeyink and his students are applying their knowledge of computational geometry to help solve real world problems. Computational geometry is a branch of computer science that studies geometric algorithms and their applications. It may sound abstract, but computational geometry forms a fundamental basis for many other areas of computing, including graphics, robotics, gaming, and computer-aided design.

Some of the Computational Geometry Group’s research is based on collaborations with scientists in other fields, such as biology and biochemistry. The central dogma of biology is that the sequences of DNA in genes code for the sequence of amino acids in proteins, which in turn fold into three dimensional structures that perform many of the functions of living organisms – from transporting and converting oxygen and energy in the blood and in muscle fibers, to the structure of hair and cell membranes. Molecules are in constant motion, so there isn’t a single “structure,” but, instead, probabilities of finding electrons in certain places or of finding certain lengths and angles for bonds between atoms.

#### Models of the ribosome (a protein construction factory) and a virus capsid from 3-D printing by A. Olson, Scripps.

In his work on the structure of protein molecules, Snoeyink’s graduate student advisee Matthew O’Meara studies geometric properties of inter-atomic bonds as observed from native molecular structures, and uses these observations to help build energy functions to solve the structure prediction problem. O’Meara’s tools let him visualize the distributions of parameters, including bond lengths and angles, from experimentally determined “native” structures and from predicted “decoy” structures, and tune the energy functions used to make the predictions so that the distributions become closer together.

O’Meara has taken a leadership role in the Rosetta Commons, a consortium of 23 research labs around the world, headquartered at the University of Washington Seattle. Recently, Rosetta has been used to computationally design a protein that manipulates the motion of cells with light (Wu, Nature 2009) and design a protein that binds a common region of variants of the flu Hemagglutinin molecule (Fleishman, Science 2011). It also is the engine of the popular Fold.It protein folding game (Cooper, Nature 2010).

The focus of O’Meara’s thesis research is tuning the computational model for hydrogen bonding in Rosetta. To do this, he is using a survey of 8000 experimentally validated protein structures assembled by collaborators in the Richardson Lab at Duke University. He has developed a framework to test how well the geometric features of Rosetta predictions match the geometric features in the experimental structures. He then optimizes the hydrogen bond parameters to increase the overall likelihood that Rosetta predictions match the experimental structures.

Other research being done by the Computational Geometry Group is helping to advance the field of computer science itself. Dave Millman, another of Snoeyink’s graduate student advisees, is exploring a new approach to design efficient algorithms for geometric problems that are guaranteed to give correct answers. Suppose that you want a map of the nearest fire station to every house in an area. In middle school geometry, you might construct a perpendicular bisector between two stations -- houses on one side are closer to one station, and houses on the other side are closer to the other station. In a computer, house positions are stored as coordinates, and the computer does numerical calculations to perform this geometric test. The computer, by default, approximates the numbers with a fixed number of digits, and so can get geometric tests slightly wrong – e.g. some house that is almost equidistant from two stations may be assigned to the wrong one. Not a major problem for this house, but a small numerical error may compound to large errors if the algorithm is founded on geometrical properties like, “the region closest to fire station A is connected and convex,” that can cause this to be the only house assigned to station A, leaving all houses that are in fact closest to A with no assigned fire station.

Computer Scientists traditionally develop programs for a model of a computer that can run for an arbitrarily long time, storing arbitrarily many data items in memory that has arbitrarily many bits of precision per location. Algorithm designers seek to minimize use of time and memory space, while hardware designers seek to expand those two resources. Precision, however, is often ignored by algorithm designers, even though it is rarely increased by hardware designers. Millman’s research seeks algorithms that minimize precision, as well as time and space.

One of the recurring themes in Snoeyink’s research is clearly stating the problem that you are solving, and clearly demonstrating that you have solved it as stated. This is an important skill for computer scientists at all levels, even though it is emphasized most in mathematical courses. When some of the undergraduate students choosing to pursue the new Bachelor of Arts in computer science told Snoeyink that they were trying to avoid the additional math required by the Bachelor of Science, he volunteered to teach “Discrete Structures,” as a more CS-oriented version of MATH 381, “Discrete Mathematics.” Both are about “proof” but where MATH 381 focuses on number theory, in “Discrete Structures” Snoeyink draws on material he developed in teaching to as many as 180 at the University of British Columbia how logical thinking and proof appears in computer science from automata to video games.