The Genetics of Mice and Men


Mice from the Collaborative Cross are bred in such a way as to represent the genetic diversity of humans, so that they can be used in studying human disease.


Professor Wei Wang’s background is in data mining, but her primary research focus today is computational biology. Jointly with Associate Professor Leonard McMillan, she runs the “CSBio” group of over 20 staff, postdocs and students. Still, she views many of the problems faced by her colleagues in genetics, cell biology, pharmacy and biostatistics as data mining problems.

One project the CSBio group is working on is genome ancestry inference, or determining which part of a resulting DNA sequence is inherited from which founder DNA sequence. The research uses both real and simulated data from the Collaborative Cross, a mouse facility housed at UNC-Chapel Hill with hundreds of mouse lines that originated from eight different founders. The Collaborative Cross seeks to represent the genetic diversity of humans through controlled breeding of the mouse strains, to allow for research on inheritance and disease. For example, knowing which piece of the genome came from which ancestor might help determine how blood pressure is inherited. However, the large amount of data involved makes the computations expensive and time-consuming.

To help solve this problem, one of Wang’s students, Eric Yi Liu, has found a way to efficiently infer genome ancestry, using a Hidden Markov Model that derives the ancestry probabilities without explicitly modeling every generation bred. His method accurately estimates a probability distribution of the founder origin of every base in each chromosome in a few seconds, meaning he can compute the entire ancestry map for each mouse in a matter of minutes. The method can also be used to spot mistakes made in the breeding assignment and/or genotyping process.
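To make the idea concrete, here is a minimal sketch of how a hidden Markov model can assign a founder probability to each position along a chromosome. It is not Liu's method: the function name, the three toy founders, the binary marker alleles, and the switch_prob and error_rate parameters are all assumptions chosen for illustration. The hidden state at each marker is the founder that contributed it, and a standard forward-backward pass yields the posterior distribution over founders.

```python
def founder_posteriors(founder_alleles, observed, switch_prob=0.05, error_rate=0.01):
    """Posterior founder probabilities per marker (toy forward-backward pass).

    founder_alleles[f][m] is the allele (0 or 1) of founder f at marker m.
    observed[m] is the allele seen in the descendant at marker m.
    switch_prob and error_rate are illustrative assumptions, not fitted values.
    """
    n_founders = len(founder_alleles)
    n_markers = len(observed)

    def emit(f, m):
        # Probability of the observed allele if founder f contributed marker m.
        return 1.0 - error_rate if founder_alleles[f][m] == observed[m] else error_rate

    def trans(a, b):
        # Stay with the same founder, or switch uniformly to another one.
        return 1.0 - switch_prob if a == b else switch_prob / (n_founders - 1)

    def normalize(values):
        total = sum(values)
        return [v / total for v in values]

    # Forward pass, rescaled at every marker to avoid numerical underflow.
    forward = [normalize([emit(f, 0) / n_founders for f in range(n_founders)])]
    for m in range(1, n_markers):
        prev = forward[-1]
        forward.append(normalize([
            emit(f, m) * sum(prev[g] * trans(g, f) for g in range(n_founders))
            for f in range(n_founders)
        ]))

    # Backward pass, rescaled the same way.
    backward = [[1.0] * n_founders for _ in range(n_markers)]
    for m in range(n_markers - 2, -1, -1):
        nxt = backward[m + 1]
        backward[m] = normalize([
            sum(trans(f, g) * emit(g, m + 1) * nxt[g] for g in range(n_founders))
            for f in range(n_founders)
        ])

    # Combine both passes into a posterior distribution at each marker.
    return [
        normalize([a * b for a, b in zip(forward[m], backward[m])])
        for m in range(n_markers)
    ]


# Toy data: the descendant copies founder 0 on the left half, founder 1 on the right.
founders = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
observed = [0, 0, 0, 0, 0, 0, 0, 0]
posteriors = founder_posteriors(founders, observed)
most_likely = [max(range(len(founders)), key=lambda f: p[f]) for p in posteriors]
```

In this sketch the transition probability stands in for the chance of a recombination between neighboring markers, without tracking each breeding generation separately. A marker where no founder explains the observed allele well would be a candidate for the breeding or genotyping mistakes mentioned above.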

Another area of research deals with determining which genes influence disease. While a few diseases are influenced by only one gene, many are influenced by multiple genes, and figuring out which genes affect, for example, hypertension is a formidable job because of the sheer number of genes in the human genome.

To approach this problem, Wang’s student Xiang Zhang created an algorithm that can be used to quickly examine the effect of gene-gene interaction (called epistasis) when examining associations across the entire human genome. Where it would typically take a supercomputer seven to ten years to compute such interactions, Zhang’s algorithm can accomplish the task in a matter of hours. It uses convex optimization and efficient indexing to identify which epistatic interactions could possibly be statistically significant, and then computes only those.
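The general strategy of computing only the interactions that could possibly matter can be illustrated with a small sketch. This is not Zhang's algorithm and uses no convex optimization: the score, the bound, the function names and the toy data are all assumptions made for illustration. Each marker gets a cheap upper bound on any interaction score it can take part in, and pairs are evaluated exactly only when both markers clear the threshold.

```python
from itertools import combinations


def interaction_score(y_centered, xi, xj):
    # Toy statistic: how strongly carrying both variants tracks the phenotype.
    return abs(sum(y * a * b for y, a, b in zip(y_centered, xi, xj)))


def significant_pairs_pruned(genotypes, phenotype, threshold):
    """Return (hits, evaluated) where hits are pairs scoring at least threshold.

    genotypes[i][k] is 0 or 1: whether individual k carries the variant at marker i.
    phenotype[k] is the measured trait of individual k.
    """
    mean = sum(phenotype) / len(phenotype)
    y = [p - mean for p in phenotype]

    # Upper bound per marker: since genotypes are 0/1,
    # |sum_k y_k * x_ik * x_jk| <= sum_k |y_k| * x_ik for every partner j.
    bounds = [sum(abs(v) * x for v, x in zip(y, row)) for row in genotypes]

    # A pair can reach the threshold only if both of its markers can.
    candidates = [i for i, b in enumerate(bounds) if b >= threshold]

    hits, evaluated = [], 0
    for i, j in combinations(candidates, 2):
        evaluated += 1
        score = interaction_score(y, genotypes[i], genotypes[j])
        if score >= threshold:
            hits.append((i, j, score))
    return hits, evaluated


# Toy data: 5 markers, 8 individuals.
genotypes = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 1, 0, 0],
]
phenotype = [9.0, 8.5, 9.5, 7.0, 3.0, 2.5, 3.5, 3.0]
hits, evaluated = significant_pairs_pruned(genotypes, phenotype, threshold=6.0)
```

Because the bound never underestimates a pair's score, the pruned search returns the same pairs as an exhaustive one while evaluating fewer of them. The real savings depend on how tight the bound is, which is where more sophisticated techniques come in.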