Projects
HiDimViewer

HiDimViewer is a visualization tool we are developing for high-dimensional datasets. It is designed as an interactive data-exploration tool that helps scientists select and observe clusters in high-dimensional data.
NPUTE

NPUTE is an efficient data structure we have developed for finding pair-wise haplotype similarity. Its simplicity can lead to benefits in speed and exhaustive searches over multiple parameters.
Genetic Diversity of Mus musculus Laboratory Strains

The most commonly used resources harbor only a fraction of Mus musculus genetic diversity, and that diversity is not uniformly distributed, resulting in many blind spots. Only resources that include wild-derived inbred strains from subspecies other than M. m. domesticus have no blind spots and a uniform distribution of variation. Unlike other resources, which are primarily suited for gene discovery, the Collaborative Cross (CC) is the only resource that can support genome-wide network analysis, which is the foundation of systems genetics.
XBox Science

In XBox Science, we are exploring the potential of employing game interfaces, game-design principles, and game production approaches for constructing bioinformatics tools.
snpBrowser
SnpBrowser is an application designed to analyze and visualize the immense SNP datasets that are currently available. It provides modes for analyzing genetic diversity, marker segregation, strain selection, and QTL mapping.
Full-Genome SNP Compatibility

We are developing methods for partitioning a genome into blocks that show no apparent recombination, thus providing parsimonious sets of compatible genome intervals based on the four-gamete test. We have developed theory and methods for dividing a genome into compatible intervals, and we have also developed the notion of an interval set that achieves the interval lower bound while maximizing interval overlap.
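The four-gamete test itself is simple to state: two biallelic SNPs are compatible if, across all haplotypes, at most three of the four possible allele combinations appear. The sketch below is a minimal illustration under stated assumptions, not the project's method: it assumes phased, biallelic data with no missing calls, and it uses a plain left-to-right greedy scan that yields non-overlapping intervals. The function names `four_gamete_compatible` and `greedy_compatible_intervals` are hypothetical, and the overlap-maximizing interval sets described above are not attempted here.

```python
def four_gamete_compatible(site_a, site_b):
    """Return True if two SNP columns show fewer than four distinct gametes.

    Each argument is a sequence of alleles, one per haplotype, in the same
    haplotype order. Assumes biallelic sites with no missing data.
    """
    return len(set(zip(site_a, site_b))) < 4


def greedy_compatible_intervals(sites):
    """Split SNP indices into consecutive, pairwise-compatible intervals.

    Scans left to right and closes the current interval as soon as the next
    site fails the four-gamete test against any site already in it.
    Returns a list of (first_index, last_index) pairs, both inclusive.
    """
    intervals = []
    start = 0
    for j in range(1, len(sites)):
        conflict = any(
            not four_gamete_compatible(sites[i], sites[j])
            for i in range(start, j)
        )
        if conflict:
            intervals.append((start, j - 1))
            start = j
    if sites:
        intervals.append((start, len(sites) - 1))
    return intervals


if __name__ == "__main__":
    # Four haplotypes, four SNPs; each string is one SNP column.
    snps = ["0011", "0011", "0101", "0101"]
    print(greedy_compatible_intervals(snps))
```

In the toy input, the second and third SNPs together show all four gametes, so the scan places a boundary between them.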
Tree-based Genome-wide Association Mapping

In this project, we developed TreeQA, a quantitative genome-wide association (GWA) mapping algorithm. TreeQA utilizes local perfect phylogenies constructed in genomic regions exhibiting no evidence of historical recombination. Through careful algorithm design and implementation, TreeQA conducts quantitative genome-wide association analysis efficiently and is more effective than previous methods.
FastMap
FastMap is a tool for genome-wide association mapping that is designed for ‘Genetical Genomics’ studies using data from gene expression microarrays. It can accept both inbred mouse data, generally consisting of homozygous allele calls, and human SNP data, which includes heterozygous allele calls.
Collaborative Cross Simulator

The Collaborative Cross Simulator will provide both data and visual simulations for the Collaborative Cross experiment. The simulator will give the community a powerful tool for generating synthetic lines and populations. Using these synthetic mice, researchers can compare actual mouse data against statistically neutral and random data.
SNP Data Retrieval and Filtering

This online tool allows you to retrieve and filter genetic data sets. You can specify the format and fields in the output file, the strains and chromosomes you want included, and a number of special filters to apply to the genetic data before it is returned. In addition to the graphical user interface, an automatic query interface is available that allows you to send queries and retrieve data programmatically from a separate program.
Genotype Sequence Segmentation

In this project, we study the problem of segmenting the genotype sequences into the minimum number of segments attributable to the founder sequences. Our algorithms incorporate biological constraints to greatly reduce the computation, and guarantee that only minimum segmentation solutions with comparable numbers of segments on both haplotypes of the genotype sequence are computed. Our algorithms can also work on noisy data including genotyping errors, point mutations, gene conversions, and missing values.
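To make the notion of a minimum segmentation concrete, the sketch below handles a much simpler case than the one this project addresses: a single, already-phased haplotype with no errors or missing values. In that restricted setting, repeatedly taking the longest founder match from the current position gives a segmentation with the fewest segments. The function name `min_founder_segments` and the string encoding of alleles are assumptions made for illustration; the genotype (two-haplotype) problem, the biological constraints, and the noise handling described above are not modeled.

```python
def min_founder_segments(haplotype, founders):
    """Segment one haplotype into the fewest pieces copied from founders.

    haplotype: string of alleles.
    founders:  list of strings, each the same length as haplotype.
    Returns a list of (start, end, founder_index) with end exclusive.
    Raises ValueError if some position matches no founder at all.
    """
    segments = []
    pos = 0
    n = len(haplotype)
    while pos < n:
        best_len, best_founder = 0, None
        for idx, founder in enumerate(founders):
            # Length of the exact match with this founder starting at pos.
            length = 0
            while pos + length < n and founder[pos + length] == haplotype[pos + length]:
                length += 1
            if length > best_len:
                best_len, best_founder = length, idx
        if best_len == 0:
            raise ValueError(f"no founder matches at position {pos}")
        segments.append((pos, pos + best_len, best_founder))
        pos += best_len
    return segments


if __name__ == "__main__":
    founders = ["000000", "111111", "001100"]
    print(min_founder_segments("000111", founders))
```

Each returned triple gives the half-open range of a segment and the index of the founder it was copied from; the number of breakpoints is one less than the number of segments.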
Inferring Genome-wide Mosaic Structure

In this project, we study the Minimum Mosaic Problem: given a set of genome sequences from individuals within a population, compute a mosaic structure containing the minimum number of breakpoints. This mosaic structure provides a good estimate of the minimum number of recombination events (and their locations) required to generate the existing haplotypes in the population. We solve this problem by finding the shortest path in a directed graph. Our algorithm’s efficiency permits genome-wide analysis.
FastANOVA: an Efficient Algorithm for Genome-Wide Association Study

In this project, we study the problem of finding SNP-pairs that have significant associations with a given quantitative phenotype. We propose an efficient algorithm, FastANOVA, for performing ANOVA tests on SNP-pairs in batch mode, which also supports large permutation tests. FastANOVA needs to perform the ANOVA test on only a small number of candidate SNP-pairs, without the risk of missing any significant ones.
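For context, the brute-force baseline that an approach like this avoids can be sketched in a few lines: compute an ANOVA F statistic for every SNP-pair and keep the best. The code below is only that exhaustive baseline, not FastANOVA, and it makes simplifying assumptions: individuals are grouped by their joint genotype at the two SNPs and a one-way F statistic is computed over those groups, with no permutation testing and no candidate pruning. The names `pair_f_statistic` and `best_pair` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pair_f_statistic(snp_a, snp_b, phenotype):
    """One-way ANOVA F statistic with groups defined by joint genotype.

    snp_a, snp_b: genotype calls per individual (any hashable values).
    phenotype:    quantitative trait value per individual.
    Returns 0.0 when the statistic is undefined (fewer than two groups,
    no residual degrees of freedom, or no variation at all).
    """
    groups = defaultdict(list)
    for a, b, y in zip(snp_a, snp_b, phenotype):
        groups[(a, b)].append(y)
    n, k = len(phenotype), len(groups)
    if k < 2 or n <= k:
        return 0.0
    grand_mean = sum(phenotype) / n
    means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    ss_between = sum(len(vals) * (means[key] - grand_mean) ** 2
                     for key, vals in groups.items())
    ss_within = sum((y - means[key]) ** 2
                    for key, vals in groups.items() for y in vals)
    if ss_between == 0:
        return 0.0
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def best_pair(snps, phenotype):
    """Exhaustively score all SNP-pairs and return the top-scoring index pair."""
    return max(
        combinations(range(len(snps)), 2),
        key=lambda p: pair_f_statistic(snps[p[0]], snps[p[1]], phenotype),
    )


if __name__ == "__main__":
    snps = [
        [0, 0, 1, 1, 0, 0, 1, 1],
        [0, 1, 0, 1, 0, 1, 0, 1],
        [0, 0, 0, 0, 1, 1, 1, 1],
    ]
    # Toy phenotype driven by an interaction between the first two SNPs.
    trait = [1.0, 5.1, 4.9, 1.2, 0.9, 5.0, 5.2, 1.1]
    print(best_pair(snps, trait))
```

The exhaustive scan scores a number of pairs that grows quadratically with the number of SNPs, and each permutation of the phenotype repeats that work, which is why restricting the test to a small candidate set matters at genome scale.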
Gene Expression Extract: Tool for extraction of subsets from gene expression data

This web tool allows one to extract a subset of gene expression data by specifying subsets of genes, probes, and strains. Clustering analysis can also be performed on the extracted data. An algorithm called SAFE is also integrated so that enrichment of biological pathways can be tested. The tool is available at http://compgen.unc.edu/GeneExprExtract
Ancestry Inference

Ancestry inference and the related problem of haplotype inference are essential for many genetic applications such as genotype imputation and linkage mapping. However, traditional algorithms based on Hidden Markov Models suffer from exponential state spaces, making them infeasible for many realistic datasets. In particular, the complex pedigrees of model organisms present challenges for this class of algorithms. Here, we optimize the Lander–Green algorithm to make the analysis tractable for model organisms. The majority of our speedup is due to implicit modeling of individuals involved in inbreeding. The optimizations do not compromise the accuracy of the inference, and they scale much better with pedigree size.
GBrowse for Mouse Genome

GBrowse is an open source genome viewer that combines databases and interactive Web pages for manipulating and displaying annotations on genomes. We use GBrowse to visualize genome-wide datasets as tracks, whose order and appearance can be customized by administrators or end users. GBrowse supports simultaneous overviews, regional views, and detailed views.