Collaborative Cross Tools

A tool suite for monitoring the progress and status of the Collaborative Cross (CC), with analysis tools for both in-progress and completed CC lines.

Mouse Phylogeny Viewer

Visualization and analysis tool for the subspecific origin and haplotype diversity among a set of over 100 classical laboratory mice.


HiDimViewer is a visualization tool we are developing for high-dimensional datasets. It is designed to be used as an interactive data exploration tool to aid scientists in selecting and observing clusters in high-dimensional data.



NPUTE is an efficient data structure we have developed for finding pair-wise haplotype similarity. Its simplicity makes it fast and makes exhaustive searches over multiple parameters practical.



Genetic Diversity of Mus musculus Laboratory Strains

The most commonly used resources harbor only a fraction of Mus musculus genetic diversity, and that diversity is not uniformly distributed, resulting in many blind spots. Only resources that include wild-derived inbred strains from subspecies other than M. m. domesticus have no blind spots and a uniform distribution of variation. Unlike other resources, which are primarily suited for gene discovery, the CC is the only resource that can support genome-wide network analysis, the foundation of systems genetics.


XBox Science

In XBox Science, we are exploring the potential of employing game interfaces, game-design principles, and game production approaches for constructing bioinformatics tools.


Tree-based Genome-wide Association Mapping

In this project, we developed TreeQA, a quantitative genome-wide association (GWA) mapping algorithm. TreeQA uses local perfect phylogenies constructed in genomic regions that show no evidence of historical recombination. Through careful algorithm design and implementation, TreeQA conducts quantitative genome-wide association analysis efficiently and is more effective than previous methods.

Collaborative Cross Simulator

The Collaborative Cross Simulator will provide both data and visual simulations for the Collaborative Cross experiment. It will give the community a tool for generating synthetic lines and populations. Using these synthetic mice, researchers can compare actual mouse data against statistically neutral and random data.


Genotype Sequence Segmentation

In this project, we study the problem of segmenting the genotype sequences into the minimum number of segments attributable to the founder sequences. Our algorithms incorporate biological constraints to greatly reduce the computation, and guarantee that only minimum segmentation solutions with comparable numbers of segments on both haplotypes of the genotype sequence are computed. Our algorithms can also work on noisy data including genotyping errors, point mutations, gene conversions, and missing values.
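A heavily simplified sketch of the minimum-segmentation idea follows. It assumes a single, already phased haplotype with no noise, and it uses a plain greedy scan rather than the project's algorithms, which handle unphased genotypes, genotyping errors, and missing values. The function name `min_segments` and the string encoding of alleles are illustrative assumptions.

```python
def min_segments(target, founders):
    """Split `target` into the fewest segments, each identical to one
    founder over the same positions (assumption: phased, error-free data).

    Returns a list of (start, end, founder_index) with `end` exclusive.
    """
    n = len(target)
    segments = []
    start = 0
    while start < n:
        best_end, best_founder = start, None
        for idx, founder in enumerate(founders):
            end = start
            # Extend the match with this founder as far as possible.
            while end < n and founder[end] == target[end]:
                end += 1
            if end > best_end:
                best_end, best_founder = end, idx
        if best_founder is None:
            raise ValueError(f"position {start} matches no founder")
        segments.append((start, best_end, best_founder))
        start = best_end
    return segments


founders = ["00000", "11111", "00110"]
print(min_segments("00111", founders))
```

Taking the longest possible extension at each step is enough in this simplified setting, because any sub-interval of a matching segment also matches the same founder.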


Inferring Genome-wide Mosaic Structure

In this project, we study the Minimum Mosaic Problem: given a set of genome sequences from individuals within a population, compute a mosaic structure containing the minimum number of breakpoints. This mosaic structure provides a good estimate of the minimum number of recombination events (and their locations) required to generate the existing haplotypes in the population. We solve this problem by finding the shortest path in a directed graph. Our algorithm’s efficiency permits genome-wide analysis.
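The shortest-path formulation can be illustrated with a small sketch. The graph construction here is an assumption, not the project's published method: nodes are SNP boundaries, an edge a -> b exists when SNPs a..b-1 pass a pairwise four-gamete test, and a breadth-first search finds the path with the fewest blocks. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def compatible(haps, i, j):
    """Four-gamete test: SNPs i and j are compatible if fewer than
    four distinct allele pairs occur across the haplotypes."""
    return len({(h[i], h[j]) for h in haps}) < 4


def block_ok(haps, a, b):
    """True if every pair of SNPs in [a, b) is compatible (assumed criterion)."""
    return all(compatible(haps, i, j) for i, j in combinations(range(a, b), 2))


def min_breakpoints(haps):
    """Return interior block boundaries of a partition with the fewest blocks."""
    n = len(haps[0])
    prev = {0: None}          # predecessor of each reached boundary
    queue = deque([0])
    while queue:
        a = queue.popleft()
        if a == n:
            break
        b = a + 1
        # Edges a -> b for every b such that block [a, b) is compatible.
        while b <= n and block_ok(haps, a, b):
            if b not in prev:
                prev[b] = a
                queue.append(b)
            b += 1
    path, node = [], n
    while node is not None:   # walk back from the last boundary
        path.append(node)
        node = prev[node]
    path.reverse()
    return path[1:-1]


haps = ["000", "010", "100", "111"]
print(min_breakpoints(haps))
```

In this toy input, SNPs 0 and 1 show all four gametes, so one breakpoint must fall between them.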


FastANOVA: an Efficient Algorithm for Genome-Wide Association Study

In this project, we studied the problem of finding SNP-pairs that have significant associations with a given quantitative phenotype. We propose an efficient algorithm, FastANOVA, for performing ANOVA tests on SNP-pairs in batch mode, which also supports large permutation tests. FastANOVA only needs to perform the ANOVA test on a small number of candidate SNP-pairs, without the risk of missing any significant ones.
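For context, the per-pair test that FastANOVA avoids running exhaustively can be sketched as below. This is only the brute-force baseline, under the assumption that a SNP-pair is tested by grouping individuals on their joint genotype and computing a one-way ANOVA F-statistic. It does not show FastANOVA's candidate pruning, and `pair_f_statistic` is a hypothetical name.

```python
from collections import defaultdict


def pair_f_statistic(snp_a, snp_b, phenotype):
    """One-way ANOVA F-statistic with groups defined by the joint
    genotype (snp_a[i], snp_b[i]) of each individual (assumed test form)."""
    groups = defaultdict(list)
    for a, b, y in zip(snp_a, snp_b, phenotype):
        groups[(a, b)].append(y)
    n, k = len(phenotype), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and n > number of groups")
    grand_mean = sum(phenotype) / n
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        ss_between += len(values) * (mean - grand_mean) ** 2
        ss_within += sum((y - mean) ** 2 for y in values)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


snp_a = [0, 0, 0, 0, 1, 1, 1, 1]
snp_b = [0, 0, 1, 1, 0, 0, 1, 1]
phenotype = [1, 3, 2, 4, 5, 7, 10, 12]
print(pair_f_statistic(snp_a, snp_b, phenotype))
```

Running this on every SNP-pair, and again for every permutation of the phenotype, is what makes the exhaustive approach expensive.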


Gene Expression Extract: Tool for extraction of subsets from gene expression data


This web tool extracts a subset of gene expression data from user-specified subsets of genes, probes, and strains. Clustering analysis can also be run on the extracted data. An algorithm called SAFE is also integrated so that enrichment of biological pathways can be tested. The tool is available at


GBrowse for Mouse Genome

GBrowse is an open-source genome viewer that combines databases and interactive Web pages for manipulating and displaying annotations on genomes. We use GBrowse to visualize genome-wide datasets as tracks, whose order and appearance can be customized by administrators or end users. GBrowse supports simultaneous overviews, regional views, and detailed views.