Projects

September 3rd, 2009

HiDimViewer

HiDimViewer is a visualization tool we are developing for high-dimensional datasets. It is designed to be used as an interactive data exploration tool to aid scientists in selecting and observing clusters in high-dimensional data.

NPUTE

NPUTE is an efficient data structure we have developed for finding pair-wise haplotype similarity. Its simplicity can lead to benefits in speed and exhaustive searches over multiple parameters.

Genetic Diversity of Mus musculus Laboratory Strains

The most commonly used resources harbor only a fraction of Mus musculus genetic diversity, and that diversity is not uniformly distributed, resulting in many blind spots. Only resources that include wild-derived inbred strains from subspecies other than M. m. domesticus have no blind spots and a uniform distribution of the variation. Unlike other resources, which are primarily suited for gene discovery, the Collaborative Cross (CC) is the only resource that can support genome-wide network analysis, which is the foundation of systems genetics.

XBox Science

In XBox Science, we are exploring the potential of employing game interfaces, game-design principles, and game production approaches for constructing bioinformatics tools.

snpBrowser

SnpBrowser is an application designed to analyze and visualize the immense SNP datasets that are currently available. It provides modes for analyzing genetic diversity, marker segregation, strain selection, and QTL mapping.

Full-Genome SNP Compatibility

We are developing methods for partitioning a genome into blocks for which there are no apparent recombinations, thus providing parsimonious sets of compatible genome intervals based on the four-gamete test. We have developed theory and methods for dividing a genome into compatible intervals, and we have also developed the notion of an interval set that achieves an interval lower bound yet maximizes interval overlap.
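As a rough illustration of the four-gamete test mentioned above, the following minimal sketch checks pairs of biallelic SNP sites and then cuts the genome into non-overlapping compatible intervals with a simple left-to-right greedy scan. The function names and the string encoding of sites (one character per strain) are assumptions made for this example. The greedy scan is a stand-in technique and does not reproduce the overlap-maximizing interval sets described in this project; missing data is not handled.

```python
def compatible(site_a, site_b):
    """Four-gamete test for two biallelic sites.

    Each site is a sequence of alleles, one per strain (hypothetical
    encoding). The sites are compatible when fewer than four distinct
    allele pairs (gametes) are observed across the strains.
    """
    gametes = set(zip(site_a, site_b))
    return len(gametes) < 4


def compatible_intervals(sites):
    """Greedy scan: extend the current interval until a new site
    conflicts with any site already in it, then start a new interval.

    Returns a list of (first_index, last_index) pairs, inclusive.
    """
    intervals = []
    start = 0
    for j in range(1, len(sites)):
        if any(not compatible(sites[i], sites[j]) for i in range(start, j)):
            intervals.append((start, j - 1))
            start = j
    if sites:
        intervals.append((start, len(sites) - 1))
    return intervals


# Four strains, four SNP sites; sites 0-1 conflict with sites 2-3.
example_sites = ["0011", "0011", "0101", "0101"]
print(compatible_intervals(example_sites))
```

Each interval returned this way contains no pair of sites showing all four gametes, so no recombination is needed to explain the sites inside it.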

Tree-based Genome-wide Association Mapping

In this project, we developed TreeQA, a quantitative genome-wide association (GWA) mapping algorithm. TreeQA utilizes local perfect phylogenies constructed in genomic regions exhibiting no evidence of historical recombination. Through careful algorithm design and implementation, TreeQA can efficiently conduct quantitative genome-wide association analysis and is more effective than previous methods.

FastMap

FastMap is a tool for genome-wide association mapping that is designed for ‘Genetical Genomics’ studies using data from gene expression microarrays. It can accept both inbred mouse data, generally consisting of homozygous allele calls, and human SNP data, which includes heterozygous allele calls.

Strain Sequence Identity Interval Viewer

Strain Sequence Identity (SSI) Interval Viewer is a web application that allows the user to choose a subset of mouse strains from the list. A newer version of this tool, based on a different data representation, is now available at http://compgen.unc.edu/SIIntervals/.

Collaborative Cross Simulator

The Collaborative Cross Simulator will provide both data and visual simulations for the collaborative cross experiment. The simulator will provide a powerful tool for the community by allowing them to generate synthetic lines and populations. Using these synthetic mice, researchers can compare actual mouse data against statistically neutral and random data.

SNP Data Retrieval and Filtering

This online tool allows you to retrieve and filter genetic data sets. You can specify the format and fields in the output file, the strains and chromosomes you want included, and a number of special filters to apply to the genetic data before it is returned. In addition to the graphical user interface, an automatic query interface is available that allows you to send queries and retrieve data from within a separate program.

Genotype Sequence Segmentation

In this project, we study the problem of segmenting the genotype sequences into the minimum number of segments attributable to the founder sequences. Our algorithms incorporate biological constraints to greatly reduce the computation, and guarantee that only minimum segmentation solutions with comparable numbers of segments on both haplotypes of the genotype sequence are computed. Our algorithms can also work on noisy data including genotyping errors, point mutations, gene conversions, and missing values.
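To make the idea of a minimum segmentation concrete, here is a deliberately simplified sketch. It handles a single haplotype rather than a diploid genotype, ignores noise and the biological constraints mentioned above, and uses a plain dynamic program, so it should be read as an illustration of the objective and not as the algorithms from this project. The function name and the string encoding of sequences are assumptions.

```python
def min_segments(haplotype, founders):
    """Minimum number of contiguous founder segments needed to spell
    `haplotype` exactly, or None if some position matches no founder.

    Simplified haploid case (an assumption for illustration): all
    sequences are equal-length strings, with no errors or missing data.
    """
    if not haplotype:
        return 0
    if not founders:
        return None
    inf = float("inf")
    # prev[k] = fewest segments covering positions 0..pos, ending in founder k
    prev = [1 if f[0] == haplotype[0] else inf for f in founders]
    for pos in range(1, len(haplotype)):
        best_prev = min(prev)
        cur = []
        for k, f in enumerate(founders):
            if f[pos] != haplotype[pos]:
                cur.append(inf)  # founder k cannot explain this position
            else:
                # either stay in founder k, or switch in from the best founder
                cur.append(min(prev[k], best_prev + 1))
        prev = cur
    best = min(prev)
    return None if best == inf else best


print(min_segments("0011", ["0000", "1111"]))
```

The table keeps one entry per founder, so the running time grows with the sequence length times the number of founders.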

Inferring Genome-wide Mosaic Structure

In this project, we study the Minimum Mosaic Problem: given a set of genome sequences from individuals within a population, compute a mosaic structure containing the minimum number of breakpoints. This mosaic structure provides a good estimate of the minimum number of recombination events (and their locations) required to generate the existing haplotypes in the population. We solve this problem by finding the shortest path in a directed graph. Our algorithm’s efficiency permits genome-wide analysis.

FastANOVA: an Efficient Algorithm for Genome-Wide Association Study

In this project, we studied the problem of finding SNP-pairs that have significant associations with a given quantitative phenotype. We propose an efficient algorithm, FastANOVA, for performing ANOVA tests on SNP-pairs in a batch mode, which also supports large permutation tests. FastANOVA only needs to perform the ANOVA test on a small number of candidate SNP-pairs, without the risk of missing any significant ones.
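For context, the sketch below shows the kind of brute-force per-pair computation that a batch method would try to avoid repeating for every SNP-pair. It groups individuals by their joint genotype at two SNPs and computes a one-way ANOVA F-statistic over those groups. Treating the SNP-pair test as a one-way ANOVA on joint genotypes is an assumption made for this example, as are the function name and the input encoding; none of FastANOVA's pruning is shown here.

```python
from collections import defaultdict


def pair_f_statistic(snp_a, snp_b, phenotype):
    """One-way ANOVA F-statistic with groups defined by the joint
    genotype (snp_a[i], snp_b[i]) of each individual.

    Returns None when the statistic is undefined (fewer than two
    groups, no residual degrees of freedom, or zero within-group
    variance).
    """
    groups = defaultdict(list)
    for a, b, y in zip(snp_a, snp_b, phenotype):
        groups[(a, b)].append(y)

    n = len(phenotype)
    k = len(groups)
    if k < 2 or n <= k:
        return None

    grand_mean = sum(phenotype) / n
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        ss_between += len(values) * (mean - grand_mean) ** 2
        ss_within += sum((y - mean) ** 2 for y in values)

    if ss_within == 0:
        return None
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy data: four individuals falling into two joint-genotype groups.
print(pair_f_statistic([0, 0, 1, 1], [0, 0, 0, 0], [1.0, 2.0, 3.0, 4.0]))
```

Running this for every pair among m SNPs requires on the order of m squared such tests, and a permutation test multiplies that cost again, which is the scaling problem that motivates testing only candidate pairs.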

Gene Expression Extract: Tool for extraction of subsets from gene expression data

This web tool allows one to extract a subset of gene expression data by specifying subsets of genes, probes, and strains. Clustering analysis can also be done on the extracted data. An algorithm called SAFE is also integrated so that enrichment of biological pathways can be tested. The tool is available at http://compgen.unc.edu/GeneExprExtract
