snpBrowser

August 10th, 2009

snpBrowser is an application designed to analyze and visualize the large SNP datasets that are now available. It can perform QTL analysis and can assist in designing experiments to verify QTLs by helping select parental strains. The data can also be viewed in several visualization modes that help users understand genetic diversity among strains.

A screenshot of the application is shown below:

[Screenshot of the snpBrowser application]

Datasets

The SNP dataset currently included with the application is the 74-strain, 7.8 million SNP dataset (version 1.1) available from The Center for Genome Dynamics. An Agilent gene annotation dataset for the mouse, based on NCBI Build 36, is also included. Both datasets are stored in file formats specific to the application and optimized for size and loading speed. Other SNP datasets can also be loaded, once they have been processed into this format.

SNP and gene datasets are usually released in raw form as comma-separated value (CSV) files. Because the layout of these files varies from release to release, there is currently no automatic tool for converting raw data into the data structures the application expects, so a new parsing script must be written for each new dataset. Once the new data has been saved and compressed in the appropriate format, however, it can be loaded into the application with only a small change to a configuration file.
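As an illustration of what such a parsing script might look like, here is a minimal sketch. It assumes a hypothetical CSV layout (a header `chrom,position,<strain1>,<strain2>,...` followed by one row per SNP with a single-character allele per strain) and writes gzip-compressed JSON as a stand-in; the real snpBrowser file format is not described here, and the function names are hypothetical.

```python
import csv
import gzip
import io
import json
from collections import defaultdict


def parse_snp_csv(text):
    """Group SNP rows by chromosome.

    Hypothetical input layout: header 'chrom,position,<strain1>,<strain2>,...'
    followed by one row per SNP holding one single-character allele per strain.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    strains = header[2:]
    by_chrom = defaultdict(list)
    for row in reader:
        if not row:
            continue  # skip blank lines
        chrom, pos, alleles = row[0], int(row[1]), row[2:]
        # Store the alleles as one string, one character per strain (header order).
        by_chrom[chrom].append({"pos": pos, "alleles": "".join(alleles)})
    # Sort SNPs by position within each chromosome for fast range lookups later.
    for snps in by_chrom.values():
        snps.sort(key=lambda s: s["pos"])
    return strains, dict(by_chrom)


def save_compressed(path, strains, by_chrom):
    # gzip-compressed JSON is only a placeholder for the application's own format.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump({"strains": strains, "chromosomes": by_chrom}, f)
```

A script like this would need adjusting for each new release, which is exactly why no single automatic converter exists.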

SNP Density Module

This module visualizes how SNPs are distributed across the 21 mouse chromosomes. Because a SNP in the dataset is, by definition, a position at which the strains do not all share the same allele, SNP density reflects genetic diversity across all the strains in the dataset.
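One simple way to compute the density being plotted is to count SNPs in fixed-size windows along a chromosome. The sketch below assumes a 1 Mb bin size purely for illustration; the bin size and binning scheme snpBrowser actually uses are not stated here, and `snp_density` is a hypothetical name.

```python
from collections import Counter


def snp_density(positions, bin_size=1_000_000):
    """Count SNPs in consecutive bins of `bin_size` base pairs on one chromosome.

    Returns a dict mapping each bin's start coordinate to its SNP count.
    Bins containing no SNPs are omitted. The 1 Mb default is illustrative only.
    """
    counts = Counter(pos // bin_size for pos in positions)
    return {b * bin_size: n for b, n in sorted(counts.items())}
```

For example, `snp_density([10, 999_999, 1_000_000, 2_500_000])` places two SNPs in the first megabase and one in each of the next two bins.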

SNP Diversity Module

Whereas the “SNP Density” module portrays diversity across all the strains, the “SNP Diversity” module depicts genetic diversity across a subset of strains chosen by the user. A SNP is “lost” if all of the chosen strains have the same genotype at that position; otherwise it is “retained”. This module displays the distribution of the retained SNPs. The resulting visualization helps show where in the genome the chosen strains share ancestry and where they diverge.

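The lost/retained rule can be expressed in a few lines. This is a minimal sketch, not snpBrowser's code: it assumes each SNP is a `(position, alleles)` pair where `alleles` is a string with one allele character per strain (in the order of a `strains` list), with no missing or heterozygous calls. The function name is hypothetical.

```python
def retained_snps(snps, strains, chosen):
    """Return positions of SNPs that are 'retained' for the chosen strains.

    `snps`: list of (position, alleles) pairs; `alleles` has one character per
    strain, in the same order as `strains`. Missing calls are not handled.
    A SNP is 'lost' when every chosen strain carries the same allele.
    """
    idx = [strains.index(s) for s in chosen]
    retained = []
    for pos, alleles in snps:
        observed = {alleles[i] for i in idx}
        if len(observed) > 1:  # at least two chosen strains differ here
            retained.append(pos)
    return retained
```

Plotting the returned positions with the same binning used for SNP density would give the subset-specific view this module describes.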
SNP Partitioning Module

The “SNP Partitioning” module provides a quick but rough analysis of regions of correlation between phenotype and genotype. It can serve as a starting point for association studies, to be used before the slower but more thorough QTL analysis. The module asks the user to create two groups of strains. Presumably, the mice in one group exhibit phenotypes at one extreme of the spectrum, while the mice in the other group exhibit phenotypes at the opposite extreme. A SNP is considered to match this partitioning if all the strains in one group share one allele and all the strains in the other group share the other allele. This can therefore be seen as a rough form of point mapping.

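The matching rule above translates directly into a filter over SNPs. This sketch is illustrative only: it assumes each SNP is a `(position, alleles)` pair where `alleles` holds one allele character per strain in the order of a `strains` list, and it ignores missing calls. `partitioning_snps` is a hypothetical name.

```python
def partitioning_snps(snps, strains, group_a, group_b):
    """Return positions where group A is fixed for one allele and group B for the other.

    `snps`: list of (position, alleles) pairs, one allele character per strain
    in the order of `strains`. Missing or heterozygous calls are not handled.
    """
    ia = [strains.index(s) for s in group_a]
    ib = [strains.index(s) for s in group_b]
    hits = []
    for pos, alleles in snps:
        a = {alleles[i] for i in ia}
        b = {alleles[i] for i in ib}
        # Each group must be monomorphic, and the two groups must carry different alleles.
        if len(a) == 1 and len(b) == 1 and a != b:
            hits.append(pos)
    return hits
```

Because the test is all-or-nothing, a single strain with an unexpected allele removes a SNP from the result, which is one reason this analysis is only a rough first pass.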
Gene Density Module

This module visualizes how genes are distributed across the mouse chromosomes. It also displays gene names along with short descriptions of their functions.

QTL Analysis Module

Quantitative trait locus (QTL) analysis attempts to find regions of high correlation between genotype and the phenotypes entered by the user. Two modes of operation are available for this analysis:

  • Point mapping mode
  • Haplotype association mapping (HAM) mode

Point mapping correlates phenotypes with individual SNPs, while HAM correlates phenotypes with windows of three contiguous SNPs, or three contiguous strain distribution patterns (SDPs). Point mapping yields two haplotype groups, while HAM with a window size of three yields between two and eight haplotype groups (three biallelic SNPs allow at most 2^3 = 8 allele combinations). The significance of each correlation score can be estimated by permutation testing with a user-specified number of permutations, and is reported as -log10(p-value).

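The sketch below shows how strains could be grouped into haplotypes over a window of SNPs and how a permutation -log10(p-value) could be estimated. The scoring statistic is an assumption: a one-way ANOVA F statistic between haplotype groups is used here because the text does not specify snpBrowser's correlation score. Alleles are assumed to be strings with one character per strain, and all function names are hypothetical.

```python
import math
import random


def haplotype_groups(alleles_window):
    """Label each strain by its allele pattern across a window of consecutive SNPs.

    `alleles_window`: list of allele strings (one character per strain), one per SNP.
    A window of 1 corresponds to point mapping; a window of 3 corresponds to HAM.
    """
    n_strains = len(alleles_window[0])
    return ["".join(snp[i] for snp in alleles_window) for i in range(n_strains)]


def f_statistic(labels, phenotypes):
    """One-way ANOVA F statistic of phenotype between haplotype groups (assumed score)."""
    groups = {}
    for lab, y in zip(labels, phenotypes):
        groups.setdefault(lab, []).append(y)
    k, n = len(groups), len(phenotypes)
    if k < 2 or n <= k:
        return 0.0  # no between-group contrast or no within-group degrees of freedom
    grand = sum(phenotypes) / n
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    within = sum((y - sum(g) / len(g)) ** 2 for g in groups.values() for y in g)
    if within == 0:
        return math.inf
    return (between / (k - 1)) / (within / (n - k))


def permutation_neg_log10_p(labels, phenotypes, n_perm=1000, seed=0):
    """Estimate -log10(p) by shuffling phenotypes across strains n_perm times."""
    rng = random.Random(seed)
    observed = f_statistic(labels, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_statistic(labels, shuffled) >= observed:
            hits += 1
    p = (hits + 1) / (n_perm + 1)  # add-one correction keeps p strictly positive
    return -math.log10(p)
```

With `n_perm` permutations, the smallest reportable p-value is 1/(n_perm + 1), so the number of permutations caps the largest -log10(p-value) that can be reported.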
F2 Selection Analysis Module

The “F2 Selection Analysis” module aids in designing an F2 intercross by providing the data needed to select the best parental strains for the cross. It performs pair-wise comparisons of all possible crosses, seeking to retain diversity at QTL SNPs (where the two strains have different alleles) and to reduce diversity elsewhere (where the two strains share the same allele).
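A minimal sketch of the pair-wise comparison follows. It assumes each SNP is a `(position, alleles)` pair with one allele character per strain, and it weights the two criteria (differing at QTL SNPs, matching elsewhere) equally; that weighting, the scoring scheme, and the function name are assumptions rather than the module's documented behavior.

```python
from itertools import combinations


def rank_f2_crosses(snps, strains, qtl_positions):
    """Score every pair of candidate parental strains for an F2 intercross.

    `snps`: list of (position, alleles) pairs, one allele character per strain
    in the order of `strains`. A pair gains one point for each QTL SNP where the
    two strains differ and one point for each non-QTL SNP where they match.
    Equal weighting of the two criteria is an assumption.
    """
    qtl = set(qtl_positions)
    scored = []
    for (i, s1), (j, s2) in combinations(enumerate(strains), 2):
        qtl_diverse = sum(1 for pos, a in snps if pos in qtl and a[i] != a[j])
        background_shared = sum(1 for pos, a in snps if pos not in qtl and a[i] == a[j])
        scored.append({
            "pair": (s1, s2),
            "qtl_diverse": qtl_diverse,
            "background_shared": background_shared,
            "score": qtl_diverse + background_shared,
        })
    # Best pairs first; the sort is stable, so ties keep enumeration order.
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored
```

Keeping the two counts separate in the output lets a user favor QTL diversity over background similarity instead of relying on the combined score.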



For questions/comments/suggestions, either comment below or e-mail Lynda Yang at lynda dot yang at unc dot edu.

Research Sponsor

NIH U01 CA105417: “Integrative Genetics of Cancer Susceptibility”
NSF IIS 0448392: “CAREER: Mining Salient Localized Patterns in Complex Data”
