Inferring Genome-wide Mosaic Structure

Genetic recombination plays two essential biological roles: it ensures the fidelity of the transmission of genetic information from one generation to the next, and it generates new combinations of genetic variants. Recombination is therefore a critical process in shaping the arrangement of polymorphisms within populations. The “recombination breakpoints” in a given set of genomes from individuals in a population divide the genome into haplotype blocks, resulting in a mosaic structure on the genome.

In this project, we are interested in inferring the possible mosaic structure of a given set of related haplotypes. This is accomplished by finding a set of recombination breakpoints that divide the haplotypes into compatible blocks according to the Four-Gamete Test (FGT) [2]. The FGT states that, under the infinite-sites assumption [2], any pair of polymorphisms should co-occur in at most three of their four possible configurations. Thus, when all four configurations are observed for a pair of markers, either a recombination or a homoplastic event must have occurred between them. We propose an efficient algorithm to solve the “Minimum Mosaic Problem”, which finds a mosaic with the minimum number of breakpoints. The algorithm is suitable for genome-wide studies. [paper]
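To make the Four-Gamete Test concrete, here is a minimal Python sketch. It is an illustration only and not the algorithm from the paper. It assumes haplotypes are given as equal-length strings of '0'/'1' alleles, and that a "compatible block" is a contiguous run of markers in which every pair passes the FGT. The function names `four_gamete_compatible` and `greedy_breakpoints` are hypothetical. The scan starts a new block only when the next marker conflicts with some marker already in the current block.

```python
def four_gamete_compatible(col_a, col_b):
    """Return True if two marker columns show at most three of the
    four possible allele pairs (gametes) across the haplotypes."""
    gametes = set(zip(col_a, col_b))
    return len(gametes) < 4


def greedy_breakpoints(haplotypes):
    """Scan markers left to right and cut as late as possible.

    haplotypes: list of equal-length strings over '0'/'1' (assumed format).
    Returns a list of marker indices; index b means a new block starts
    at marker b (the breakpoint lies between markers b-1 and b).
    """
    if not haplotypes:
        return []
    n_markers = len(haplotypes[0])
    # Transpose rows (haplotypes) into columns (markers).
    columns = [tuple(h[j] for h in haplotypes) for j in range(n_markers)]

    breakpoints = []
    block_start = 0
    for j in range(1, n_markers):
        # Marker j may join the block only if it passes the FGT
        # against every marker already in the block.
        conflict = any(
            not four_gamete_compatible(columns[i], columns[j])
            for i in range(block_start, j)
        )
        if conflict:
            breakpoints.append(j)
            block_start = j
    return breakpoints


if __name__ == "__main__":
    # Markers 0 and 1 show all four gametes (00, 01, 10, 11),
    # so a breakpoint is placed between them.
    print(greedy_breakpoints(["00", "01", "10", "11"]))
```

This pairwise check costs time quadratic in the block length per marker in the worst case, so it is meant for explaining the test on small inputs rather than for genome-wide data.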

Please try this tool online.

Research Sponsor

NSF IIS 0448392: “CAREER: Mining Salient Localized Patterns in Complex Data”
NSF IIS 0534580: “Visualizing and Exploring High-dimensional Data”
NSF IIS 0812464: “III-Core: Discovering and Exploring Patterns in Subspaces”