Genotype Sequence Segmentation

Recombination plays an important role in shaping the genetic variation present in current-day populations. We consider populations that evolved from a small number of founders, so that each individual’s genomic sequence is a mosaic of segments copied from the founder sequences. We study the problem of segmenting genotype sequences into the minimum number of segments attributable to the founder sequences. A minimum segmentation can be used to infer relationships among sequences and thereby help identify the genetic basis of traits, which is important for disease association studies.
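To make the segmentation objective concrete, here is a minimal sketch of the simpler haploid version of the problem: a single haplotype, rather than a genotype, is split into the fewest pieces that each copy one founder exactly. This is not the project's algorithm. The function name, the string encoding of sequences, and the assumption of error-free data are all illustrative.

```python
from typing import List, Optional

INF = float("inf")


def min_haplotype_segments(founders: List[str], target: str) -> Optional[int]:
    """Fewest founder segments whose concatenation equals `target` (hypothetical helper).

    Assumes error-free data: every site of a segment must match its founder exactly.
    cost[k] = fewest segments covering target[:i+1] with site i copied from founder k.
    """
    if not target:
        return 0
    # Site 0 opens the first segment on any founder that matches it.
    cost = [1 if f[0] == target[0] else INF for f in founders]
    for i in range(1, len(target)):
        best_prev = min(cost)  # cheapest way to end site i-1 on any founder
        cost = [
            # Either keep copying founder k, or switch to it and open a new segment.
            min(cost[k], best_prev + 1) if founders[k][i] == target[i] else INF
            for k in range(len(founders))
        ]
    best = min(cost)
    return None if best == INF else int(best)
```

For example, with founders `"0000"` and `"1111"`, the sequence `"0011"` needs two segments (`00` from the first founder, `11` from the second).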

In this project, we propose two dynamic programming algorithms that compute minimum segmentations of genotype sequences. Our algorithms run in polynomial time and respect the biological constraints of the genotype segmentation problem, namely that the two haplotypes contain comparable numbers of segments. Our algorithms also account for potential sources of noise in the data, including point mutations, gene conversions, genotyping errors, and missing values. [paper]
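The sketch below shows one way a dynamic program can segment a genotype rather than a single haplotype. It is a simplified illustration under stated assumptions, not the algorithms from the paper: sites are biallelic, founders are 0/1 lists, each genotype entry is an allele count (0, 1, or 2) with `None` marking a missing value, and it does not model point mutations, gene conversions, or genotyping errors, nor does it enforce the balance constraint between the two haplotypes. The state is an ordered pair of founders (one per haplotype), and the objective is the total number of segments over both haplotypes. All names are hypothetical.

```python
from typing import Optional, Sequence

INF = float("inf")


def min_genotype_segments(
    founders: Sequence[Sequence[int]],  # K founder haplotypes, alleles 0/1 (assumption)
    genotype: Sequence[Optional[int]],  # per-site allele count 0/1/2; None = missing
) -> Optional[int]:
    """Fewest total segments over both haplotypes explaining `genotype` (hypothetical).

    cost[a][b]: best total so far with haplotype 1 copying founder a and
    haplotype 2 copying founder b at the current site. Noise-free sketch.
    """
    K, n = len(founders), len(genotype)
    if n == 0:
        return 0

    def consistent(a: int, b: int, i: int) -> bool:
        # A missing genotype is treated as compatible with every founder pair.
        g = genotype[i]
        return g is None or founders[a][i] + founders[b][i] == g

    # Site 0 opens one segment on each haplotype.
    cost = [[2 if consistent(a, b, 0) else INF for b in range(K)] for a in range(K)]
    for i in range(1, n):
        row_min = [min(cost[a]) for a in range(K)]                      # fix a, any b'
        col_min = [min(cost[a][b] for a in range(K)) for b in range(K)]  # fix b, any a'
        all_min = min(row_min)
        new = [[INF] * K for _ in range(K)]
        for a in range(K):
            for b in range(K):
                if consistent(a, b, i):
                    new[a][b] = min(
                        cost[a][b],      # neither haplotype switches founder
                        row_min[a] + 1,  # only haplotype 2 switches
                        col_min[b] + 1,  # only haplotype 1 switches
                        all_min + 2,     # both haplotypes switch
                    )
        cost = new
    best = min(min(row) for row in cost)
    return None if best == INF else int(best)
```

Taking the minimum of these four terms gives the same result as trying every previous pair, but costs O(K^2) per site instead of O(K^4). For instance, with founders `[0, 0, 0, 0]` and `[1, 1, 1, 1]`, the genotype `[0, 0, 1, 1]` needs three segments in total: one haplotype copies the first founder throughout, and the other switches once.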

Research Sponsors

NSF IIS 0448392: “CAREER: Mining Salient Localized Patterns in Complex Data”
NSF IIS 0812464: “III-Core: Discovering and Exploring Patterns in Subspaces”