Ancestry Inference
Ancestry inference and the related problem of haplotype inference are essential for many genetic applications, such as genotype imputation and linkage mapping. However, traditional algorithms based on Hidden Markov Models have state spaces that grow exponentially with pedigree size, making them infeasible for many realistic datasets. In particular, the complex pedigrees of model organisms present challenges for this class of algorithms. Here, we optimize the Lander–Green algorithm to make the analysis tractable for model organisms. Most of our speedup comes from implicit modeling of individuals involved in inbreeding. The optimizations do not compromise the accuracy of the inference, and they scale much better with pedigree size.
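To make the exponential growth concrete, the sketch below follows the textbook Lander–Green formulation, not the optimized version described here. It assumes the standard setup in which each non-founder contributes two meiosis bits to an inheritance vector, and the transition probability between two vectors at adjacent markers depends only on their Hamming distance and the recombination fraction. The function names and the parameter `theta` are illustrative choices, not part of any released code.

```python
def num_inheritance_states(non_founders: int) -> int:
    """Number of inheritance vectors for a pedigree.

    Assumes the textbook model: each non-founder has one paternal and
    one maternal meiosis bit, so the vector has 2 * non_founders bits.
    """
    return 2 ** (2 * non_founders)


def transition_prob(v: int, w: int, n_bits: int, theta: float) -> float:
    """Probability of moving from vector v to vector w between markers.

    Each differing bit is a recombination (probability theta); each
    matching bit is a non-recombination (probability 1 - theta).
    """
    d = bin(v ^ w).count("1")  # Hamming distance between the two vectors
    return theta ** d * (1.0 - theta) ** (n_bits - d)


if __name__ == "__main__":
    # The state count is squared again in a naive transition matrix.
    for n in (2, 5, 10, 20):
        states = num_inheritance_states(n)
        print(f"{n:2d} non-founders: {states} states, "
              f"{states * states} naive transitions per marker")
```

Even a pedigree with 20 non-founders yields 2^40 states in this unoptimized model, which is why reducing the number of explicitly modeled individuals matters.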
The source code will be released shortly.
For questions, contact asarkar [at] cs [dot] unc [dot] edu
Research Sponsors
- NSF IIS 0448392 “Mining Salient Localized Patterns in Complex Data”
- NSF IIS 0812464 “Discovering and Exploring Patterns in Subspaces”