III-Core: Discovering and Exploring Patterns in Subspaces

IIS0812464 (September 1, 2008 ~ August 31, 2011)

High-throughput experimental methods have revolutionized scientific inquiry. In contrast to the hypothesis-driven scientific method, data-driven science seeks to discover and explore hypotheses supported by the huge volume of data generated in high-throughput experiments. Such datasets are large and high-dimensional: they consist of a multitude of samples and many measured attributes for each sample. A typical hypothesis corresponds to a subspace of this dataset: a subset of samples that share similar values on a subset of attributes.
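The abstract does not pin down what "similar values" means. The sketch below makes that notion concrete under one illustrative assumption: a subset of samples (rows) and a subset of attributes (columns) form a subspace pattern if, on every selected attribute, the selected samples' values span a range no wider than a tolerance `delta`. The function name `is_subspace_pattern`, the `delta` parameter, and the toy `DATA` matrix are hypothetical. They are not the project's definition or method.

```python
from typing import Sequence

# Hypothetical toy dataset: 4 samples (rows) x 4 attributes (columns).
DATA = [
    [1.0, 5.0, 9.0, 2.0],
    [1.1, 3.0, 9.2, 2.1],
    [0.9, 7.0, 8.9, 6.0],
    [4.0, 5.1, 1.0, 2.0],
]


def is_subspace_pattern(
    data: Sequence[Sequence[float]],
    rows: Sequence[int],
    cols: Sequence[int],
    delta: float,
) -> bool:
    """Return True if, on every selected column, the selected rows take
    values whose max - min is at most delta (an assumed similarity test)."""
    if not rows or not cols:
        return False  # an empty subspace is not treated as a pattern
    for c in cols:
        values = [data[r][c] for r in rows]
        if max(values) - min(values) > delta:
            return False
    return True


if __name__ == "__main__":
    # Samples 0-2 agree closely on attributes 0 and 2 ...
    print(is_subspace_pattern(DATA, [0, 1, 2], [0, 2], 0.5))
    # ... but not once attribute 1 (values 5.0, 3.0, 7.0) is added.
    print(is_subspace_pattern(DATA, [0, 1, 2], [0, 1, 2], 0.5))
```

Checking a single candidate is cheap. Finding such patterns is harder because the number of candidate (row subset, column subset) pairs grows exponentially with the size of the dataset. That combinatorial growth is why efficient search algorithms are needed.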

The goal of this project is to develop a series of new data mining methods that can effectively discover these subspaces, the patterns embedded among their values, and the relationships between patterns. The underlying problems are highly combinatorial, so efficient algorithms are required to let users mine and explore subspace patterns in large, complex datasets. The proposed methods combine the advantages of efficient matrix decomposition, effective sampling techniques, and advanced graph algorithms. Solutions to these research problems will be integrated into an interactive visual interface for exploring subspace patterns mined from experimental data. While the proposed methods are applicable across a wide range of domains, the project focuses on the analysis of gene regulatory networks and of protein structure, in collaboration with geneticists and pharmacologists, respectively.


Principal Investigators: