What is it for?

High-dimensional data is often hard to visualize because
of the sheer amount of information involved. For example, SNP and gene
microarray expression studies often present researchers with large volumes of data, with
18 to 26 or more dimensions per item. The picture to the right shows just a few columns
and rows from a spreadsheet of 800 SNPs, each with 26 dimensions. It is a daunting
task to sift through this data for meaningful inter-SNP relationships.

This project presents a visualization tool for high-dimensional data that
projects the data onto a lower-dimensional space, while trying to preserve
the dissimilarities between the original SNPs or genes.
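As a rough illustration of projecting onto a lower-dimensional space while preserving dissimilarities, here is a minimal sketch of classical (Torgerson) MDS in NumPy. The page does not say which MDS variant the tool uses, so the choice of classical MDS, the function name classical_mds and the toy data are all assumptions made for this example.

```python
import numpy as np

def classical_mds(dissimilarity, n_components=3):
    """Embed n items in n_components dimensions from an (n, n) dissimilarity matrix.

    Hypothetical helper: classical MDS is assumed here, not confirmed by the tool.
    """
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    # Double-center the squared dissimilarities: B = -1/2 * J * D^2 * J
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # eigh returns eigenvalues in ascending order; keep the largest ones
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip small negative eigenvalues, which can appear for non-Euclidean input
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Toy data (made up): four points on a unit square
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = classical_mds(dist, n_components=3)
# Pairwise distances between the embedded points
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

For this toy input the embedded points reproduce the original pairwise distances. Real SNP dissimilarities are generally not Euclidean, so a 3D embedding can only approximate them.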

Download the executable

Some sample SNP data

Data from the screenshot


The goal is to create an interactive pre-processing tool that will aid
scientists in selecting and observing clusters in high-dimensional data. The
tool will blend several different dissimilarity measures, allowing the user to
weight each dissimilarity and view the results in real time. Users will also be
able to cluster the data and watch the clusters evolve as they reweight
the dissimilarities.

Instructions at the bottom of the page


· Dissimilarity Measures
· Multidimensional Scaling (MDS)
· WxPython
· Trackball interface
· OpenGL


Main Interface:

· Data is projected onto 3D and displayed as points
· Trackball interface to view the data
· Zoom into and out of point cloud
· Interactive rates for MDS
· Be able to package the application as a standalone executable
· Secondary information is displayed as well

Dissimilarity Matrices:

· Display dissimilarity matrix as a texture next to each slider
· Sliders for each dissimilarity, weighting them from 0 to 1
· Provide option to normalize weights or use as is
· Histogram for each dissimilarity, to show distribution of values
· Colorbar for each dissimilarity
· Display the blended dissimilarity matrix that is fed to MDS
· Blended matrix updated each time weights change to show the change in contribution from each dissimilarity matrix
· Matrices are permuted to reflect reordering by clusters
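The blending and permutation steps listed above might look roughly like the following sketch. It assumes the blended matrix is a weighted sum of the individual dissimilarity matrices and that "normalize weights" means scaling the weights to sum to 1. Neither detail is stated on this page, and the function names are hypothetical.

```python
import numpy as np

def blend_dissimilarities(matrices, weights, normalize=True):
    """Weighted sum of equally sized dissimilarity matrices (assumed blending rule)."""
    mats = np.stack([np.asarray(m, dtype=float) for m in matrices])
    w = np.asarray(weights, dtype=float)
    if mats.shape[0] != w.shape[0]:
        raise ValueError("need exactly one weight per matrix")
    if normalize:
        total = w.sum()
        if total > 0:
            w = w / total  # assumed meaning of "normalize": weights sum to 1
    # Contract the weight vector with the stack of matrices
    return np.tensordot(w, mats, axes=1)

def permute_by_cluster(matrix, labels):
    """Reorder rows and columns so members of the same cluster are adjacent."""
    order = np.argsort(labels, kind="stable")
    return np.asarray(matrix)[np.ix_(order, order)], order

# Made-up example: two 3x3 dissimilarity matrices, both sliders at 1.0
a = np.array([[0, 1, 2], [1, 0, 3], [2, 3, 0]], dtype=float)
b = np.array([[0, 4, 4], [4, 0, 4], [4, 4, 0]], dtype=float)

blended = blend_dissimilarities([a, b], [1.0, 1.0], normalize=True)
raw = blend_dissimilarities([a, b], [1.0, 1.0], normalize=False)

# Points 0 and 2 share cluster 1; point 1 is in cluster 0
permuted, order = permute_by_cluster(blended, [1, 0, 1])
```

Recomputing the blended matrix on every slider change is cheap compared with MDS itself, so only the embedding step is likely to need attention to stay interactive.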

Cluster Management:

· Be able to cluster the data using automatic clustering algorithms (e.g., k-means or tree)
· User specifies the number of clusters
· User should also be able to self-select clusters
· User can add to or subtract from clusters
· User-selected clusters are more tightly bound than algorithm-selected clusters
· Clusters distinguished by color
· Clusters are persistent over reweightings
· Provide a colorbar to map from colored points to the point’s position on the matrix
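For the automatic clustering option, a minimal k-means sketch over the projected 3D points could look like this. It is an illustration only: the page does not say which implementation the tool uses, and the farthest-point initialization is an assumption chosen to keep the example deterministic.

```python
import numpy as np

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    pts = np.asarray(points, dtype=float)
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centers chosen so far (assumed initialization)
    centers = [pts[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(pts - c, axis=1) for c in centers], axis=0)
        centers.append(pts[np.argmax(d)])
    centers = np.array(centers)

    labels = np.zeros(len(pts), dtype=int)
    for _ in range(n_iter):
        # Assign every point to its nearest center
        dists = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=-1)
        new_labels = np.argmin(dists, axis=1)
        # Move each center to the mean of its members; keep it if empty
        for c in range(k):
            members = pts[new_labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, centers

# Made-up 3D points forming two well-separated groups
cloud = np.array([
    [0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0],
    [5.0, 5.0, 5.0], [5.1, 5.0, 5.0], [5.0, 5.1, 5.0],
])
labels, centers = kmeans(cloud, k=2)
```

Because the labels are indexed by point rather than by position, they can be kept and reused to color the points after the dissimilarities are reweighted and the layout changes.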

Optional Features:

· Run MDS as separate thread
· Selected SNPs can be tracked, both as points on the plot and within the dissimilarity matrix

Mouse and Keyboard Controls:

Mouse Controls
· LEFT button: rotate the point cloud
· RIGHT button: select points to cluster, while dragging mouse
· MIDDLE button: select an individual point. The point will show up in RED and LARGER, and if SNP names are loaded, its information will be displayed in the status bar
Keyboard Controls
· CTRL key: hold it down while selecting clusters to append new selection to the last cluster selection
· SHIFT key: hold it down while selecting clusters to “free” points from the last cluster selection
· ALT key: hold it down while selecting A SINGLE POINT to “steal” that point’s cluster. This makes the cluster of that point “the current cluster”, so any points you append afterwards are added to it (since it is now the last cluster selection)
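The modifier-key rules above can be summarized as a small state machine. The sketch below is one possible reading, not the tool's actual code: the class name, the method signature, and the behavior in cases the list does not cover (for example, an appended point that already belongs to another cluster is simply reassigned) are assumptions.

```python
class ClusterSelection:
    """Tracks cluster membership and the 'current' (last selected) cluster."""

    def __init__(self):
        self.labels = {}      # point index -> cluster id
        self.current = None   # id of the last cluster selection
        self._next_id = 0

    def select(self, points, ctrl=False, shift=False, alt=False):
        points = list(points)
        if alt:
            # ALT + a single clustered point: make its cluster the current one
            if len(points) == 1 and points[0] in self.labels:
                self.current = self.labels[points[0]]
            return
        if shift:
            # SHIFT: free the selected points from the current cluster only
            if self.current is not None:
                for p in points:
                    if self.labels.get(p) == self.current:
                        del self.labels[p]
            return
        if not (ctrl and self.current is not None):
            # Plain selection: start a new cluster and make it current
            self.current = self._next_id
            self._next_id += 1
        # CTRL reaches this point with the previous cluster still current
        for p in points:
            self.labels[p] = self.current

sel = ClusterSelection()
sel.select([0, 1, 2])             # new cluster 0
sel.select([5, 6])                # new cluster 1
sel.select([7], ctrl=True)        # append point 7 to cluster 1
sel.select([1], alt=True)         # "steal": cluster 0 becomes current
sel.select([8], ctrl=True)        # append point 8 to cluster 0
sel.select([2, 5], shift=True)    # frees point 2; point 5 is not in cluster 0
```

Keeping the current cluster as explicit state is what lets an ALT click redirect later CTRL and SHIFT selections to an older cluster.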

Some quick instructions:

· Extract zip file to a separate folder
· Double-click on “MyGUI_D.exe” to run the viewer
· Save sample data to a separate folder
· Data is organized by phenotype, so all files prefixed with “p_[some number n]” belong to the n-th phenotype.
· Click on “Load Data” to load SNP data. In the popup file chooser, select “p_n_snp.txt”
· Click on “Load Distance Function” to generate distance matrix for data
· To load positional data, click on “Load Data” again and select “p_n_pos.txt” file
· To load SNP names (so that you can select and view individual SNPs), click on “Load Secondary Textual Information” and select the “p_n_name.txt” file
· Binary information can also be displayed by clicking on “Load Secondary Information” and selecting “p_0_snp_bin_ordered.txt”. It will be displayed on the lower right side of the screen.

Research Sponsor

NSF IIS 0534580: “Visualizing and Exploring High-dimensional Data”