The new algorithm processes SNPs one by one and constructs a phylogeny tree. Here is an example to illustrate the basic idea of the algorithm.
Example data: (4 strains, a,b,c and d. Each row is a strain, each column is a SNP)
a: 1 0 0 1 1
b: 0 0 1 0 1
c: 1 1 1 0 1
d: 1 0 1 1 0
The steps to construct the tree

In each step, the algorithm splits a node in the tree according to the new SNP. Here we assume that the new SNP is compatible with all the previous SNPs. The algorithm is able to find out whether the new SNP is compatible or not during the construction.
Some key procedures in the algorithm
(1) given a new SNP, find the node in the tree to be splitted.
Given two SNP, s4, s5
s4 s5
1 1
0 1
0 1
1 0
The strains that have ‘1′ on s4 are splitted by s5, because they have differnt allels on s5. While the strains that have ‘0′ on s4 are not splitted by s5, becuase they still have the same allel on s5. We say that s5 split s4 on ‘1′. According to the 4 gamete rule, two unique SNPs are compatible if both of them split either ‘0′ or ‘1′ of the other SNP.
Then given a new SNP si, for each previous SNP sj, j from 1 to i-1, we can find how sj is splitted by si. For example, SNP s4 splits s1 on ‘1′, s2 on ‘0′ and s3 on ‘1′. We get a strain ‘101′, if we concatenate these numbers. This strain identifies an unique node in the tree which is exactly the node that need to be splitted in the current step. For example, ‘101′ identifies node ‘d’ in the current tree and it needs to be splitted in step 4. Note that, since only SNPs s1,s2 and s3 have been processed in the previous steps, node ‘d’ currently only represents the first 3 allels of strain d which is ‘101′.
There may be other ways to find this node. For example, the nodes in the tree can be divided into two groups according to their allels on the current SNP position. And the splitting node can be found accordingly. However, this method seems to be more complicated and not consistent.
(2) after the algorithm finds the node to be splitted, two new nodes are constructed. The strains represented by these two nodes have the same allels on all the previous SNPs and have the opposite allels on the current SNP. All the neighbors of the splitted nodes are re-linked to the two new nodes according to their allels on the current SNP, i.e., if it is 0, the neighbor node is linked to the new node having 0 on the current SNP, vice versa.
(3) for those additional nodes who represent the strains which are not in the given data, the allels on all the processed SNP positoins need to be decided according to the tree structure.¼/p>