Experimental phylogenetics: generation of a known phylogeny

In order to test the accuracy of various phylogenetic inference methods, an experimental phylogeny of bacteriophage T7 was generated in the laboratory by propagating lineages in the presence of a mutagen.

All the methods recovered the correct branching order of the known phylogeny; however, none of them estimated the distances between taxa correctly. The methods fail to reproduce branch lengths faithfully because some of their assumptions do not hold in this experimental phylogeny. UPGMA performed worst at estimating phylogenetic distance in this experiment. This distance-based method assumes a constant rate of evolution across lineages (a molecular clock), and the T7 lineages in this experiment clearly did not evolve in a clock-like manner. Similarly, the other distance-based method, neighbor-joining, cannot account for multiple substitutions at the same site and therefore introduces errors in the estimated branch lengths. The maximum parsimony method produced the best result, as it involves no explicit assumption about the evolutionary process, although homoplasy may still affect its results.
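To make the two failure modes concrete, the sketch below pairs a minimal UPGMA with the Jukes-Cantor (JC69) distance correction. UPGMA places every merge node at half the distance between the clusters it joins, so all leaves end up equidistant from the root; when lineages evolve at different rates, the branch lengths come out wrong even if the grouping is right. The JC69 correction is one standard way of accounting for multiple substitutions at a site, shown here only to illustrate why raw differences underestimate distance; it is not necessarily what the study used. The function names, taxon names, sequences, and distance values are all hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Tuple


def p_distance(seq1: str, seq2: str) -> float:
    """Observed proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jukes_cantor(p: float) -> float:
    """JC69 distance: corrects p for unseen multiple hits at the same site.

    Assumes equal base frequencies and equal substitution rates (JC69 model);
    undefined once p reaches 0.75, where sequences look randomly related.
    """
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is saturated under JC69")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def upgma_heights(
    names: Sequence[str], dist: Dict[FrozenSet[str], float]
) -> List[Tuple[str, str, float]]:
    """Minimal UPGMA returning (cluster_a, cluster_b, node_height) merges.

    Each merge node sits at half the distance between the two clusters, so
    every leaf is the same distance from the root: the molecular-clock
    assumption is built into the method.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)
    merges = []
    while len(sizes) > 1:
        # Closest pair of current clusters.
        a, b = min(
            ((x, y) for x in sizes for y in sizes if x < y),
            key=lambda pair: d[frozenset(pair)],
        )
        merges.append((a, b, d[frozenset((a, b))] / 2.0))
        new = f"({a},{b})"
        na, nb = sizes.pop(a), sizes.pop(b)
        # Size-weighted average distance from the new cluster to the others.
        for k in sizes:
            d[frozenset((new, k))] = (
                na * d[frozenset((a, k))] + nb * d[frozenset((b, k))]
            ) / (na + nb)
        sizes[new] = na + nb
    return merges


# Hypothetical distances consistent with a tree in which A's terminal branch
# has length 1, B's has length 3, and the path from their common ancestor
# to C has length 4 (rates differ, so there is no clock).
toy = {
    frozenset(("A", "B")): 4.0,
    frozenset(("A", "C")): 5.0,
    frozenset(("B", "C")): 7.0,
}
print(upgma_heights(["A", "B", "C"], toy))
# [('A', 'B', 2.0), ('(A,B)', 'C', 3.0)]: A and B both get branch length 2.0
print(jukes_cantor(p_distance("ACGTACGTAC", "ACGTTCGAAC")))  # p = 0.2, d ≈ 0.233
```

In the toy case UPGMA groups A with B correctly but assigns both a branch of 2.0 instead of 1 and 3, and the JC69 distance (about 0.233) exceeds the raw proportion of differences (0.2) because some changes are assumed to have been overwritten.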

Among all the methods tested, only maximum parsimony allows inference of ancestral character states, because the others are all distance-based and do not reconstruct individual characters. Parsimony works with character states, treating each position of the sequence as an independent character. The inferred ancestral states were 98.6% correct when checked against the known phylogeny. Nonetheless, a few inferences were incorrect, and such errors could not have been detected without the true phylogeny available.
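The character-based reasoning can be illustrated with Fitch's small-parsimony algorithm, one standard way to score unordered characters on a fixed tree and obtain candidate ancestral states. This is a minimal sketch, not necessarily the exact procedure used in the study; the tree, taxon names, alignment, and the helper names `fitch` and `parsimony_score` are hypothetical. Each site is scored separately, mirroring the treatment of each position as an independent character.

```python
from typing import Dict, List, Set, Tuple, Union

# A rooted binary tree: a leaf name, or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Bottom-up pass of Fitch parsimony for a single character.

    Returns the candidate state set at this node and the minimum number of
    state changes needed in the subtree below it.
    """
    if isinstance(tree, str):  # leaf: its observed state
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:  # children can agree: no extra change needed
        return shared, left_cost + right_cost
    # No shared state: one change on one of the two branches below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(
    tree: Tree, alignment: Dict[str, str]
) -> Tuple[int, List[Set[str]]]:
    """Score every site independently and collect candidate root states."""
    length = len(next(iter(alignment.values())))
    total, root_sets = 0, []
    for i in range(length):
        site_states = {name: seq[i] for name, seq in alignment.items()}
        root_set, cost = fitch(tree, site_states)
        total += cost
        root_sets.append(root_set)
    return total, root_sets


# Hypothetical four-taxon tree and two-site alignment.
tree = (("A", "B"), ("C", "D"))
alignment = {"A": "AC", "B": "AC", "C": "GC", "D": "GT"}
score, roots = parsimony_score(tree, alignment)
print(score)  # 2
print(roots)  # site 0: root ambiguous between A and G; site 1: root is C
```

This sketch runs only the bottom-up pass; a full reconstruction adds a top-down pass that assigns one state to every internal node. Ambiguous sets such as the one at site 0 are one place where a reconstructed ancestor may be uncertain.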

This study makes the important point that all phylogenetic reconstructions should be interpreted with caution. Moreover, examining where an inferred phylogeny departs from the true one may guide the development of more reliable phylogenetic reconstruction methods.