Inferring Phylogenies

Jen here – 

Are you interested in understanding how we take morphological data from extinct animals and use them to infer an evolutionary history? We often think of and visualize relationships as trees, and this includes your family tree. We have an entire page on Reading the Tree of Life so you can understand how to read and interpret these visualizations. These trees, called phylogenies, can be used as a framework to test different macroevolutionary questions regarding species distribution, paleoecology, rates of change, and so much more! We hope to set the stage to explain how each step is done!

Before really diving into anything specific, I would suggest you think a little about evolution, phylogeny, and all the basic terminology that builds the foundation for understanding evolutionary theory. I would recommend that you work through The Compleat Cladist: A Primer of Phylogenetic Procedures. This is effectively a workbook that walks you through terms, concepts, and more!

This isn’t meant to be an exhaustive guide, but rather to set you up to explore the programs and infer a phylogeny! Now that you have learned all you can about your study organism and how to build a character matrix, the next step is inferring a phylogeny.

What does it mean to infer a phylogeny?

Simply put, evolutionary scientists can take a data matrix and apply mathematical and statistical models to estimate, or infer, species relationships and generate a phylogeny (evolutionary history). In paleontology, the data are generated from an individual researcher’s understanding of homologous characters in the group, so they are inherently shaped by that researcher’s expert knowledge. Homology is similarity due to inheritance from a common ancestor. As such, the researcher is presenting a phylogenetic hypothesis for the group.

It is important to understand the purpose for pursuing any scientific approach. Why paleontologists should pursue building and inferring phylogenies is well described by Brian O’Meara in his PhyloMeth video, Why build phylogenies? In essence, tree topologies not only tell us how organisms are related to one another, but they can also be used as a framework for a variety of macroevolutionary approaches.

To get an idea of the basics of tree space, and to get excited about trees, please watch the video Be afraid of tree space by Brian O’Meara.

What are the methods?

These are several of the primary methods currently being used in phylogenetic paleobiology. There are certainly more methods and we encourage you to explore and learn on your own!

Maximum Parsimony

Parsimony, similar to Occam’s razor, suggests that the simplest explanation that fits the evidence is the best. Applying this logic to evolutionary trees means that the best inference or hypothesis is the one that requires the fewest evolutionary changes – or character changes across branches. 
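
To make "fewest character changes" concrete, here is a minimal sketch in Python that counts changes on a tree with the Fitch algorithm for unordered characters. The four taxa, the character matrix, and the function names are all made up for illustration, and the trees are written as nested tuples. Programs like PAUP* and TNT do far more than this (searching tree space, for one), so treat it as a toy rather than how those programs work.

```python
def fitch_score(tree, states):
    """Return (possible ancestral states, number of changes) for one character.

    tree   -- a taxon name (str) for a tip, or a pair (left, right) for a node
    states -- dict mapping each taxon name to its observed character state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch_score(tree[0], states)
    right_set, right_changes = fitch_score(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The two descendants can agree on a state: no extra change needed.
        return shared, left_changes + right_changes
    # No state in common: at least one change happened below this node.
    return left_set | right_set, left_changes + right_changes + 1


def tree_length(tree, matrix):
    """Total number of changes a tree requires across all characters."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        column = {taxon: row[i] for taxon, row in matrix.items()}
        total += fitch_score(tree, column)[1]
    return total


# Hypothetical matrix: 4 taxa scored for 4 binary characters.
matrix = {
    "A": "0000",
    "B": "0011",
    "C": "1110",
    "D": "1111",
}

# The three possible groupings of four taxa.
candidate_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]

lengths = {tree: tree_length(tree, matrix) for tree in candidate_trees}
most_parsimonious = min(lengths, key=lengths.get)

for tree, length in lengths.items():
    print(tree, "requires", length, "changes")
```

With this made-up matrix, the tree grouping A with B and C with D needs the fewest changes, so it is the preferred hypothesis under parsimony.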

Software: PAUP*, TNT

More reading:

Maximum Likelihood

Likelihood methods calculate the probability of the data given a tree and a model of character evolution. The more probable the data given the tree, the more the tree is preferred overall. Because the model is chosen by the user, this method can be employed in a variety of situations.

Models of evolution in paleobiology include Jukes-Cantor (JC) and Felsenstein (F81), but there are many others. Here is an entire chapter on Selecting Models of Evolution by David Posada.
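
To see what "the probability of the data given the tree" looks like in practice, here is a minimal Python sketch. It assumes a JC-style model where every change between states is equally likely, applied to binary characters (k = 2), and it assumes every branch has the same length t. The matrix, the value of t, and the function names are all hypothetical. Real likelihood software also optimizes branch lengths and model parameters, which this sketch skips.

```python
import math


def p_change(i, j, t, k):
    """Probability of ending in state j after starting in state i,
    along a branch of length t, under an equal-rates k-state model."""
    decay = math.exp(-k * t / (k - 1))
    if i == j:
        return 1 / k + (k - 1) / k * decay
    return 1 / k - 1 / k * decay


def partials(tree, column, t, k):
    """Likelihood of the tips below a node, for each possible state at that node."""
    if isinstance(tree, str):
        # A tip: the observed state has likelihood 1, all others 0.
        return [1.0 if s == column[tree] else 0.0 for s in range(k)]
    result = [1.0] * k
    for child in tree:
        child_partials = partials(child, column, t, k)
        for parent_state in range(k):
            result[parent_state] *= sum(
                p_change(parent_state, s, t, k) * child_partials[s]
                for s in range(k)
            )
    return result


def log_likelihood(tree, matrix, t=0.1, k=2):
    """Log-likelihood of the whole matrix; all branches share length t (an assumption)."""
    n_chars = len(next(iter(matrix.values())))
    total = 0.0
    for i in range(n_chars):
        column = {taxon: int(row[i]) for taxon, row in matrix.items()}
        root = partials(tree, column, t, k)
        # Each state is equally likely at the root under this model.
        total += math.log(sum(root) / k)
    return total


# Hypothetical matrix: 4 taxa scored for 4 binary characters.
matrix = {
    "A": "0000",
    "B": "0011",
    "C": "1110",
    "D": "1111",
}

tree_ab_cd = (("A", "B"), ("C", "D"))
tree_ac_bd = (("A", "C"), ("B", "D"))
tree_ad_bc = (("A", "D"), ("B", "C"))

for tree in (tree_ab_cd, tree_ac_bd, tree_ad_bc):
    print(tree, "log-likelihood:", round(log_likelihood(tree, matrix), 3))
```

The tree with the highest (least negative) log-likelihood is the one under which the made-up data are most probable, and so it is the preferred tree.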

Software: PAUP*, RAxML. The programs in the Bayesian software list below can be used as well.

More reading: 

Bayesian Estimation

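Similar to Maximum Likelihood, Bayesian estimation is based on the probability of the data given a model of their evolution, but it also incorporates prior beliefs about the tree and the model parameters. The result is a posterior probability for each hypothesis.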
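
As a rough illustration of how prior beliefs combine with the likelihood, here is a small Python sketch of Bayes' rule applied to three candidate trees. The log-likelihood values and the priors are invented for the example, and the function name is hypothetical. Programs such as MrBayes, BEAST, and RevBayes do not list every tree like this. They sample trees and parameters with MCMC, so this is only the arithmetic behind the idea.

```python
import math


def posterior_probabilities(log_likelihoods, priors):
    """Combine likelihoods and priors into posteriors with Bayes' rule.

    log_likelihoods -- dict of tree name -> log-likelihood of the data
    priors          -- dict of tree name -> prior probability (sums to 1)
    """
    # Subtract the largest log-likelihood before exponentiating so the
    # numbers stay in a safe range; this does not change the result.
    best = max(log_likelihoods.values())
    weights = {
        tree: math.exp(log_likelihoods[tree] - best) * priors[tree]
        for tree in log_likelihoods
    }
    total = sum(weights.values())
    return {tree: weight / total for tree, weight in weights.items()}


# Invented log-likelihoods for three candidate trees.
log_likelihoods = {
    "((A,B),(C,D))": -10.0,
    "((A,C),(B,D))": -12.0,
    "((A,D),(B,C))": -14.0,
}

# No prior preference among the trees.
flat_prior = {tree: 1 / 3 for tree in log_likelihoods}

# A strong (made-up) prior belief in the second tree.
informed_prior = {
    "((A,B),(C,D))": 0.05,
    "((A,C),(B,D))": 0.90,
    "((A,D),(B,C))": 0.05,
}

flat_posterior = posterior_probabilities(log_likelihoods, flat_prior)
informed_posterior = posterior_probabilities(log_likelihoods, informed_prior)

print("flat prior:    ", flat_posterior)
print("informed prior:", informed_posterior)
```

With a flat prior the ranking of trees follows the likelihood alone. With a strong prior, the tree you already believed in can end up with the highest posterior probability even though its likelihood is lower, which is why the choice of priors deserves some thought.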

Software: RevBayes, MrBayes, BEAST

More reading: 

How do you select a method?

Why not try them all? Paleontology has been slow to adapt statistical models to better suit our character data, and there are many mindsets that are stuck on ‘this is the best way’. However, until you try each method, it is hard to say one is ‘better’ than another. Some methods may provide a route that is more closely aligned with how your clade evolved through time. Maybe one is more flexible for your dataset, maybe you get the same answer with multiple methods, or maybe you realize something new about your dataset from running multiple scenarios.

More reading on support and comparing methods: 

General resources and further reading: 

Subscribe to the PhyloMeth YouTube channel and watch previous lectures and discussions on different aspects of phylogenetic methods.
