Jen here –
Are you interested in understanding how we take morphological data from extinct animals and use them to infer an evolutionary history? We often think of and visualize relationships as trees, and this includes your family tree. We have an entire page on Reading the Tree of Life so you can understand how to read and interpret these visualizations. These trees, called phylogenies, can be used as a framework to test different macroevolutionary questions regarding species distribution, paleoecology, rates of change, and so much more! We hope to set the stage to explain how each step is done!
Before really diving into anything specific, I would suggest you think a little about evolution, phylogeny, and all the basic terminology that builds the foundation for understanding evolutionary theory. I would recommend that you work through The Compleat Cladist: A Primer of Phylogenetic Procedures. This is effectively a workbook that walks you through terms, concepts, and more!
This isn’t meant to be an exhaustive guide but rather to set you up to explore the programs and infer a phylogeny! Now that you have learned all you can about your study organism and how to build a character matrix, the next step is inferring a phylogeny.
What does it mean to infer a phylogeny?
Simply, evolutionary scientists can take a data matrix and apply mathematical and statistical models to estimate, or infer, species relationships to generate a phylogeny (evolutionary history). In paleontology, the data are generated by an individual’s understanding of homologous characters in the group and are inherently biased by their expert knowledge. Homology is similarity due to inheritance from a common ancestor. As such, the researcher is presenting a phylogenetic hypothesis for the group.
It is important to understand the purpose of pursuing any scientific approach. Brian O’Meara describes why paleontologists should build and infer phylogenies in his PhyloMeth video, Why build phylogenies? In essence, tree topologies not only tell us how organisms are related to one another, but they can also be used as a framework for a variety of macroevolutionary approaches.
To get an idea of the basics of tree space, and to get excited about trees, please watch this video by Brian O’Meara: Be afraid of tree space.
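One reason tree space is worth a little fear is its size. Here is a minimal sketch in Python that counts how many fully bifurcating trees exist for a given number of labelled tips, using the standard double-factorial formulas. The function names are my own for illustration and do not come from any phylogenetics package.

```python
def double_factorial(n: int) -> int:
    """Return n * (n - 2) * (n - 4) * ... down to 1 or 2 (1 if n <= 1)."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def count_rooted_trees(n_tips: int) -> int:
    """Number of rooted, fully bifurcating trees for n labelled tips: (2n - 3)!!"""
    if n_tips < 2:
        return 1
    return double_factorial(2 * n_tips - 3)


def count_unrooted_trees(n_tips: int) -> int:
    """Number of unrooted, fully bifurcating trees for n labelled tips: (2n - 5)!!"""
    if n_tips < 3:
        return 1
    return double_factorial(2 * n_tips - 5)


for n_tips in (4, 10, 20):
    print(n_tips, count_unrooted_trees(n_tips), count_rooted_trees(n_tips))
```

With only 10 tips there are already more than two million unrooted trees, which is why programs search tree space with heuristics rather than scoring every possible tree.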
What are the methods?
These are several of the primary methods currently being used in phylogenetic paleobiology. There are certainly more methods and we encourage you to explore and learn on your own!
Maximum Parsimony
Parsimony, similar to Occam’s razor, suggests that the simplest explanation that fits the evidence is the best. Applying this logic to evolutionary trees means that the best inference or hypothesis is the one that requires the fewest evolutionary changes – or character changes across branches.
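To make "fewest character changes" concrete, here is a minimal sketch that counts the minimum number of changes a tree requires using Fitch's algorithm for unordered characters. The taxa, the tiny character matrix, the nested-tuple tree format, and the function names are all made up for illustration. Real programs such as PAUP* also search across many trees, while this sketch only scores trees you hand it.

```python
def fitch_changes(tree, states):
    """Return (possible_states, n_changes) for one character on a rooted binary tree.

    A tree is either a tip name (str) or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_changes(left, states)
    right_set, right_changes = fitch_changes(right, states)
    shared = left_set & right_set
    if shared:
        # The two subtrees can agree on a state, so no extra change is needed.
        return shared, left_changes + right_changes
    # No state in common: at least one change must occur at this node.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, matrix):
    """Sum the minimum number of changes over every character in the matrix."""
    n_characters = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_characters):
        states = {taxon: row[i] for taxon, row in matrix.items()}
        total += fitch_changes(tree, states)[1]
    return total


# Hypothetical matrix: four taxa scored for three binary characters.
matrix = {"A": "000", "B": "100", "C": "110", "D": "111"}

tree_1 = ("A", ("B", ("C", "D")))
tree_2 = (("A", "C"), ("B", "D"))

print(parsimony_score(tree_1, matrix))
print(parsimony_score(tree_2, matrix))
```

In this toy example tree_1 needs 3 changes and tree_2 needs 4, so parsimony would prefer tree_1.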
More reading:
- Reconstructing trees: Parsimony, UCMP
- Building trees using parsimony, UCMP
- Phylogenetic Analysis with PAUP, Brian O’Meara’s website
Maximum Likelihood
Likelihood methods provide probabilities of the data given a model of their evolution. The more probable the data given the tree, the more the tree is preferred overall. Because the model is chosen by the user, this method can be employed for a variety of situations.
Models of evolution in paleobiology include Jukes-Cantor (JC) and Felsenstein (F81), but there are many others. Here is an entire chapter on Selecting Models of Evolution by David Posada.
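As a rough illustration of "the probability of the data given a model", here is a minimal sketch that computes the likelihood of one binary character on a fixed tree using Felsenstein's pruning algorithm. It assumes a symmetric k-state model with equal rates and equal state frequencies (the same idea as Jukes-Cantor, often called the Mk model when applied to morphology). The tree format, branch lengths, taxa, and function names are my own assumptions for illustration, not the interface of any program listed here.

```python
import math


def mk_transition_prob(start, end, branch_length, k=2):
    """Probability of going from state `start` to `end` along a branch.

    Branch length is in expected changes per character; all changes are equally likely.
    """
    decay = math.exp(-k * branch_length / (k - 1))
    if start == end:
        return 1.0 / k + (k - 1) / k * decay
    return 1.0 / k - decay / k


def partial_likelihoods(node, tip_states, k=2):
    """Return a list where entry s is P(tip data below node | node is in state s).

    A node is a tip name (str) or a pair of (child, branch_length) tuples.
    """
    if isinstance(node, str):
        return [1.0 if s == tip_states[node] else 0.0 for s in range(k)]
    result = [1.0] * k
    for child, branch_length in node:
        child_partials = partial_likelihoods(child, tip_states, k)
        for s in range(k):
            result[s] *= sum(
                mk_transition_prob(s, e, branch_length, k) * child_partials[e]
                for e in range(k)
            )
    return result


def character_likelihood(tree, tip_states, k=2):
    """Likelihood of one character, averaging over equally probable root states."""
    return sum(partial_likelihoods(tree, tip_states, k)) / k


# Hypothetical data: one binary character scored for four taxa.
tip_states = {"A": 1, "B": 1, "C": 0, "D": 0}

# Two candidate trees; every branch is given an assumed length of 0.1.
tree_1 = (
    ((("A", 0.1), ("B", 0.1)), 0.1),
    ((("C", 0.1), ("D", 0.1)), 0.1),
)
tree_2 = (
    ((("A", 0.1), ("C", 0.1)), 0.1),
    ((("B", 0.1), ("D", 0.1)), 0.1),
)

print(character_likelihood(tree_1, tip_states))
print(character_likelihood(tree_2, tip_states))
```

The tree that groups taxa sharing the same state (tree_1) makes the data more probable, so it would be preferred. In a real analysis the branch lengths are estimated rather than fixed, and the likelihoods of all characters are multiplied together (or their logs summed).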
Software: PAUP*, RAxML. The programs in the Bayesian software list can be used as well.
More reading:
- Maximum Likelihood, NCBI
- Principles of Phylogenetics, UC Berkeley course
- Contrasting likelihood and Bayesian approaches, in general by Brian O’Meara, PhyloMeth
Bayesian Estimation
Similar to Maximum Likelihood, Bayesian estimation is based on the probabilities of the data given a model of their evolution with the addition of prior beliefs.
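To show how prior beliefs enter the calculation, here is a minimal sketch of Bayes' rule applied to a handful of candidate trees: each tree's posterior probability is proportional to its likelihood times its prior. The log-likelihood values and tree names are invented for illustration. Programs such as MrBayes, BEAST, and RevBayes cannot list every tree like this, so they sample trees with Markov chain Monte Carlo instead.

```python
import math


def posterior_probabilities(log_likelihoods, priors):
    """Combine log-likelihoods and prior probabilities into posterior probabilities."""
    log_joint = {
        tree: log_likelihoods[tree] + math.log(priors[tree])
        for tree in log_likelihoods
    }
    # Subtract the largest value before exponentiating to avoid numerical underflow.
    largest = max(log_joint.values())
    weights = {tree: math.exp(value - largest) for tree, value in log_joint.items()}
    total = sum(weights.values())
    return {tree: weight / total for tree, weight in weights.items()}


# Hypothetical log-likelihoods for three candidate trees.
log_likelihoods = {"tree_1": -10.0, "tree_2": -11.0, "tree_3": -13.0}

# A flat prior treats every tree as equally plausible before seeing the data.
flat_prior = {"tree_1": 1 / 3, "tree_2": 1 / 3, "tree_3": 1 / 3}

# An informative prior that strongly favors tree_2 before seeing the data.
informative_prior = {"tree_1": 0.1, "tree_2": 0.8, "tree_3": 0.1}

print(posterior_probabilities(log_likelihoods, flat_prior))
print(posterior_probabilities(log_likelihoods, informative_prior))
```

With the flat prior the posterior simply follows the likelihoods, while the informative prior is strong enough here to shift the most probable tree from tree_1 to tree_2.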
Software: RevBayes, MrBayes, BEAST
More reading:
- (OA) An Introduction to Bayesian Inference of Phylogeny [a simple example] by Huelsenbeck et al.
- (OA) RevBayes Tutorials
How do you select a method?
Why not try them all? Paleontology has been slow to adapt statistical models to better suit our character data, and many researchers are stuck on the mindset that ‘this is the best way’. However, until you try each method, it is hard to say one is ‘better’ than another. Some methods may provide a route that is more closely aligned with how your clade evolved through time. Maybe one is more flexible for your dataset, maybe you get the same answer with multiple methods, or maybe you realize something new about your dataset from running multiple scenarios.
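If you do run several methods, one simple way to check whether you got "the same answer" is to compare the clades each tree contains. Here is a minimal sketch that lists the clades of two rooted trees and reports which are shared and which differ (the number of differing clades is a rooted version of the Robinson-Foulds distance). The two example trees and the nested-tuple format are hypothetical and are not output from any real analysis.

```python
def clades(tree):
    """Return the set of clades (as frozensets of tip names) in a rooted tree.

    A tree is either a tip name (str) or a tuple of subtrees.
    """
    found = set()

    def collect(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(collect(child) for child in node))
        found.add(tips)
        return tips

    collect(tree)
    return found


def compare_trees(tree_a, tree_b):
    """Return (shared_clades, differing_clades) for two rooted trees."""
    clades_a, clades_b = clades(tree_a), clades(tree_b)
    return clades_a & clades_b, clades_a ^ clades_b


# Hypothetical results from two different methods on the same four taxa.
parsimony_tree = ("A", ("B", ("C", "D")))
bayesian_tree = ("A", (("B", "C"), "D"))

shared, differing = compare_trees(parsimony_tree, bayesian_tree)
print("shared clades:", [sorted(clade) for clade in shared])
print("differing clades:", [sorted(clade) for clade in differing])
```

Zero differing clades means the two methods recovered the same topology. Any clades that do differ point you to the parts of the tree where the methods disagree.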
More reading on support and comparing methods:
- (OA) Bayesian Analysis Using a Simple Likelihood Model Outperforms Parsimony for Estimation of Phylogeny from Discrete Morphological Data
- (OA) Bayesian Phylogenetics lecture by Bret Larget
- (OA) Frequentist Properties of Bayesian Posterior Probabilities of Phylogenetic Trees Under Simple and Complex Substitution Models
- (OA) How Meaningful Are Bayesian Support Values?
General resources and further reading:
- (OA, online) Phylogenetic Methods: building and using trees to answer compelling questions, a course by Dr. Brian O’Meara.
- (OA, online) Analytical Paleobiology Workshop 2018, Inferring phylogenies with Dr. April Wright.
- (Book) Inferring Phylogenies, By Joseph Felsenstein
- (Book) Phylogenetics: Theory and Practice of Phylogenetic Systematics, 2nd Edition, by E. O. Wiley, Bruce S. Lieberman
- (OA, list) Phylogeny Programs by Dr. Joe Felsenstein
- Join phyloseminar and check out their YouTube
- Subscribe to the PhyloMeth YouTube channel and watch previous lectures and discussions on different aspects of phylogenetic methods.