How Much Bias Exists in Fossil Research?

Phylogenetic Signal and Bias in Paleontology

By Robert J. Asher and Martin R. Smith

Summarized by: Alyssa Anderson, a senior geology major at the University of South Florida. Her dream is to work with environmental sciences and geology in water-related fields, such as oceanography or hydrology. In her spare time, she enjoys writing or sketching.

What data were used? Six sets of morphological data (i.e., data about the physical characteristics of organisms) were gathered from previously published fossil studies, and genetic data (i.e., DNA) from living examples of those organisms were matched with the fossils. Together, these data were used to create rudimentary evolutionary trees of mammals and birds, the primary subjects of the studies. The trees retained all of the character codes from the previous studies, even where those codes had been critiqued.

Methods: This study only used taxa that had at least 50% of the molecular and morphological characters present. The researchers tested how missing data would affect the resulting phylogenetic trees. To do this, they used artificial extinction and artificial fossilization techniques. That is, they used various software packages to artificially remove all molecular data from taxa treated as fossils (most fossils do not preserve any molecular data) and to remove some of the morphological data as well, as is also common with fossils.
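To make the idea of artificial fossilization concrete, here is a minimal Python sketch. It is not the authors' software: the taxon names, the character strings, the function names, and the 50% loss rate are all made up for illustration, and the sketch assumes the 50% completeness rule is applied to the molecular and morphological characters separately.

```python
import random

MISSING = "?"  # conventional symbol for an unscored character


def completeness(characters):
    """Fraction of characters that are scored (not missing)."""
    if not characters:
        return 0.0
    return sum(1 for c in characters if c != MISSING) / len(characters)


def keep_well_sampled(taxa, threshold=0.5):
    """Keep taxa with at least `threshold` of each data type scored.

    Assumption: the threshold is checked per data type, not on the total.
    """
    return [
        t for t in taxa
        if completeness(t["mol"]) >= threshold
        and completeness(t["morph"]) >= threshold
    ]


def artificially_fossilize(taxon, morph_loss, rng):
    """Return a copy of a taxon that looks like a fossil.

    All molecular characters become missing, and a random share
    (`morph_loss`) of the scored morphological characters is removed.
    """
    scored = [i for i, c in enumerate(taxon["morph"]) if c != MISSING]
    dropped = set(rng.sample(scored, int(morph_loss * len(scored))))
    morph = "".join(
        MISSING if i in dropped else c for i, c in enumerate(taxon["morph"])
    )
    return {
        "name": taxon["name"],
        "mol": MISSING * len(taxon["mol"]),
        "morph": morph,
    }


# Toy data, invented for this example only.
taxa = [
    {"name": "taxon_A", "mol": "ACGTACGT", "morph": "01101001"},
    {"name": "taxon_B", "mol": "AC??AC?T", "morph": "0110?001"},
    {"name": "taxon_C", "mol": "A???????", "morph": "01101001"},  # too incomplete
]

rng = random.Random(0)  # fixed seed so the run is repeatable
kept = keep_well_sampled(taxa)
fossil = artificially_fossilize(kept[0], morph_loss=0.5, rng=rng)
print([t["name"] for t in kept])
print(fossil)
```

The original taxon is left unchanged, so the tree built from the degraded copy can later be compared with the tree built from the complete data.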

Results: The researchers tested three hypotheses. The first hypothesis concerned the reconstruction of evolutionary trees from data sets with missing fossil and molecular data. The experiment produced fairly accurate evolutionary trees in this case. The second hypothesis tested how accurately morphological data alone, without molecular data, can reconstruct evolutionary trees. These analyses also produced accurate results. Finally, the third hypothesis tested whether poorly preserved fossil data lead to misinformed conclusions. Results demonstrated that including poor fossil data with missing information produced better trees than leaving fossil data out entirely. In summary, adding data, even incomplete data, helps make the trees more accurate and generally does not result in inaccurate evolutionary relationships. The most accurate evolutionary trees are made when molecular and morphological data are combined.
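This summary does not say which measure the authors used to score accuracy. As one simple stand-in, the Python sketch below counts how many groupings (clades) in a known reference tree also appear in a reconstructed tree. The nested-tuple tree format, the tip names, and the function names are assumptions made for this illustration.

```python
def clades(tree):
    """Return the clades of a rooted nested-tuple tree as sets of tip names.

    The root clade (all tips) is left out because every tree contains it.
    """
    found = set()

    def tips(node):
        if isinstance(node, str):  # a tip of the tree
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    root = tips(tree)
    found.discard(root)
    return found


def share_of_clades_recovered(reference, estimate):
    """Fraction of the reference tree's clades that the estimate also has."""
    ref = clades(reference)
    if not ref:
        return 1.0  # nothing to recover in an unresolved reference tree
    return len(ref & clades(estimate)) / len(ref)


# Toy trees, invented for this example only.
reference = ((("A", "B"), "C"), ("D", "E"))
estimate = ((("A", "C"), "B"), ("D", "E"))  # A and C wrongly paired

score = share_of_clades_recovered(reference, estimate)
print(f"{score:.2f} of the reference clades were recovered")
```

A score of 1.0 means every grouping in the reference tree was recovered. Lower scores mean some taxa were placed in the wrong group.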

A picture of six different evolutionary trees of bird and mammal genera, each from previously published data. The trees are well resolved; broader clades (e.g., marsupials) are highlighted in each of the trees
Figure: Six trees gathered from the five studies investigated in this research paper. The animal groups focused on here are birds and mammals. The tips of the tree are the genera used in this study.

Why is this study important? The study set out to determine how important it is to include fossil data, even though fossils inevitably have missing data. The researchers also wanted to know whether fossil decay creates situations where unrelated fossils appear more similar to each other than they actually are. Many morphological features can look similar across species, even if the species are not closely related, so fossil decay can make it even more difficult to work out how similar or different certain features are. However, this bias from fossil decay does not seem to affect the datasets very much. The fossils' placement on the evolutionary trees was usually quite similar to their placement on the known trees the researchers used.

The big picture: Identifying bias or limitations in scientific studies is one of the most important things scientists can do. Bias can never be fully removed, and neither can limitations, but investigating sources of bias and the extent of our limitations can help reduce them in future studies. Studying fossils is a vital science, as it shows the history of the world and how species have evolved over time. If the information gained from fossils is misleading in evolutionary analyses, because fossils cannot provide DNA or do not retain features clear enough for proper identification, then that could distort our understanding of history and evolution. This study found that using multiple sources of data (i.e., morphological and molecular data) creates far more accurate evolutionary trees than using only one. Paleontology and other sciences can benefit from this knowledge to improve future experiments and broaden our understanding of the world.

Citation: Asher, R. J. and Smith, M. R. 2021. Phylogenetic Signal and Bias in Paleontology. Syst. Biol. 0(0):1–23. DOI:10.1093/sysbio/syab072