Building a Character Matrix

Jen here – 

Interested in understanding how we take morphological data from extinct animals and use them to infer an evolutionary history? The resulting trees can be used as a framework to test different macroevolutionary questions regarding species distribution, paleoecology, rates of change, and so much more! We hope to set the stage to explain how each step is done! First things first: constructing a character matrix.

Before really diving into anything specific, I would suggest you think a little about evolution, phylogeny, and all the basic terminology that goes into this field. I would recommend that you work through The Compleat Cladist: A Primer of Phylogenetic Procedures. This is effectively a workbook that walks you through terms, concepts, and more!

This isn’t meant to be an exhaustive guide, but rather to set you up to explore Mesquite (the program I use below) and generate a test character matrix!

Step 1: Learn about your study group

This will involve a LOT of reading and diving into the history of the animals you are interested in. In some instances this is easy; in others it is very difficult! I won’t dwell on this too much, but it’s easy to forget where to begin. I would start by using Google Scholar to search for your group of interest plus terms like evolution, morphology, or phylogeny. Then you will probably have to head to the library armed with a list of literature that is much older than you to really begin your deep dive. Remember that ideas change through time, so starting at the beginning is really valuable to learn how ideas have changed!

What is important is that you also learn about homology and work to understand the homologous elements of your critters. Homology is simply similarity due to inheritance from a common ancestor. The understanding and evaluation of homology may be different depending on the group you are looking at. For example, homology in echinoderms has been studied for a while now, and there are several schemes. One takes into account the body as a whole and how the elements are connected; another takes a narrower approach, looking at specific plates around the mouth. These schemes are not mutually exclusive and can be used in concert with one another. Another good thing to remember is that some people like to think they are more correct than others – who’s to say, really. Just make sure you do your own homework to form your own opinions and ideas.

Step 2: Organize your information

There are several ways to do this: you could simply store information in Excel or Google Sheets, or you could use a program designed for curating character data. I have used Mesquite for this. Mesquite is freely available software that is

“…modular, extendible software for evolutionary biology, designed to help biologists organize and analyze comparative data about organisms. Its emphasis is on phylogenetic analysis, but some of its modules concern population genetics, while others do non-phylogenetic multivariate analysis. Because it is modular, the analyses available depend on the modules installed.”

You can easily describe your characters, add new taxa, remove taxa, import or draw a tree and see how characters change across different tree topologies. 

Here is the barebones starting place. I set up a new file and said I wanted three taxa and three characters. Now I can go in and start editing things!


There is a side toolbar where you can easily start to modify the matrix. You can change the taxon names, add taxa, change characters, add characters, delete whatever you want, and a lot more that I haven’t really messed around with! I suggest that if you are a first-time user, you spend some time messing around with your fake matrix. Once you get a sizable dataset in here, it’s best you don’t make any mistakes! Figure out where you may go awry and troubleshoot ahead of time.

Here is my edited matrix where I’ve added in three taxa and three characters. Notice at the bottom where it shows a character and the different states that are available. So when you edit the matrix you can use numbers or the character state – numbers are easier!
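If it helps to see the numbers-versus-state-names idea outside of Mesquite, here is a minimal Python sketch. It is not how Mesquite stores anything internally; the taxon names, characters, and states are all made up for illustration, and the only assumption is that each state's code is its position in that character's list of states.

```python
# Hypothetical characters and their states, invented purely for illustration.
CHARACTERS = {
    "stem": ["absent", "present"],
    "arm branching": ["unbranched", "branched"],
    "plate ornament": ["smooth", "ridged", "nodose"],
}

# Hypothetical taxa scored by state name.
# None marks a character that could not be observed (missing data).
SCORED_BY_NAME = {
    "Taxon_A": {"stem": "present", "arm branching": "unbranched", "plate ornament": "smooth"},
    "Taxon_B": {"stem": "present", "arm branching": "branched", "plate ornament": None},
    "Taxon_C": {"stem": "absent", "arm branching": "branched", "plate ornament": "nodose"},
}


def encode(scored, characters, missing="?"):
    """Turn state names into numeric codes, one string of symbols per taxon."""
    matrix = {}
    for taxon, scores in scored.items():
        row = []
        for name, states in characters.items():
            state = scores.get(name)
            # The code is the position of the state in that character's list.
            row.append(missing if state is None else str(states.index(state)))
        matrix[taxon] = "".join(row)
    return matrix


MATRIX = encode(SCORED_BY_NAME, CHARACTERS)
for taxon, row in MATRIX.items():
    print(taxon, row)
```

A state name that is not in the character's list raises a ValueError, which is a handy way to catch typos before they end up in the matrix.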


An easier way to import your characters and their different states is to use the State Names Editor Window. This shows you the list of your characters and all the different states each one can have. You can easily edit these, and it’s a nice way to organize the characters, since in the character matrix the text is slanted and kind of hard to read.

Character matrix with the character list on the far left column and the states spanning the rest. The states can be whatever you want – which is where bias can slip in so don’t forget to refer back to your knowledge base and understanding of homology.


The functionality of Mesquite extends well beyond this. If you are looking for tutorials or to push the limits of the program here is some further reading:

Step 3: Export your matrix for analysis

Extensive export options via Mesquite!

File > Export will give you a series of options to export your file. Don’t forget to also do a regular SAVE of your file so that you can revisit your matrix and easily add to it! Most programs that infer phylogenies require a NEXUS file. This type of file has your matrix and often a bit more information about what you want in the analysis or information about the characters. I would suggest exporting a few different types and opening them in your favorite plain text editor so you can see how they are structured and why certain programs may want different files and different information!
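To give a rough sense of what is inside a NEXUS file, here is a small Python sketch that prints a minimal DATA block for a three-taxon, three-character matrix. Treat it as a simplified illustration, not as what Mesquite writes: a real Mesquite export will usually contain additional blocks and settings. The taxon names and scores are hypothetical, and `to_nexus` is a helper name made up for this example.

```python
# Hypothetical matrix: one string of state symbols per taxon, "?" for missing data.
MATRIX = {
    "Taxon_A": "100",
    "Taxon_B": "11?",
    "Taxon_C": "012",
}


def to_nexus(matrix, missing="?", gap="-"):
    """Format {taxon: "0101"} as a minimal NEXUS DATA block (simplified sketch)."""
    lengths = {len(row) for row in matrix.values()}
    if len(lengths) != 1:
        raise ValueError("every taxon needs the same number of characters")
    nchar = lengths.pop()

    # Collect the state symbols actually used, skipping the missing and gap marks.
    symbols = sorted({s for row in matrix.values() for s in row} - {missing, gap})
    symbol_list = " ".join(symbols)

    # Pad taxon names so the rows line up; names here avoid spaces on purpose.
    width = max(len(taxon) for taxon in matrix)

    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"    DIMENSIONS NTAX={len(matrix)} NCHAR={nchar};",
        f'    FORMAT DATATYPE=STANDARD MISSING={missing} GAP={gap} SYMBOLS="{symbol_list}";',
        "    MATRIX",
    ]
    for taxon, row in matrix.items():
        lines.append(f"    {taxon.ljust(width)}  {row}")
    lines += ["    ;", "END;"]
    return "\n".join(lines)


NEXUS_TEXT = to_nexus(MATRIX)
print(NEXUS_TEXT)
```

Comparing this printed text with a file exported from Mesquite is a quick way to spot the extra information that a particular program expects.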

