The Strange Case of the Crocodile’s Snout. Part 1: Answerless Questions

Adrian Currie writes...

Zen koans are often questions without answers, designed to open the student's mind through contemplating paradoxes. Do scientists ask answerless questions? Well, not literally - scientists rarely enquire after the paradoxical. But, I'll suggest, they sometimes ask questions which they are unlikely to be able to answer. That is, given the knowledge, techniques and resources at our command, we have little reason to think we would get it right, or even know we have the right answer if we were to stumble upon it.

Scientists aren’t supposed to waste their time tilting at windmills, chasing rainbows, or making valiant attempts at the impossible. After all, there are only so many scientists, and only so much research funding. On the face of it, what research gets done should be decided by: (1) how likely it is to succeed, and (2) how important success would be. So, if we find scientists doggedly and determinedly trying to work something out, but have good reason to think they won't succeed, either something has gone very wrong, or we’ve made a mistake about what the value of the research is.

In my next two posts, I’m going to suggest that what’s valuable about scientific research is often not obvious, and so we sometimes risk dismissing apparently hopeless research too quickly. Asking questions we don’t expect to find answers to, I’ll argue, often has surprising, indirect benefits. To show this, I need to first give an example of such an answerless question, and second demonstrate how asking it can be unexpectedly fruitful. Today, I’m going to attempt the first task. Specifically, I'll argue that many questions about the fine-grained ancestral relations between extinct organisms are questions that we shouldn’t expect to answer. I’ll then leave you in suspense until my next post in January, where I’ll argue for the second part: that we should ask them anyway.

So, to start, how do we work out the relationships between extinct animals? Typically, using morphological phylogenetics. Ok, what is this ‘morphological phylogenetics’ business? I’m going to discuss it in a pretty relaxed, relatively—relatively—unjargony way. I welcome outraged paleontological corrections in the comments, particularly if my simplifications (and, well, my outright misunderstandings!) undermine the argument.

Paleobiologists use morphological phylogenetics to work out the ancestral relationships between extinct lineages*. A ‘phylogeny’ is a hypothesis about how various taxa are related to each other. You will have seen phylogenies before; here’s Darwin’s first one (from around 1837):

‘Morphology’ is roughly what organisms look like, and is the basic data used in constructing phylogenies in paleontology. Biologists interested in living taxa (‘neontologists’) often use molecular methods: the basic data is genetic.

Neontologists are sometimes rather leery of paleobiological methods to uncover the relationships between extinct lineages, perhaps because of the epistemic woes I’ll list below. I think this is rather short-sighted of them. Paleobiologists often ask large scale questions about the shape of life. How does speciation occur? How about adaptive radiations? Or mass extinctions? Why, at the largest scale, do lineages behave and evolve as they do? Answering (actually, just being able to ask) such questions requires a large-scale perspective on life’s history. Although neontologists are able to draw rich information from the living critters they study, from a paleobiological perspective this is insufficient. First, today's living world is a tiny fragment—a pathetic timeslice—of the grand history of life on earth. Second, that time-slice is in many ways atypical—it is biased. Consider, for instance, the fact that in the last thousands of years almost all of the world’s megafauna has disappeared. If you weigh over 30 kilograms, and you’re not from Africa, chances are you’re extinct. Looking at today's megafauna, then, would give the wrong idea about the number and diversity of large beasts that usually roam the Earth. So, although neontologists can use molecular methods to tell us a whole heap about the ancestry of extant animals, from a macroevolutionary perspective, this is extremely limited (indeed, there are big questions about the extent to which neontological and paleontological phylogenies can be combined, which Leonard has touched upon).

To tackle macroevolution, then, we want to know the ancestral history of extinct critters. But can we? Let’s delve a little deeper into phylogenetics.

Evolutionary theory tells us that some morphological features carry signals of ancestry. My dad and I both have red-tinged beards, and this tells us something about our relatedness. A character is, in this context, an organismic trait which carries such signals. Consider these skulls:

Notice that one has an elongated, thin snout—it is ‘longirostrine’—while the other has a squatter, heavier snout. We might think that, as with my father’s and my reddish hair, these features of snouts might make for good characters. If so, we might think that the following two skulls are from closely related critters:

These are two Mesozoic crocodyliforms. If we compare these with the short-snouted crocodile above, we might ask whether their elongated snouts, and her shorter one, give us reason to think that they are more closely related to one another or not.

In principle, then, I should be able to infer how related a group of critters are by understanding the evolutionary relationships between their morphological traits, their ‘characters’. Below is Darren Naish’s rather wonderful image of the crocodyliforms. Have a glance at them. How might we determine which of their morphological features might make for good characters?


So, a phylogenetic analysis involves first carving up our critters into characters (snout-length, for instance) and then setting these into character states (short or elongated). In constructing a tree, then, I can code the characters according to their states. I could code short-snout as 0 and long-snout as 1, say.
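To make the coding step concrete, here is a minimal sketch in Python. The taxon names and observations are hypothetical placeholders, not real specimens, and marking an unobserved character with "?" is an assumption on my part (it is a common convention, but nothing above depends on it).

```python
# Hypothetical coding scheme: one character (snout) with two states.
CHARACTER_STATES = {"snout": {"short": 0, "long": 1}}

# Hypothetical observations; None means the feature is not preserved.
observations = {
    "taxon_A": {"snout": "long"},
    "taxon_B": {"snout": "long"},
    "taxon_C": {"snout": "short"},
    "taxon_D": {"snout": None},
}


def code_matrix(observations, character_states):
    """Turn described traits into a taxon-by-character matrix of codes."""
    characters = sorted(character_states)
    matrix = {}
    for taxon, traits in observations.items():
        row = []
        for character in characters:
            state = traits.get(character)
            if state is None:
                row.append("?")  # missing data, left uncoded
            else:
                row.append(character_states[character][state])
        matrix[taxon] = row
    return characters, matrix


characters, matrix = code_matrix(observations, CHARACTER_STATES)
for taxon, row in matrix.items():
    print(taxon, row)
```

A real matrix would have one column per character, so each row would run to dozens of entries rather than one.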

We then set the polarity of the characters. This involves estimating what the traits at the base of the tree (the basal traits) were like. For instance, we might decide that having an elongated snout is basal, meaning that short snouts are a later (a derived) character in the tree. So, to set polarity I need to know what the common ancestor of the critters in question was like. This is done by ‘rooting’ the tree, specifically, by picking an outgroup. The ways in which the group we care about differs from the outgroup allow us to set character polarity.
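Here is a deliberately crude sketch of outgroup comparison, assuming the simplest possible rule: whatever state the outgroup shows is treated as basal, and anything else as derived. The function name and taxa are hypothetical, and real analyses root whole trees rather than polarising one character at a time.

```python
def polarise(ingroup_states, outgroup_state):
    """Label each ingroup taxon's state relative to the outgroup's state."""
    return {
        taxon: ("basal" if state == outgroup_state else "derived")
        for taxon, state in ingroup_states.items()
    }


# Hypothetical ingroup: one long-snouted and one short-snouted taxon.
ingroup = {"taxon_A": "long", "taxon_B": "short"}

# The same ingroup data, polarised against two different outgroups.
with_long_outgroup = polarise(ingroup, "long")
with_short_outgroup = polarise(ingroup, "short")

print(with_long_outgroup)
print(with_short_outgroup)
```

Nothing about the ingroup changes between the two calls, yet every label flips. Under this toy rule, at least, polarity is entirely a product of the outgroup we pick.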

With our characters carved, and our polarity set, we then apply various statistical algorithms which produce a bunch of trees.

Some of these algorithms involve cladistic parsimony. Roughly, they attempt to minimise the number of evolutionary events in a tree, where an ‘evolutionary event’ is understood as a change in character state. On this view, a tree which involves two crocodilians separately evolving long snouts from short snouts (two events) is less good than one which involves their common ancestor evolving long snouts, and them inheriting it (one event). In a typical analysis, many different trees are produced based on different polarities and different algorithms, and paleobiologists hunt for common themes—robust results—across them.
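The one-event-versus-two comparison can be sketched with Fitch's small-parsimony algorithm, which counts the minimum number of state changes a single unordered character needs on a given rooted binary tree. I am picking Fitch's algorithm purely for illustration: I am not claiming it is what any particular crocodyliform analysis uses. The four taxa and both tree shapes are hypothetical.

```python
def fitch(tree, states):
    """Return (candidate_states, changes) for one character on a tree.

    A tree is either a taxon name (a leaf) or a pair of subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    changes = left_changes + right_changes
    shared = left_set & right_set
    if shared:
        # The two branches can agree, so no change is needed here.
        return shared, changes
    # The branches disagree, so at least one change happened here.
    return left_set | right_set, changes + 1


# Snout coded as 0 = short, 1 = long, for four hypothetical taxa.
snout = {"A": 1, "B": 1, "C": 0, "D": 0}

# Tree 1 groups the two long-snouted taxa together.
grouped = (("A", "B"), ("C", "D"))
# Tree 2 splits them up, pairing each with a short-snouted taxon.
split = (("A", "C"), ("B", "D"))

grouped_changes = fitch(grouped, snout)[1]
split_changes = fitch(split, snout)[1]
print(grouped_changes, split_changes)  # prints "1 2"
```

On this single character, parsimony prefers the first tree. With many characters the counts are summed across all of them, and the characters can pull in different directions.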

In phylogenetic analyses, between 40 and 200 characters are used. So, if indeed we want snout-length to be a character (more on this below!), it will be one among many. Under the right circumstances, where we’ve got a rich character-set, good ways of setting polarity, and so forth, morphological phylogenetic analyses can produce trees which are likely to converge upon the real history of life.

Ok, time for a bit of pessimism about the practice – specifically, I’ll argue that we shouldn’t expect to discover the actual crocodilian tree. It is potentially an answerless question.

In a recent paper, Derek Turner discusses this kind of move (indeed, he's made them himself). He argues that scientists need to bet on the future success of their investigations when they are deciding what research to pursue. He distinguishes between ‘current methods’ and ‘methods-neutral’ bets. The former decides on the future success or otherwise of an investigation on the basis of the technology and background theories currently at our disposal; the latter—much more speculative—kind of bet is against us ever succeeding. Obviously methods-neutral bets are a big ask. Scientists are forever inventing new spanners to loosen nature’s mysteries, and history tells us that predicting the shape, utility and form of said spanners is a fool’s game. For methods-neutral bets, then, science's future development will throw *you know whats* in the works.

So, I’m going to make a tentative current-methods-bet against some cases of morphological phylogenetics in paleobiology. My suggestion is simply that given our current techniques, it’s unlikely we’ll get the kind of robust convergences between analyses which should lead us to think we’re getting at the real history of the lineages.

The argument will be a laundry list. It’s worth noting that some of the items also apply to the molecular phylogenetics favoured by neontologists. I’ll work under the assumption that these problems are exacerbated by the nature of paleobiological evidence: it is often incomplete, degraded and ambiguous. Molecular phylogeneticists often think that the sheer amount of data they can generate will wash out the problems I mention here. I’m sceptical of this, but am happy to accept it for today’s purposes.

Here’s my list of epistemic woes affecting morphological phylogenies. These 5 woes, it seems to me, should lead us to be pretty pessimistic about our capacity to get true, fine-grained, paleontological phylogenies.

Woe the First: statistical techniques. To organize our character data and present a phylogenetic tree, we need to apply algorithms. But which? Above, I mentioned the notion of ‘cladistic parsimony’, the thought that we should minimize evolutionary events. There is a set of phylogenetic algorithms which attempt to encode that idea. But there are other algorithms we might apply instead, which presumably in turn encode other ideas about how ancestry works. Even if we’re using a set of different algorithms, it’s unclear how we might decide which combination is the best (and indeed, arguments about these issues are often unfathomably confusing for the uninitiated like myself).

Woe the Second: undermined assumptions. In trying to decide which kinds of trees will best reflect the history of life, we need to make some assumptions. Worryingly, these have often turned out to be wrong. Maureen O’Malley, for instance, has discussed how assumptions about life progressing from simpler to more complex forms were undermined in light of molecular phylogenetic studies probing the history of molecules. It turns out that in some cases complex forms are basal, and simpler ones are derived. If surprises like that turn out to be common, we should be quite worried about the resilience of these kinds of assumptions.

Woe the Third: gaps. The fossil record is highly incomplete, both in terms of specimens, and complete specimens. So, there’s a lot of stuff missing. This adds a whole lot more uncertainty as various inferences are required just to work out whether the critter had the characters in question in the first place.

Woe the Fourth: character delineation & weighting. We must decide which morphological features to count as ‘characters’, and how seriously we should take them in our analysis. Should we take snout-length as a signal of ancestry or not? If so, should we consider it an important signal? The answers are not obvious, because biological similarity can be generated in different ways. Two cases of longirostrine snouts could be homologous, that is, due to being inherited like my hair color. If so, they signal ancestry. But they could instead be homoplastic: they could have converged from different evolutionary states. This is considered a count against using snout length as a character in reconstructing crocodyliforms, as Eric Wilberg has put it:

“… because crocodyliforms have demonstrably evolved similar skull shapes several times, it was assumed that snout shape is not a reliable phylogenetic character” (Wilberg, 2015).

Good characters, then, are those which do not tend to converge. But how do we tell? This is a difficult question, especially considering…

Woe the Fifth: rooting the trees. As we saw above, character polarity is set by picking an outgroup. Whether we treat traits as basal or derived depends crucially on this decision. In many crocodyliform trees, for instance, the root is set by Thalattosuchia, an aquatic, crocodile-like critter from the mid-Mesozoic. Probably looked a little something like this:

Notice the long snout? It’s also a longirostrine. So what? Well, consider two hypotheses we might have about the ancestral relationship between Thalattosuchia and the true crocodiles.

By our first hypothesis Thalattosuchia is a sister group of the Mesozoic crocodilian. This hypothesis is more-or-less assumed by taking her as an outgroup, and thus setting character polarity. If it’s true, then somewhere high up on the right of the tree, where the longirostrines gather, the trait evolved. But consider another option…


By this hypothesis, Thalattosuchia is representative of the basal form of the crocodiles: they evolved from something like, and quite closely related to, her. On this hypothesis, notice how polarity vis-à-vis the longirostrine character probably switches. Now the long-snouts are retained, while the short snouts are derived.

It doesn’t seem as if there’s some way of formalizing—making algorithmic—how we go about setting tree-roots. And these roots matter for how our tree turns out. This makes for a handy little argument:

(1)    Setting the right character polarity is necessary for a good phylogeny.

(2)    Character polarity is highly sensitive to outgroup-choice.

(3)    Outgroup choice is arbitrary.

(4)    If character polarity is sensitive to arbitrary decisions, we have no way of determining whether we’ve got it right.

(5)    We don’t know that we have the right character polarity, so we don’t know if we have a good phylogeny. In such circumstances, fine-grained phylogenies are answerless questions.

Okay, so that’s not logically valid as stated, and it’s also surely too strong - in my next post I'll discuss how outgroup choice can be made in non-arbitrary ways.

But still, in combination, it seems to me, these 5 worries ground some reasonable current-methods-bets against paleobiologists achieving fine-grained, well-resolved, and likely true phylogenies. Particularly in cases where data is thin on the ground and spread across large tracts of time, and where our trees are highly sensitive to decisions such as which traits count as characters, and which outgroups we select. There are debates where palaeontologists seem to be asking questions which we shouldn’t expect to have answered, and I think the crocodilian case is a likely candidate. So, the question looms: should they simply stop bothering with it? I don’t think so, but the why shall have to wait…

*There are researchers who don't think that phylogenies represent hypotheses about life's history. I'm pretty sure they're wrong.