Systematics of Star Wars

It is a period of great confusion. Human scientists, obsessed with understanding the history of life, debate their methods for reconstructing EVOLUTIONARY TREES.

During the conflict, a group of devoted fans wonder how these methods might be applied to the aliens of the STAR WARS film series.

Inspired by the release of a new film, one apprentice paleontologist looks for insight into evolution in the Star Wars galaxy, unaware of the difficulties in that project....


I paid my first visit to the Star Wars galaxy in the summer of 1985, when Return of the Jedi was first re-released to theaters. My enthusiasm for the franchise was settled half an hour into the movie, when Luke Skywalker faces off against the gigantic rancor monster. That one scene seemed explicitly designed to capture my young imagination. Action! Suspense! A big scary-looking thing that vaguely reminded me of dinosaurs!

My beloved not-quite-a-dinosaur. Image courtesy Wookieepedia.

It wasn't long before I consumed all of the Star Wars content available at the time, which introduced me to the series' seemingly endless menagerie of fascinating aliens. Among them was the reptilian bounty hunter Bossk, another lifeform that promised an intersection between the two tracks in my mind (i.e., Star Wars and dinosaurs).

The bounty hunter Bossk, exemplar of the Trandoshan species. Image courtesy Wookieepedia.

Could the rancor and Bossk be related? My young mind often imagined the planet that could produce both of my favorite aliens: it was a paradise of unearthly quasi-paleontological delights on which dino-spacepeople kept dino-spacepets. Three decades later I revisited the question, scientific tools in hand, and hoped that some cold, hard quantitative analysis would give me reason to continue dreaming. If nothing else, generating an evolutionary tree of Star Wars aliens should be a fun exercise.

The test of my rancor-Bossk hypothesis turned out to be a saga of frustration, disappointment, and (ultimately) enlightenment. My data yielded few empirical conclusions, but did offer some conceptual insight into the value of paleontology.

I've divided this essay into two sections. The first is a discussion of a priori issues in phylogenetic analysis, including the role of paleontology in data-driven reconstructions of evolutionary history. The second is a discussion of the a posteriori inferences that can be made about the phylogeny of Star Wars aliens.

Strap yourselves in!

A priori considerations: Prequels to data analysis

Episode I: The Polytomy Menace

A good evolutionary tree, or phylogeny, is one that definitively answers the question, "what is this taxon's closest relation?" for every taxon on the tree. Any such tree is said to be "resolved." By contrast, unresolved trees include polytomies: groups of taxa whose closest relations are ambiguous between a number of options. Polytomies are bad.

Imagine my despair, then, when the initial result of my analysis was a massive polytomy:

FIGURE 1: Consensus parsimony tree.

Phylogenies can be reconstructed through a variety of methods, most of which share some basic principles. The most common methods search through the collection of all possible trees--"tree space"--for the best ones, where "best" is defined by reference to some criterion or another. The phylogeny above, for example, is the result of a parsimony-based analysis: a program calculates the overall number of evolutionary changes on the tree and searches for trees with the lowest number of changes. A tree is resolved in this case when a preponderance of parsimonious trees all have the same relations between taxa; a polytomy results when several trees with different relations between taxa are equally parsimonious. In the case above, a sampling of 1,000 phylogenies yielded 40 equally parsimonious trees, and those trees could only resolve around a quarter of the taxa in the dataset. The consensus among those trees, then, is ambiguous.
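To make the parsimony criterion concrete, here is a minimal Python sketch of how the number of changes on a tree can be counted for discrete characters, using Fitch's algorithm. It is an illustration only: the four taxa, the three binary characters, and the function names are made up for this example, and this is not the tree search that TNT performs.

```python
def fitch_score(tree, states):
    """Return the minimum number of state changes for one character.

    tree   -- nested 2-tuples of taxon names, e.g. (("A", "B"), "C")
    states -- dict mapping each taxon name to its observed state
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):      # leaf: its state set is what was observed
            return {states[node]}
        left, right = (visit(child) for child in node)
        shared = left & right
        if shared:                     # children can agree: no change needed here
            return shared
        changes += 1                   # children disagree: count one change
        return left | right

    visit(tree)
    return changes


def tree_length(tree, matrix):
    """Sum Fitch scores over all characters; matrix maps taxon -> string of states."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_score(tree, {taxon: row[i] for taxon, row in matrix.items()})
        for i in range(n_chars)
    )


# Hypothetical codings for three binary characters; not the essay's dataset.
toy_matrix = {
    "Wampa": "110",
    "Tauntaun": "110",
    "Rancor": "001",
    "Trandoshan": "001",
}

tree_a = (("Wampa", "Tauntaun"), ("Rancor", "Trandoshan"))
tree_b = (("Wampa", "Rancor"), ("Tauntaun", "Trandoshan"))

print(tree_length(tree_a, toy_matrix))  # 3 changes
print(tree_length(tree_b, toy_matrix))  # 6 changes
```

On this toy matrix the first tree needs fewer changes, so a parsimony search would prefer it. A polytomy arises when two or more trees that disagree about a grouping tie for the lowest score.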

One of my earliest posts on this blog considered the potential problems with parsimony-based analyses. When faced with the ambiguous consensus tree above, then, I tried other criteria for searching through tree space. Likelihood-based analyses, for example, use a specified model to determine which evolutionary changes are more or less likely and search for trees that include the likeliest changes. A Bayesian analysis determines its model a posteriori by looking through the dataset for the frequency of changes to determine which are most likely. When I used the MrBayes program to search through 100,000,000 possibilities, the following consensus tree was the result:

FIGURE 2: Results of Bayesian analysis.

Well, crap.

Something clearly wasn't working out. What was it about the Star Wars aliens that so defied systematic classification?
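As an aside on the likelihood criterion described above, here is a rough Python sketch of how a model can make some changes "more or less likely." It assumes the symmetric Mk model, a common choice for discrete morphological characters, in which every state is equally likely to change into any other. The function names, the two made-up character rows, and the branch lengths are assumptions for illustration; this is not MrBayes's implementation.

```python
import math


def mk_transition(same, t, k=2):
    """Probability of ending in the same (or one specific different) state
    after a branch of length t (expected changes per character), k states."""
    decay = math.exp(-k * t / (k - 1))
    if same:
        return 1 / k + (k - 1) / k * decay
    return 1 / k - decay / k


def pair_log_likelihood(row_a, row_b, t, k=2):
    """Log-likelihood of two taxa's character rows separated by path length t.

    Each character contributes (1/k) * P(state_a -> state_b over t),
    where 1/k is the equilibrium frequency of the starting state.
    """
    return sum(
        math.log(mk_transition(a == b, t, k) / k)
        for a, b in zip(row_a, row_b)
    )


# Hypothetical rows of three binary characters.
similar = ("110", "110")
different = ("110", "001")

for t in (0.1, 1.0):
    print(t, pair_log_likelihood(*similar, t), pair_log_likelihood(*different, t))
```

Identical rows are better explained by a short path between the two taxa, and fully mismatched rows by a long one. A likelihood or Bayesian search applies the same kind of calculation across every branch of every candidate tree.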

Episode II: Attack of the conceptual difficulties

The most obvious explanation for the aliens' stubborn resistance to analysis is the simple fact that Star Wars aliens never actually evolved. One can take this literally: the aliens were designed by artists who hoped to evoke a viewer's awe, rather than fitted to their environments by natural selection. One can also turn to an in-context interpretation: perhaps the aliens of the Star Wars galaxy are not all descended from a single common ancestor, as an evolutionary tree implies. The phylogenetic models that I used both assumed otherwise.

In the end, however, that shouldn't make too much of a difference: after all, phylogenetic reconstruction can systematically arrange plenty of things that didn't evolve (fairy tales or Pokemon, to cite two examples). Phylogenetic analysis of Star Wars aliens isn't inappropriate per se; rather, something had to be wrong with my data. Either the taxa that I selected weren't sufficiently diverse or the traits I measured weren't sufficiently distinct from one another.

Among the taxa considered, there's no a priori reason to think they were insufficiently diverse: one would hope that even the most dense of analytical programs could sort out humans from Gungans or tauntauns or Hutts. That left me to consider the traits analyzed.

What is a good trait in phylogenetic analysis? This is another question I've considered here before. In that earlier essay, I suggested that the philosophical concept of properties could help in finding an answer: traits with identical causal histories and powers would be homologous. That determination is easier to make when we have developmental data. If the horns of the tauntaun and the horns of the Gamorrean are both results of the same developmental pathways, then it would be likely that the two are homologous. Unfortunately, developmental data doesn't show up very well on film.

Episode III: Revenge of the paleontologist

As I sorted through the mess of likely trees in an effort to find something worth writing about, I noticed something striking. Of the forty equally parsimonious trees generated by the parsimony analysis, a majority of them--25 out of the 40--simply couldn't be true in the context of the Star Wars storyline.

Star Wars canon holds that two important locales in the storyline--the planets Hoth and Endor--had been largely untouched by the greater galactic civilization prior to their appearances in the Star Wars films. It should stand to reason, then, that taxa indigenous to those planets--tauntauns and wampas on Hoth, ewoks and goraxes on Endor--should be more closely related to each other than they are to others. Fifteen trees captured this relation, and the consensus among those trees yielded significantly more resolution.

FIGURE 3: Constrained consensus tree (see text for explanation).

Natural history provides an important context in which we can draw more precise inferences about evolutionary history. This much has long been obvious to paleontologists. The most significant problem with my data, then, was a contextual one: the Star Wars galaxy is one that lacks the insights that paleontology and related disciplines provide.

There are several roles that paleontology can play in phylogenetic reconstruction. One is empirical: fossils yield data to be analyzed. Another is conceptual: the geological and ecological contexts in which fossils are found enable theorists to sort good empirical inferences from bad ones. This latter sense is one in which paleontological data is used to calibrate molecular clocks or to generate theories that explain discrepancies between fossil and molecular data, as in the debate over eutherian mammal origins.

This distinction recalls another made by ancient Greek philosophers such as Aristotle, who highlighted the difference between praxis (the knowledge of the practical means by which one can obtain one's goals) and phronesis (understanding of the difference between good means and bad ones). Similarly, Kant's distinction between hypothetical imperatives--the courses of action that obtain preferred goals--and the categorical imperative--the principle by which we ought to choose right actions--tracks the different roles that paleontology can fulfill. On the one hand, paleontological data can be a means to the end of making inferences about evolutionary relations. On the other hand, paleontological data can yield principles according to which we choose some phylogenetic inferences (e.g., tauntauns and wampas are closely related) and reject others (e.g., ewoks are closely related to a morphologically similar taxon with which they share no natural history). Apart from giving us the means to make inferences about evolutionary history, the value that paleontology and the historical sciences alone can provide is the wisdom to recognize which inferences we should or shouldn't accept.

With all that said, what can we actually learn about the phylogeny of Star Wars aliens? Let's now turn to the data.

A posteriori results: The heart of the story

Episode IV: A New Dataset

My dataset included 45 taxa and 81 characters. One principle guided my choices in constructing the dataset: stick to the screen.

One could spend decades poring over Star Wars ephemera in search of detailed descriptions of that galaxy's many alien lifeforms--I know because that pretty well describes my free time in the past thirty years. I therefore limited myself to taxa that had significant screen time in Star Wars films or TV shows, and considered the following taxa (in alphabetical order): Abednedo, Acklay, Aqualish, Bantha, Bith, Boar Wolf, Boga, Dug, Eopie, Ewok, Exogorth, Gamorrean, Geonosian, Gorax, Gungan, Happabore, Human, Hutt, Iktotchi, Ithorian, Jawa, Kaminoan, Mon Calamari, Mynock, Neimoidian, Nexu, Porg, Quarren, Rancor, Rathtar, Reek, Rodian, Sarlacc, Steelpecker, Sullustan, Talz, Tauntaun, Togruta, Toydarian, Trandoshan (including Bossk), Twi'lek, Wampa, Wookiee, Yoda's species, and Zabrak.

Traits had to be observable from screen appearances or production art. Discrete traits--those that could be described either as present or absent--were preferred. These included articulated jaws, human-like (presumably cartilaginous) noses, eyestalks, opposable digits, fur, radial symmetry, teeth, vacuum adaptations, etc. Twenty-one of the eighty-one traits were continuous traits, or those that could take a number of variable forms. Continuous traits included measures of quantity (number of horns, number of forelimb digits, etc.), length (e.g., phalanx length relative to metacarpal length), or morphological variations on functionally equivalent anatomic structures (earlobe shape, tooth shape, etc.). All continuous traits had to be justifiable by the functional equivalence of anatomical structure.
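For readers who have not seen a character matrix, here is a small Python sketch of how traits like these might be coded and written out as a NEXUS DATA block, the kind of file that Mesquite and MrBayes read. Everything specific in it is an assumption for illustration: the four characters, the codings, and the helper name `to_nexus` are invented and do not come from the actual dataset. Taxon names containing spaces or apostrophes would need quoting, which this sketch does not handle.

```python
def to_nexus(matrix, symbols="01"):
    """Format a taxon -> state-string mapping as a NEXUS DATA block."""
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    if any(len(row) != n_chars for row in matrix.values()):
        raise ValueError("every taxon needs the same number of characters")
    width = max(len(name) for name in taxa)   # pad names so columns line up
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(taxa)} NCHAR={n_chars};",
        f'  FORMAT DATATYPE=STANDARD MISSING=? SYMBOLS="{symbols}";',
        "  MATRIX",
    ]
    lines += [f"    {name.ljust(width)}  {matrix[name]}" for name in taxa]
    lines += ["  ;", "END;"]
    return "\n".join(lines)


# Hypothetical codings. Characters, in order:
#   1. fur (0 absent, 1 present)
#   2. eyestalks (0 absent, 1 present)
#   3. radial symmetry (0 absent, 1 present)
#   4. number of horns (0 none, 1 one, 2 two or more; "?" if unknown)
toy_matrix = {
    "Hutt": "0000",
    "Wookiee": "1000",
    "Rathtar": "001?",
    "Tauntaun": "1002",
}

nexus_text = to_nexus(toy_matrix, symbols="012")
print(nexus_text)
```

Presence/absence traits take the symbols 0 and 1, while a multi-state trait such as horn number simply uses more symbols in the same column.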

(I will happily provide my dataset to anyone interested in obtaining it; please send me an email for the most secure method of file transfer.)

Episode V: The empirical results strike back

For dataset construction, I used Mesquite v. 3.31 for Mac OSX to generate NEXUS and TNT files. As noted above, I used TNT for parsimony analysis and MrBayes for Bayesian analysis. TNT settings were changed to accommodate 1000 random addition sequences using the Ratchet option; figure 1 above shows the strict consensus of 40 equally parsimonious trees. MrBayes settings were left at default.

All analyses set the taxon "Rathtar" as the outgroup, given just how weird rathtars are relative to other Star Wars aliens. This criterion may seem subjective, but may be upheld by the fact that rathtars are the only Star Wars aliens that exhibit radial symmetry rather than bilateral symmetry.

Results of these analyses are reproduced in figures 1 and 2 above. For tree visualization I used the Interactive Tree of Life website. Neither analysis resolved more than 20% of taxa considered, thus limiting the conclusions that could be drawn.

I added two criteria (see section "Episode III" above): first, that wampas and tauntauns must form a monophyletic clade; second, that ewoks and goraxes must form another monophyletic clade. Given these stipulations, I searched through the forty trees preserved in TNT and eliminated twenty-five, creating a consensus tree from the remaining fifteen (reproduced above as figure 3). That tree left around 20% of taxa unresolved, which marked a significant improvement upon the initial analysis. (A lack of time and familiarity with software precluded analysis of multiple trees in MrBayes.)
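The filtering step can be sketched in a few lines of Python. The sketch assumes trees are stored as nested tuples and uses three invented six-taxon trees rather than the forty trees from TNT; the function names are likewise hypothetical. It keeps only the trees in which each constrained pair forms a clade, then takes a strict consensus, meaning the clades found in every remaining tree.

```python
def clades(tree):
    """Return every clade in a nested-tuple tree as a frozenset of taxon names."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(visit(child) for child in node))
        found.add(members)
        return members

    visit(tree)
    return found


def is_monophyletic(tree, taxa):
    """True if the given taxa, and no others, make up some clade of the tree."""
    return frozenset(taxa) in clades(tree)


def strict_consensus_clades(trees):
    """Clades present in every tree; fewer shared clades means more polytomies."""
    return set.intersection(*(clades(t) for t in trees))


# Three invented trees; not the equally parsimonious trees from the analysis.
trees = [
    (((("Wampa", "Tauntaun"), ("Ewok", "Gorax")), "Yoda"), "Rathtar"),
    ((("Wampa", "Tauntaun"), (("Ewok", "Gorax"), "Yoda")), "Rathtar"),
    (((("Wampa", "Ewok"), ("Tauntaun", "Gorax")), "Yoda"), "Rathtar"),
]

constraints = [{"Wampa", "Tauntaun"}, {"Ewok", "Gorax"}]
kept = [t for t in trees if all(is_monophyletic(t, c) for c in constraints)]

print(len(kept))                            # 2 trees satisfy both constraints
print(len(strict_consensus_clades(trees)))  # 2 clades shared by all three trees
print(len(strict_consensus_clades(kept)))   # 4 clades shared by the kept trees
```

Discarding the tree that violates the constraints doubles the number of clades the consensus can show, which is the same effect, in miniature, as the jump in resolution between figures 1 and 3.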

Episode VI: Return of the Philosopher

While both analyses resulted in significant ambiguity, with 20% or fewer of all taxa resolved in both cases, the data provided were sufficient to reject my initial hypothesis that rancors and Trandoshans form a monophyletic clade. Parsimony analysis shows that rancors are more closely related to happabores than anything else; Bayesian analysis left the relation between rancors and Trandoshans unresolved. Thus were my childhood dreams dashed.

Focusing on the most-resolved consensus tree, Star Wars fans should find plenty to debate. On the one hand, some of the results meet commonsense expectations: to wit, most of the unresolved taxa are those portrayed on-screen by humans in costumes. But who would have guessed that Yoda, the universally-beloved tiny Jedi master, has his closest phylogenetic relation in the much-reviled ewoks and the gigantic predatory gorax? Or that the head-tail-adorned Twi'leks are not very closely related to other taxa with similar cranial appendages (e.g., the Togruta or Iktotchi)?

I'm willing to grant that some of these surprising results might be attributable to my current state of education: I'm a scientific padawan, after all, with at least a year to go before I attain the rank of knight. I have internalized one fundamental practice that should be shared by all scientists, however: I welcome the opportunity for debate and potential disproof. So let's get that debate going in the comments below!