Data, Text & Simulation: Alisa Bokulich’s "Using models to correct data: paleodiversity and the fossil record."

In this post, Derek Turner and Adrian Currie take a critical look at Alisa Bokulich’s recent paper "Using models to correct data: paleodiversity and the fossil record."

There’s nothing like a simple distinction: take that between ‘theory’ and ‘data’, for instance. Scientists carry out experiments or make observations, and this generates data—reports of the results of those investigations. Scientists also theorize and build models about the world. Theories explain the data, and the data tests the theories. Easy as pie.

Simple distinctions can be helpful, but are often misleading. ‘Data’ is a highly theoretical notion (Sabina Leonelli captures this very well): generating data requires theoretical knowledge about how to collect it, store it, manage its use across different contexts, and so forth. We need to work out what is signal and what is noise, and understand how to enable data to be integrated, used in evidential reasoning, and carried into new contexts, all without losing critical information about the source and context of the data’s generation. The naïve picture whereby empirical data straightforwardly tests theories is hopelessly misguided.

Well fine – we’ve known that for a long time. Back in 1962 Patrick Suppes introduced the notion of a ‘data model’, a theory-mediated representation of data. In 1988 the Jims Bogen and Woodward argued that data don’t test theories, they instead provide the basis for inferring the properties of the natural phenomena which theories attempt to explain. And as has been explained in countless undergraduate essays, a major problem with naïve (not Popper’s, we might add) versions of falsificationism is their failure to account for the ‘auxiliary hypotheses’ required to connect test conditions to the hypothesis under test.

But this in itself doesn’t tell us about the epistemology of data models: when developing a model of data, which features are good-making and which are bad-making? That is, what are the virtues and vices of data-modeling?

Alisa Bokulich emphasizes the role of models in correcting data. This is a very common practice which, nonetheless, might seem dodgy initially. As she says,

The intuition here might be that any “model-tampered” data is in fact “corrupted” data… [but] this intuition is mistaken. It is not the ‘giveness’ of data that makes it epistemically privileged, but rather its degree of fidelity, and the fidelity of data can be improved by removing artefactual elements and reducing noise.

Such worries about the use of modelling to correct data assume that ‘purity’ is a virtue of data modelling. That is to say, the worries only hold water if we think data is good when it is, as it were, ‘raw’. But as we saw above, data is never raw. It always needs cleaning up. It just isn’t true that the more similar a data set is to the circumstances of its original recording the better it is: the immediate records include an enormous amount of noise. In demonstrating this, Bokulich argues that fidelity, not purity, is the good-making property we should attach to data, and that models are critical for generating that fidelity. As her title implies, she makes the case by looking at the relationship between the fossil record and paleodiversity.

Paleontologists care a lot about paleodiversity: roughly the number of species, genera, families (or whatever) there are across evolutionary time. Understanding paleodiversity helps study the nature of speciation and extinction, the relationship between biological evolution and other major events, and the causes and connections between, say, biological novelty and radiations. For palaeontologists, patterns in paleodiversity are a gold mine for answering Big Questions about life at grand scales. But to get paleodiversity, palaeontologists need to analyse the fossil record. And the fossil record is infamously patchy.

What to do with a patchy record? Following friend-of-the-blog David Sepkoski’s excellent history, Bokulich traces how palaeontologists have, on the one hand, become increasingly good at quantitatively characterizing the fossil record and its biases, and on the other hand, learned how to mitigate those biases.

Bokulich outlines several methods that researchers use to correct for biases in the fossil data: the method of residuals, subsampling methods, and phylogenetic methods. Here it is easy to get lost in the technical weeds, but consider a simple example involving phylogenetic correction. Suppose we have two rock layers, an older and a younger one. The younger layer contains fossils from two species, A and B. The older layer only has fossils from species A. So the “raw” or “pure” or unprocessed diversity data are clear: you have one species at the earlier time, and then two later on. But what if A and B are related? Phylogenetic reconstruction might tell us that A and B both evolved from some earlier common ancestor. If so, then the lineage leading to B must have existed earlier as well. So we can correct the data by adding in that extra lineage at the earlier time.
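The ghost-lineage reasoning in that toy example is mechanical enough to sketch in code. What follows is a minimal illustration with invented data and function names, not any tool palaeontologists actually use: a species’ range is extended back to the oldest occurrence of its phylogenetic sister, since both lineages must descend from their common ancestor.

```python
# Toy "ghost lineage" correction (hypothetical data; ages in Ma, older = larger).

def raw_diversity(occurrences, t):
    """Count species actually fossilized at time t."""
    return sum(1 for ages in occurrences.values() if t in ages)

def corrected_diversity(occurrences, sister, t):
    """Count species inferred to exist at time t, extending each lineage
    back to its sister taxon's oldest occurrence."""
    count = 0
    for sp, ages in occurrences.items():
        oldest, youngest = max(ages), min(ages)
        sis = sister.get(sp)
        if sis in occurrences:
            # The lineage must be at least as old as its sister's first record.
            oldest = max(oldest, max(occurrences[sis]))
        if youngest <= t <= oldest:
            count += 1
    return count

# The example from the text: A in both layers, B only in the younger one.
occurrences = {"A": [10, 5], "B": [5]}
sister = {"A": "B", "B": "A"}

print(raw_diversity(occurrences, 10))               # 1: only A is fossilized early
print(corrected_diversity(occurrences, sister, 10)) # 2: B's ghost lineage is added
```

The “correction” here is just the inference in the paragraph above made explicit: B leaves no fossils in the older layer, but the phylogeny licenses counting its lineage there anyway.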

Bokulich’s position turns on the difference between purity and fidelity. Purity can be understood in terms of how processed the data is. Unprocessed data is the raw, uncut realness—or so a naïve understanding might have it. Fidelity is how truly the data represents the natural phenomena we are trying to represent and study. Due, in part, to the messiness of the fossil record, extremely pure fossil data sets will be extremely unreliable. They’ll bring with them all the biases and mess of the fossil record. Following an increasingly common turn, typically attributed to Wendy Parker, towards localness about the epistemic power of models, Bokulich also argues that fidelity itself is not one thing. There are many epistemic tasks and targets, and we can’t optimise our data for all of them. Good data is, then, not simply high-fidelity, but high fidelity-for-a-purpose.

Finally, Bokulich connects her claims with Caitlin Wylie’s (another friend-of-the-blog) discussion of epistemic issues related to fossil preparation (now forthcoming!). Where Bokulich analyses ‘data-to-data-model’ inferences, Wylie points out that in preparing fossils—required for them to be data in the first place—fossil preparators split rock from fossil. Bokulich suggests that fossil preparation, like data-modelling, depends upon fidelity-for-a-purpose:

… a fossil specimen should be prepared only to the extent to which it is adequate to provide the requisite evidence for the paleontologist’s specific theoretical questions.

She takes this to suggest a much more pervasive role for modelling generally speaking—it is (just about) models all the way down.

Almost every level of the data model hierarchy—from the datum of the individual prepared fossil specimen up to the most sophisticated phylogenetically-corrected global fossil data set—involves the use of models.

Both of us admire Bokulich’s paper and think she is on to something important with her account of how paleobiologists use data models. But we each have some philosophical questions about her account.

Derek writes…

One thing about Alisa Bokulich’s fabulous paper that really jumps out at me is how committed she is to the idea that the fossil record is like text. I’ve argued (here) that this textual metaphor—one whose theological origins have faded from most people’s awareness—strongly influences how we think about fossils. Bokulich’s central claim is that scientists have learned not to take the fossil record—their data—at face value, but to use models to correct the data in various ways. I think she’s right about this, but it also strikes me that this makes paleobiology look a lot like efforts to reconstruct the historical origins of the Bible.

Even those who think the Bible is (in some sense) the word of God agree that the document had human authors. Treating the text as data, what inferences can we draw about the historical origins of the text? A “face value” reading of, say, the Pentateuch (Genesis, Exodus, Leviticus, Numbers, and Deuteronomy) might treat it as a history written by a single author—say, Moses. In the nineteenth century, philologists challenged this naïve reading of the Pentateuch by developing the so-called “documentary hypothesis,” according to which there were actually four different authors, living in different places at different times: the Yahwist, the Elohist, the Priestly author, and the Deuteronomist. Each of these authors has slightly different tells – for example, they refer to God in different ways, or differentially emphasize events happening in different locations. Of course, these “authors” are just theoretical posits—the unobservable entities of historical Biblical scholarship, as it were. The basic idea is that the Pentateuch is the result of later editors splicing together four distinct texts, by four authors. Implicit in this suggestion is the possibility that portions of the original texts could well have been lost via the editorial process. The Bible, like the fossil record, could be “gappy.”

The documentary hypothesis (along with the various other more complex accounts that have developed in the meantime) looks a lot like a “corrected” reading of the scriptural text. Both the naïve and the “corrected” readings treat the text as providing evidence concerning its author(s) and editor(s). You can actually find versions of the Biblical text with different verses highlighted according to the author to which they are attributed. (Here is one nice example.) You might even think of this as a data model of the Bible. Biblical scholarship makes progress by devising increasingly sophisticated methods for correcting the data.

I think there might be some interesting parallels between this historical-critical research on the Bible and paleontologists’ efforts to “correct” the fossil data. Consider Bokulich’s discussion of the method of residuals. There the goal is to try to separate the biological from the geological contributions to the “raw” paleodiversity data. For example, if sedimentary rock volume declines with age, that could mean that diversity increase is merely a geological signal: it looks like there are more species in recent times, but that is only because there is more rock! The effort to tease apart the geological vs. the biological contributions to the fossil record does not seem all that different from scholars’ efforts to figure out whether the text of a particular chapter of Genesis is more attributable to the Yahwist vs. the Priestly author.
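The residuals idea can be made concrete with a toy regression. The sketch below uses invented numbers and a hand-rolled least-squares fit, not the actual procedure from the paleobiology literature: sampled diversity is modelled as a linear function of rock volume, and the residuals, the part rock availability cannot explain, are read as the candidate biological signal.

```python
# Toy "method of residuals" (invented numbers): regress raw diversity on
# rock volume; residuals are the diversity left unexplained by rock supply.

def ols_residuals(x, y):
    """Ordinary least-squares fit of y on x; return residuals y - y_hat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

rock_volume = [1.0, 2.0, 3.0, 4.0]    # proxy for preserved rock per interval
diversity = [10.0, 22.0, 28.0, 44.0]  # raw genus counts per interval

residuals = ols_residuals(rock_volume, diversity)
# A positive residual flags an interval with more diversity than its rock
# volume alone would predict: a candidate genuinely biological excess.
```

The analogy to Biblical scholarship then runs through the residuals: just as the philologist asks which features of a verse are attributable to the Yahwist rather than a later editor, the palaeontologist asks which part of a diversity curve is attributable to biology rather than geology.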

This comparison between paleontology and Biblical scholarship might seem surprising, but note that all I am really doing is taking the textual metaphor (i.e. the idea that fossils comprise a “record” that can be “read”) and working backwards. If the crust of the earth is like a text—a completely ordinary thought—then maybe the scriptural text is like rock strata.

But fossils are not (literally) a text. If we choose to think about them that way, the metaphor naturally invites certain sorts of questions. The metaphor has proven to be quite generative, leading scientists to think about new ways of “reading” the crust of the earth. But metaphors also hem in our thinking in various ways that can be difficult to see. Increasingly, I find myself wondering what other ways there might be to think about fossils. If the paleobiological revolution was a series of efforts to reread the fossil record, it also marked a kind of doubling down on the textual metaphor. Could there be other ways of thinking about fossils? Even while scientists seek increasingly sophisticated readings of the fossil record, we philosophers might seek alternative ways of conceptualizing what the science is about.

Adrian writes…

I’d like to start with a shout-out to Derek’s contribution. One thing I find fascinating about the connection between interpreting the fossil record and textual (particularly biblical) interpretation is that it has such a long history in paleontology. The early modern natural philosopher Robert Hooke’s very early work on fossils drew an explicit parallel with ‘chronologies’—the practice of inferring history and dates by interpreting the Bible in combination with other texts. For Hooke, Bible chronology was the explicit model for what we could do with fossils (see Martin Rudwick on this). The analogy Derek highlights, then, matters for the beginnings of paleontological science, and this makes his challenge—rethinking what fossils might be beyond the textual metaphor—all the more compelling.

So I’ve two little discussions on offer. First, I think Bokulich gets it wrong when she emphasizes ‘fidelity-for-a-purpose’; second, well… (self-embarrassed philosophical sigh) I’m not sure what Bokulich means by ‘model’. Let’s take these in turn.

‘Fidelity’, I take it, implies that a representation is ‘true enough’, or perhaps ‘true enough of some target’. I’d understand this as a dependency between the source of the data on the one hand, and the data model on the other. A high-fidelity data model will track the right features of the data’s source: which features the model has turns on how the measurements of the source turned out. Bokulich’s fidelity-for-a-purpose explicitly appeals to evidential purposes. But it is worth pointing out that the purposes of data modelling are not exhausted by evidence. We see this clearly, I think, in the fossil preparation analogy.

A fossil preparator is guided by several goals related to the future purposes the fossil will be put to. First, this is not simply fidelity for a single evidential purpose, but for many evidential purposes. In deciding when a particular fossil is ‘finished’, the preparator doesn’t typically have in mind just one evidential purpose. Especially if it is a particularly rare fossil, it is likely to be used in many analyses, towards a variety of aims. And what counts as good fidelity will differ for these different uses. As such, the fossil preparator has to strike a balance between them. As opposed to ‘fidelity-for-a-purpose’, then, I think often (but not always) the good-making feature is ‘fidelity-for-expected-purposes’, or maybe ‘balanced-fidelity’, where a good balance is struck between expected uses and future, unanticipated uses. Alison Wylie has some fascinating work on the use of legacy data in archaeology, and I think this matters critically for understanding the nature of fidelity as a virtue in paleontological data as well. More generally, one of the driving ideas behind Sabina Leonelli’s view of data science is that databases serve a variety of purposes—some unanticipated—and this plays a critical role in how data journeys are facilitated by their curators. As such, it is not fidelity for any particular purpose that we’re after.

But even this misses that there are non-evidential purposes at play as well. Fossil preparators often have both archive and display in mind, and these have differing needs: museum display emphasizes aesthetic and pedagogical uses, while archivists care about longevity. Further, as Caitlin Wylie herself emphasizes, fossil preparators have their own aesthetic judgements about what counts as a ‘completed’ fossil prep.

This amounts to two claims: first, I don’t think (at least in the fossil-prep case) that ‘fidelity-for-a-purpose’ is in fact a virtue of this kind of modelling practice; second, I don’t think that ‘fidelity’ is the only virtue. I see these as correctives rather than massive objections to Bokulich—I don’t think she makes any explicit statements about monism or pluralism regarding what makes for a good data model, and I see my suggested shift to ‘expected purposes’ as close to her original point. I think this retains the spirit, if not the letter, of the important points she makes.

I do worry, however, about the connection Bokulich makes between data modelling and fossil preparation, and it is the kind of worry I usually hate and try to avoid, but it might have some teeth here. What is the worry? Well, what does Bokulich mean by ‘model’ or ‘modelling’? (Gah, I’m reminded of the philosopher during seminar question time, clutching their head like it is about to erupt in consternation, uttering “I just don’t know what you mean”.)

To see why this worry might have teeth, I’ll quickly sketch an account of modelling I quite like. In the mid-2000s, both Michael Weisberg and Peter Godfrey-Smith gave us an account of modelling which fundamentally ties it to a kind of strategy scientists adopt. Coarsely speaking, someone who isn’t a modeller starts with empirical data: their approach to understanding a phenomenon is to observe it, measure it, isolate and experiment upon it, and so forth. On this approach, we build our way up to a theoretical understanding via the collection of data. A modeller, on the other hand, doesn’t start with data, but rather with a kind of proxy or analogue: the modeller looks at something else, develops an understanding of phenomena like that, and then later compares it with the natural system. Crucially, on this account, what makes something a model is not the content of the theory, but rather the process through which the theory was arrived at. No doubt there are limitations to the distinction, and many scientific practices involve combinations of both, but I think the distinction gives us substantive purchase when it comes to understanding what is going on with model-based science.

Bokulich’s example, and her appeal to fossil preparation, suggest she has something very different in mind by ‘model’. Data-models are a critical part of practices which are not modelling on Godfrey-Smith and Weisberg’s view. Why? Because they are intimately involved in the processes of collecting and representing empirical data. This is certainly not in itself an objection—I don’t think the ‘model-as-strategy’ approach has priority over others—but… I worry. The simulations used in generating fossil phylogenies, and the techniques and aesthetic/epistemic judgements that are used by fossil preparators, are super different. The former are formalized, computational, and have been verified and validated via various theoretical and empirical routes; the latter are idiosyncratic, physically laborious and highly tacit. I take it they are similar in terms of their role: both are involved in splitting signal from noise in order to generate data. But how much explanatory purchase do we get by lumping such practices together? If ‘modelling all the way down’ really just means ‘theoretical judgements are required at each stage of the process of generating empirical data’ then I’d be the last to disagree, but isn’t this just the received wisdom with which we opened our discussion of Bokulich? Perhaps I can put this much more positively.

Bokulich’s suggestion that fossil prep and data-models share functions and virtues is a potentially very fruitful one, but to see how fruitful it is, we’d need to look in more detail at the practices of each: what judgements do they make, why are the practices organized and designed as they are? Perhaps as opposed to a firm conclusion, then, I’d rather see Bokulich’s link between fossil preparation and simulations in paleontological systematics as a fruitful philosophical hypothesis. My bet is this will turn out to be a productive hypothesis indeed.