Stable Isotopes in Unstable Times: Harold Urey’s paleothermometer and the nature of proxy measurement

* Joe Wilson is a Lecturer in the Philosophy Department at University of Colorado Boulder. His research concerns philosophical issues in historical and climate science. He writes…

Exhausted in the wake of World War II, isotope chemist Harold Urey had to make good on a pre-war commitment: giving the Liversidge Lecture before the Chemical Society at the Royal Institution. Just over a decade earlier, Urey had been awarded the Nobel Prize in Chemistry for his discovery of deuterium (the second stable isotope of hydrogen). Having established himself as an expert in isotope separation methods, Urey went on to head Columbia University’s uranium isotope separation program (the Substitute Alloy Materials Laboratory, or SAM Lab) from 1940 to 1943 as part of the Manhattan Project. For the duration of the program, Urey was relegated to supervising more than a thousand technicians and engineers across multiple universities. The stress of managing a project of this scale, along with the pressure of winning the war in Europe and digesting the moral repercussions of the atomic bomb, left Urey disenchanted with isotope separation and on the brink of a nervous breakdown.

Harold Urey at his desk in the late 1940s

The Liversidge Lecture in December 1946 was then, for Urey, a chance to return to more standard forms of academic research. In the lecture, Urey revisited his earlier work on isotope thermodynamics with L.J. Greiff. Now primarily concerned with how isotopic properties could be exploited in nature, Urey dedicated much of the lecture to the discussion of natural isotope fractionation processes. 

Among the methods he discussed was one focusing on the fractionation of oxygen isotopes in marine carbonate-precipitating organisms. It turns out that single-celled foraminifera incorporate oxygen from the surrounding water into their calcite (CaCO₃) shells. Oxygen has three stable isotopes, with mass numbers of 16, 17, and 18, and as the ambient temperature decreases, more of the heavy isotope (¹⁸O) is incorporated at the expense of the lighter varieties. Eventually, the organisms producing these shells die, and many of their shells find their way to the sea floor, where they are buried and preserved beneath further sediment deposits.

A selection of benthic foraminifera

Urey suggested that, with the new mass spectrometers developed during the war, past temperature changes in our oceans could be determined from these calcite shells, so long as the baseline isotopic composition of the water could be estimated. Urey thus put a spotlight on the theoretical links between isotope fractionation and temperature, identifying a feature of the historical record that scientists could exploit to reconstruct temperatures in the past.

At the same time that Urey was preparing to return to civilian research, he spoke out on the serious threat posed by atomic weapons. His postwar speeches advocated the consolidation of international entities like the United Nations into a world government, which would take as its model the international scientific community. However, just three months after the Liversidge Lecture, the Soviet Union rejected the US-proposed Baruch Plan, which called for (among other things) the international exchange of scientific information, the elimination of atomic weapons, and the formation of an atomic development authority. Urey continued his anti-nuclear efforts, though he soon lost faith both in the prospects of a world government and in the Soviet Union’s ability to use atomic technologies responsibly.

As this geopolitical drama was unfolding, Urey and his collaborators continued to refine their oxygen paleothermometer. Samuel Epstein, a research fellow in Urey’s group, led a study aimed at calibrating the carbonate-oxygen isotope temperature scale (Epstein et al. 1951). The study established the quantitative relationship (or calibration function) between the abundance of oxygen-18 and temperature: T = 14.8 − 5.41 δ¹⁸O, with T in °C and δ¹⁸O in per mil. The paleothermometer was becoming more precise, if not necessarily more accurate. Then, in a pioneering study, Cesare Emiliani, a research associate in Urey’s lab, collected carbonate samples from a number of deep-sea cores spanning the globe. He measured their isotopic ratios with a mass spectrometer and applied the paleothermometer to reconstruct Pleistocene ocean temperatures (Emiliani 1955). Using this approach, Emiliani concluded that equatorial ocean temperatures varied by as much as 6 °C over past glacial cycles. (This is a lot. For context, the most drastic estimates of anthropogenic warming, now considered unlikely, topped out at around 5 °C by 2100.)
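The following is a minimal sketch of how the calibration function would be applied. It evaluates the formula for a few isotope values and asks how large an isotopic shift corresponds to Emiliani's 6 °C. The function names are hypothetical. Like the formula itself, the sketch treats δ¹⁸O as the only input; the full method must also estimate the isotopic composition of the water.

```python
# Coefficients of the calibration function quoted above (Epstein et al. 1951):
# T = 14.8 - 5.41 * d18O, with T in degrees C and d18O in per mil.
INTERCEPT_C = 14.8
SLOPE_C_PER_PERMIL = 5.41


def temperature_from_d18o(d18o_permil: float) -> float:
    """Evaluate the linear calibration function for one isotope value."""
    return INTERCEPT_C - SLOPE_C_PER_PERMIL * d18o_permil


def d18o_shift_for(delta_t_c: float) -> float:
    """Isotopic shift (per mil) that the same linear calibration maps onto a
    temperature change: since T is linear in d18O, dd18O = -dT / slope."""
    return -delta_t_c / SLOPE_C_PER_PERMIL


if __name__ == "__main__":
    for d in (-1.0, 0.0, 1.0):
        print(f"d18O = {d:+.1f} permil -> T = {temperature_from_d18o(d):.2f} C")
    print(f"a 6 C warming corresponds to {d18o_shift_for(6.0):.2f} permil")
```

Because the function is linear, a 6 °C swing corresponds to an isotopic shift of only about 1.1 per mil (6 / 5.41).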

Epstein’s plot of oxygen isotope ratios versus temperature

* * *

Applying recent model-based perspectives in the philosophy of measurement, we can understand Epstein et al.’s calibration function as a first attempt to develop a model of the measurement process for ocean paleotemperature (e.g., Tal 2017, Mari et al. 2017, Wilson and Boudinot 2022). On this view, measurement is a kind of modeling activity in which workers capture the link between a measurement output and the desired target phenomenon by building an explicit model of the measurement process. In the present case, Urey and colleagues sought to identify the function relating the relative abundance of ¹⁸O in the calcite shells of foraminifera to the temperature of the water in which the shells formed.

According to Eran Tal (2017), measurement outcomes function as predictors for relevant patterns in the data. In the context of the paleothermometer, we first observe a systematic relationship (what Tal calls a “forward function”) between the temperature at which a shell forms and the oxygen isotope ratio in the shell. We then invert this relationship (yielding an “inverse function”) to generate predictions about temperature from the observed fractionation of oxygen. This inverse function provides the calibration curve that allows oxygen isotopes to predict, and therefore measure, marine temperature.
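Tal's two-step picture can be sketched in a few lines of Python under stated assumptions. The first step fits a forward function, with isotope ratio as a function of growth temperature, to calibration data using ordinary least squares. The second step inverts that function algebraically. The data points and function names are hypothetical, chosen only to roughly follow the slope of the calibration quoted earlier. Neither Tal nor Epstein et al. is claimed to have used this exact procedure.

```python
from typing import Callable, Sequence, Tuple


def fit_forward(temps_c: Sequence[float], d18o: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of the forward function d18O = a + b * T."""
    n = len(temps_c)
    mean_t = sum(temps_c) / n
    mean_d = sum(d18o) / n
    sxx = sum((t - mean_t) ** 2 for t in temps_c)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(temps_c, d18o))
    b = sxy / sxx
    a = mean_d - b * mean_t
    return a, b


def make_inverse(a: float, b: float) -> Callable[[float], float]:
    """Invert the forward function algebraically: T = (d18O - a) / b."""
    if b == 0:
        raise ValueError("a flat forward function cannot be inverted")
    return lambda d: (d - a) / b


# Hypothetical calibration data: growth temperatures (C) of shells grown
# under known conditions and their measured d18O (per mil). Illustrative only.
temps = [5.0, 10.0, 15.0, 20.0, 25.0]
measured = [1.78, 0.86, -0.05, -0.97, -1.89]

a, b = fit_forward(temps, measured)          # step 1: forward function
temperature_from_d18o = make_inverse(a, b)   # step 2: inverse function
print(f"forward: d18O = {a:.3f} + ({b:.4f}) * T")
print(f"shell with d18O = 0.5 permil -> T = {temperature_from_d18o(0.5):.1f} C")
```

The inverse is only as good as the forward model it comes from. Any influence on d18O that the forward function omits is silently read as temperature by the inverse.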

That is one account, anyway, in which the link between a measurement outcome and a target phenomenon is established on the basis of a measurement model. By contrast, Mari et al. (2017) defend a “progressive” account of measurement that appeals more directly to the role of theory and background knowledge in establishing this link. According to Mari et al., a model of the measurement process is the last step in a process with three stages. It begins with the identification of a measurement task. It goes on to construct a general model from a set of laws that relate the measurement target to other phenomena. Finally, it incorporates idealizations and approximations into a model for a specific target system. On this view, we might understand Urey, in the Liversidge Lecture, to be identifying the measurement task (in the context of known thermodynamic regularities pertaining to oxygen isotopes). He was also recognizing the promise that analytical tools (mass spectrometers) could measure such isotopes. It remained for later workers to construct a model and adopt the requisite idealizations in calibrating the oxygen paleothermometer for marine organisms.

* * *

Not long after Epstein’s isotopic analysis, scientific communication between East and West resumed with the International Geophysical Year of 1957–8. The IGY was a period during which the international scientific community promoted geoscientific research, and which saw the Soviet Union and the United States launch their first artificial satellites. In addition, over 70 countries rushed to perform Antarctic research. During this period, the Soviet Union established Vostok Station near the south geomagnetic pole. The location chosen for Vostok Station made it particularly suitable for magnetometric studies of Earth’s magnetosphere, while also accommodating the standard suite of polar research: astronomy, geophysics, radiometry, and climatology.

Postage stamp commemorating the International Geophysical Year of 1957–8

Part of the climatological work carried out at research stations like Vostok involved drilling ice cores, from which samples of meteoric ice were later extracted for analysis. These cores, when carefully extracted, provide a continuous record of deposits spanning thousands of years. Such records include aerosols like dust and pollen, atmospheric air bubbles trapped within the matrix of the core, and the meteoric ice comprising the bulk of the core itself. One benefit of ice cores over deep-sea sediment cores is that their temporal resolution is often much higher. (For comparison, the cores in the sediment study of Hays et al. (1976) accumulated at about 3 cm/kyr. The Vostok core, by contrast, represents accumulation rates on the order of 1 cm/yr over a similar 400 kyr period.)
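The resolution gap described here comes down to simple arithmetic. A sample of a given thickness averages together (thickness ÷ accumulation rate) years of deposition. The minimal Python sketch below assumes 1 cm samples and a constant accumulation rate. This is a simplification, because real ice layers thin with depth under compaction and flow, so deeper ice is coarser.

```python
def years_per_sample(sample_cm: float, accumulation_cm_per_yr: float) -> float:
    """Years of deposition averaged into one sample of the given thickness,
    assuming a constant accumulation rate."""
    return sample_cm / accumulation_cm_per_yr


# Accumulation rates as given in the text.
ICE_CM_PER_YR = 1.0               # "on the order of 1 cm/yr" (Vostok)
SEDIMENT_CM_PER_YR = 3.0 / 1000   # 3 cm per kyr (Hays et al. 1976)
SAMPLE_CM = 1.0                   # hypothetical sampling interval

for label, rate in (("ice core", ICE_CM_PER_YR), ("sediment core", SEDIMENT_CM_PER_YR)):
    span = years_per_sample(SAMPLE_CM, rate)
    print(f"{label}: one {SAMPLE_CM:g} cm sample spans ~{span:.0f} years")
```

With these assumed rates, a 1 cm slice of sediment averages roughly 333 years, compared with about one year for the ice.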

Initial drilling at Vostok Station began in 1958. The efforts produced four relatively shallow boreholes and left all nine makeshift thermal drills trapped in the ice. If the unsatisfactory drilling wasn’t enough to dampen the spirits of Soviet scientists, the Soviet Union would go on to ban chess at all Antarctic research stations after one scientist, having lost a game, attacked his opponent with an ice pick.

The Vostok Expedition, 1957

While Soviet authorities were confiscating chessboards, complications were turning up for the oxygen paleothermometer. Urey had acknowledged that the reliable application of the paleothermometer depended on how reliably workers could identify the isotopic composition of sea water. Now, studies were beginning to show that baseline oxygen isotope ratios in sea water depended on the workings of the water cycle. In particular, evaporation preferentially selects lighter isotopes of oxygen, while condensation preferentially selects heavier ones. Atmospheric circulation carries water vapor toward the poles, and falling temperatures along the way preferentially draw down the heavier isotopes. The vapor therefore becomes progressively lighter, leaving only the lightest water for incorporation into polar ice deposits.
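This progressive lightening of vapor is commonly idealized as Rayleigh distillation, a standard textbook model introduced here for illustration rather than taken from the text. In that model, the remaining vapor's isotope ratio scales as f^(α − 1). Here f is the fraction of the original vapor still airborne, and α is the liquid/vapor fractionation factor. The sketch assumes a constant α of 1.010, which is only a rough, illustrative value; in reality α itself depends on temperature. Delta values are given relative to the initial vapor, not to ocean water.

```python
# Assumed, illustrative liquid/vapor fractionation factor for 18O.
# alpha > 1 because condensate is enriched in the heavy isotope.
# The real value varies with temperature.
ALPHA = 1.010


def vapor_delta(fraction_remaining: float, alpha: float = ALPHA) -> float:
    """d18O (per mil, relative to the initial vapor) of the vapor left after
    Rayleigh distillation: R / R0 = f ** (alpha - 1)."""
    if not 0.0 < fraction_remaining <= 1.0:
        raise ValueError("fraction_remaining must be in (0, 1]")
    ratio = fraction_remaining ** (alpha - 1.0)
    return (ratio - 1.0) * 1000.0


def condensate_delta(fraction_remaining: float, alpha: float = ALPHA) -> float:
    """d18O (per mil, same reference) of precipitation forming in equilibrium
    with that vapor: R_condensate = alpha * R_vapor."""
    r_vapor = 1.0 + vapor_delta(fraction_remaining, alpha) / 1000.0
    return (alpha * r_vapor - 1.0) * 1000.0


for f in (1.0, 0.5, 0.2, 0.05):
    print(f"{f:>4.0%} of vapor left: precipitation d18O = {condensate_delta(f):6.1f} permil")
```

As more of the vapor rains out (smaller f), the precipitation it forms becomes steadily lighter. This is the qualitative pattern behind isotopically light polar ice.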

What this means is that the polar ice caps are disproportionately formed of light water, leaving sea water isotopically heavier than it would otherwise be. But ice caps are not stable things: they grow and shrink and sometimes disappear altogether. It follows that the isotopic composition of foraminiferal shells varies not only with temperature but also, significantly, with global ice volume as the climate oscillates between glacial and interglacial states. The British geologist Nicholas Shackleton (1967) used this insight to critique Emiliani’s estimate of Pleistocene temperature change (6 °C). According to Shackleton, much of the warming Emiliani detected in the isotope signal was just isotopically light polar ice returning to the ocean.

* * *

The original model of the measurement process had been disrupted. For temperature to predict isotopic fractionation (à la Tal), the model must accommodate the variation in the isotopic baseline that results from the accumulation of ice. But the original model neglected to incorporate the necessary laws or regularities linking global ice volume to baseline oxygen isotope ratios. A significant confound in the oxygen paleothermometer had therefore been left out of the calibration function.

Unfortunately, there is no way for scientists to physically control the confounding influence that global ice exerts on the paleothermometer. Traditional mercury thermometers can be constructed in such a way that variation in atmospheric pressure has no influence on the height of the mercury column (by sealing and calibrating the instrument at a fixed pressure, for example). No similar operation can be performed on past ice sheets and foram shells. The record of isotopic variations is set in stone, or perhaps in ice or ooze. Either way, it cannot be directly controlled.

Instead, scientists use other means to independently assess how much of the variation in the signal represents a response to changes in ice volume. This variation can then be “vicariously controlled” by applying post-hoc correction methods that remove its influence from temperature estimates. Climate researcher Garrett Boudinot and I have argued that vicarious control is the distinctive feature of proxy (vs. non-proxy) measurements like the oxygen paleothermometer (Wilson and Boudinot 2022). The degree to which a measurement is a proxy measurement corresponds to the degree to which vicarious controls are required to control for known confounding causes.*
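The following is a minimal sketch of what such a post-hoc correction looks like. It assumes that the calibration slope quoted earlier (5.41 °C per per mil) applies to the difference between shell and seawater δ¹⁸O. The procedure subtracts an independently estimated seawater shift from the shell shift and reads only the remainder as temperature. The function name and shift values are hypothetical placeholders, not Shackleton's or Emiliani's figures.

```python
SLOPE_C_PER_PERMIL = 5.41  # slope of the calibration function quoted in the text


def temperature_change(d_calcite_shift: float, d_seawater_shift: float = 0.0) -> float:
    """Temperature change (C) implied by a shell d18O shift (per mil), after
    removing the part attributed to a change in seawater d18O.

    Assumes the linear calibration applies to the calcite-minus-seawater
    difference, so only that residual is read as temperature.
    """
    residual = d_calcite_shift - d_seawater_shift
    return -SLOPE_C_PER_PERMIL * residual


# Hypothetical glacial-minus-interglacial shifts in per mil (illustrative only).
calcite_shift = 1.1      # glacial shells are isotopically heavier
ice_volume_shift = 0.7   # independent estimate of seawater enrichment from ice growth

naive = temperature_change(calcite_shift)                        # no control
corrected = temperature_change(calcite_shift, ice_volume_shift)  # vicarious control
print(f"uncorrected: {naive:.1f} C; ice-volume corrected: {corrected:.1f} C")
```

With these made-up numbers, reading the entire shell shift as temperature implies roughly 6 °C of cooling. Removing the assumed ice-volume contribution leaves only about 2 °C.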

[* Identifying proxy measurement with vicarious control is an improvement on existing accounts that emphasize the “indirectness” of proxies, since paradigm cases of standard measurement themselves exhibit indirectness. Mercury thermometers, for example, provide access to temperature only via more direct access to the height of the mercury column.]

It is interesting to observe that the discovery of an ice volume confound to the oxygen thermometer (and the resulting refinements of the calibration function) produced another thermometric application of oxygen isotopes. If heavier isotopes of oxygen disproportionately rain out as water vapor cools, then a higher concentration of light isotopes in polar precipitation indicates cooler ambient temperatures. If, by contrast, more of the heavy isotopes survive the trek to the poles, temperatures must have been warmer at that time. As such, ice cores composed of compacted precipitation (“meteoric ice”) can also provide a proxy measure of surface air temperatures at the poles. (See Achermann (2020) for more on the history and epistemic implications of this “vertical” glaciology.)

* * *

Serious Antarctic drilling didn’t recommence at Vostok Station until 1970, with the development of a proper ice-drilling system. What followed was almost thirty years of continuous drilling and the establishment of five major boreholes. Borehole 1 (1970–1973) and Borehole 2 (1971–1976) were drilled concurrently, with the latter reaching a depth of 450 meters and the former nearly 950 meters. Problems at Borehole 2 prompted research that led to improved fluid-drilling techniques. Using this knowledge, Borehole 3 (1980–1983) and Borehole 4 (1981–1989) more than doubled the depth achieved with Borehole 1, terminating at 2201 and 2546 meters, respectively.

Scientists extracting a core at Vostok Station

With the collapse of the Soviet Union, coring research at Vostok Station became a collaborative effort between Russia, France, and the United States. This culminated in January 1998 with a core reaching 3623 meters and terminating in the refrozen waters of the subglacial Lake Vostok. Altogether, the Vostok ice core boasted a continuous record of meteoric ice spanning the last 420,000 years: the deepest core ever recovered at that time. It was the first continuous ice core to span multiple glacial-interglacial cycles. This enabled scientists to track changes in atmospheric carbon dioxide (from trapped air bubbles) and temperature (from oxygen isotope analysis) across multiple glacial transitions. It is against this backdrop that contemporary carbon dioxide levels are so alarming. Modern atmospheric CO₂ concentrations are higher than anything known from the last 400,000 years of Earth history (about 420 ppm today vs. a maximum of ~300 ppm in the Vostok record).

While a far cry from Urey’s vision of an international scientific union with political teeth sharp enough to enforce nuclear nonproliferation, the extraction of the Vostok ice core was an international achievement. It was shared by France and the two primary rivals of the Cold War. I like to think that Urey would have been happy with this reconciliation of scientists formerly separated by the Iron Curtain. (He died in 1981, just as drilling commenced on Borehole 4.) It is partly because of Urey’s venture into geohistory that climate scientists were armed with the analytic techniques required to tackle 400,000 years of ice.

* * *

Suppose the reliability of proxy measurements like the oxygen isotope thermometer turns on the ability to vicariously control for confounding causal factors (influences on the measurement outcome beyond the target phenomenon). In that case, historical scientists are put in a tough spot regarding the interference of time. A proxy from the time of the dinosaurs has at least 66 million years of potential confounds to vicariously control: 66 million years of natural processes introducing noise into the signal. It might initially be surprising that there are any functional historical proxy measures at all.

In response to such concerns, historical scientists choose their proxies wisely. Reliable proxies are developed using traces that are abundant and well preserved. Both the sediment and ice composing the bulk of a core, and the samples buried within, are shielded from the most destructive processes that would otherwise erase their signal. Foraminifera exist in countless numbers in the ocean and produce an abundance of calcite shells. Some portion of these shells is buried and preserved in sediment, with a greater portion in coastal regions, where sedimentation rates are high. Meteoric ice is buried beneath the continued accumulation of further meteoric ice, which compresses it into ice samples insulated from disruptive surface-level processes, meteorological or otherwise. Tree rings are a similar sort of record. The internal structure of a tree is the product of annual growth cycles in which warmer and wetter growing seasons produce more growth, with the external bark providing a nice layer of protection.

About 20 cm of an ice core containing 11 annual layers. The darker layers represent winters

In addition, proxy methods are developed in ways that piggyback on significant dependencies in nature. There is no in-principle reason why measurements must rely on significant dependencies between the target phenomenon and a measurement output. One could use global pCO2 averages over time as a measure of how much CO2 a particular country is emitting, so long as the contributions of all other countries could somehow be tightly constrained. But this would be difficult and bound to introduce uncertainties, so proxy methods are grounded in what are thought to be significant causal relationships. Urey’s insight regarding the oxygen paleothermometer involved noticing that temperature is a significant influence on oxygen isotope fractionation. As a result, the variability of oxygen isotope fractionation in a given context is largely a function of temperature (relative to known confounds). (See Wilson and Boudinot (2022) for a more technical discussion of causal significance.)

All proxy methods carry a certain cost stemming from the inability to directly control confounding factors. Still, the reliability of a given proxy doesn’t ultimately depend on the method of control. Measurement reliability, proxy or otherwise, is a matter of how well confounds are controlled. As such, proxy measurements are not in principle less reliable than non-proxy measures, even if they may require more sophisticated and varied strategies for controlling confounds. Indeed, the oxygen paleothermometer has continued to be refined over the years, long after the initial worry over ice volume. Novel confounds have been discovered and incorporated into the measurement model (e.g., isotopic variation with shell size, foraminiferal life cycle, and ocean pH). As a result, the oxygen isotope paleothermometer has become one of the most reliable and commonly used proxy measures in paleoclimatology.

* * *

I will conclude by considering one last limitation on the use of historical proxy measures. Some of the most concerning challenges for interpreting proxy measures emerge from their temporal resolution. Aja Watkins, on this very blog, grapples with several philosophical problems. These include how to settle on rates derived from proxy measurements, how to compare them with rates derived from modern instruments, and whether there are even such things as “real” rates.

I will restrict the rest of my discussion to one specific rate-related issue: the rate at which proxy records accumulate, and thus the temporal resolution of the target measure. This resolution is often very low relative to the climate phenomena we observe in the present. (The Vostok core averaged ~1.4 cm/year as it accumulated, while Hays’ continuous ocean sediment cores averaged ~3 cm/kyr.) So, even if we could decide what rate(s) to ascribe to existing proxy records, many of these records would underdetermine the climatologically significant processes that we know occur on shorter timescales. Individual historical proxy records can, at best, average the known climatological variance occurring over shorter timescales. (To be clear, non-proxy measures experience the same kind of temporal underdetermination. A standard mercury thermometer requires some number of seconds to respond to local temperature changes, and so cannot tell us about variance on the nanosecond scale. It turns out, however, that the behavior of the global climate can for most purposes be adequately represented in units of seconds or longer. This is not the case for the timescales captured by historical proxy measures.)
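The averaging point can be illustrated with a small synthetic experiment. This is an assumption-laden sketch rather than a model of any real core. It builds a hypothetical annual temperature series containing a slow 23,000-year cycle and a fast 5-year cycle of equal amplitude. It then averages the series into blocks of about 333 years, the span of one 1 cm sample at 3 cm/kyr. All names and signal parameters are illustrative.

```python
import math
from typing import List, Sequence


def block_average(series: Sequence[float], width: int) -> List[float]:
    """Average consecutive, non-overlapping blocks of `width` years, mimicking
    one proxy sample per block (any leftover tail is dropped)."""
    return [sum(series[i:i + width]) / width
            for i in range(0, len(series) - width + 1, width)]


def half_range(series: Sequence[float]) -> float:
    """Half the peak-to-trough range, used here as a crude amplitude measure."""
    return (max(series) - min(series)) / 2


YEARS = 30_000
# Years averaged into a 1 cm sample at the assumed 3 cm/kyr accumulation rate.
BIN_YEARS = round(1 / (3.0 / 1000))

# Hypothetical annual temperature anomalies: a slow 23,000-year cycle and a
# fast 5-year cycle, both with unit amplitude. Illustrative only.
slow = [math.sin(2 * math.pi * t / 23_000) for t in range(YEARS)]
fast = [math.sin(2 * math.pi * t / 5) for t in range(YEARS)]

for name, signal in (("23-kyr cycle", slow), ("5-yr cycle", fast)):
    recorded = block_average(signal, BIN_YEARS)
    print(f"{name}: true half-range {half_range(signal):.2f}, "
          f"recorded half-range {half_range(recorded):.3f}")
```

The slow cycle survives the averaging almost intact. The fast cycle is reduced to a residue of well under one percent of its true amplitude, which is the sense in which such a record can only average short-timescale variance.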

The problem is a general one. Tree rings capture seasonal temperature variation in the growth patterns of their rings, but fail to capture temperature variance occurring at daily or weekly intervals. Ice cores like Vostok can exhibit sufficient resolution for annual temperature averages but obscure intra-annual seasonality in deeper sections. Our oldest continuous ocean sediment cores are resolved closer to millennial timescales, and so average together several hundred years of temperature signal. What empirical constraints a proxy measure is capable of providing will be a function not only of the amount of time represented in the record and our ability to vicariously control confounds, but also the temporal resolution of the record.

There are a couple of things we can say about how historical proxy users work with such constraints. First, proxy measures of differing resolutions will be particularly suited to assessing hypotheses at differing timescales. Deep-sea sediment cores experience slower accumulation rates that make them more suitable for assessing variance on the order of 10–100 kyr, like the periodic variations in Earth’s orbit around the Sun. On the other hand, tree rings grow and coastal sediments accumulate relatively quickly. This makes them suitable for tracking more recent variation in the El Niño–Southern Oscillation, with cycles of roughly 4 to 10 years. In fact, studying climate change rarely requires anything more fine-grained than annual temporal resolution, so we shouldn’t worry about the lack of an hourly paleothermometer. Underdetermination need not be a problem so long as proxies are used for temporally appropriate purposes.
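One way to make "temporally appropriate" more precise is the Nyquist criterion from signal processing. This is a standard rule borrowed here, not one invoked in the text. It states that a regularly sampled record can represent a cycle only if the sampling interval is at most half the cycle's period. This is a necessary condition, not a sufficient one. The sketch applies the rule using the shortest period in each range, with hypothetical sampling intervals loosely based on the figures in the text.

```python
def can_resolve(period_years: float, sample_spacing_years: float) -> bool:
    """Nyquist-style check: a regularly sampled record can only represent
    cycles whose period is at least twice the sampling interval."""
    return period_years >= 2 * sample_spacing_years


# Hypothetical sampling intervals (years per sample), loosely following the text.
records = {
    "tree rings (annual)": 1.0,
    "deep-sea sediment (~333 yr per cm)": 1000 / 3,
}
# Shortest period in each range mentioned in the text.
cycles = {"ENSO (4-10 yr)": 4.0, "orbital (10-100 kyr)": 10_000.0}

for rec, spacing in records.items():
    for cyc, period in cycles.items():
        verdict = "resolvable" if can_resolve(period, spacing) else "not resolvable"
        print(f"{rec:<36} {cyc:<22} {verdict}")
```

A sediment sample averages the cycles shorter than its span, whereas a point sample aliases them. In either case, those cycles cannot be recovered from that record alone.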

Second, individual proxy records best contribute to our understanding of the more complex Earth system in the context of other proxy measures and independent background theory. Climate simulation models, for example, can provide a useful venue for integrating empirical observations and relevant dynamical principles into a more complex and coherent picture of the past. Wendy Parker (2017) argues that climate simulation models can even play an important role in facilitating measurement practices. In this more interdependent empirical context, proxy measures of differing temporal resolution can provide distinct empirical constraints on the model’s behavior. Thus, a single proxy method will rarely provide a richly detailed image of the past on its own. It can nonetheless provide crucial empirical constraints that work alongside our other epistemic considerations to produce a fuller understanding of the past.

So while it may be common to speak of traces in the historical record as providing a kind of “snapshot” of the past, it would be a mistake to import the temporal precision of a typical photograph into the analogy. Instead, it would be better to understand the analogical photograph as a product of longer exposure, no longer depicting so “snappy” a moment in time. The lines and shapes of the photograph may thus blend and blur, capturing the motion within the frame better than the boundaries of the subjects themselves. Yet the trained eye may still be capable of interpreting the patterns. In developing such a long exposure tool, Urey and colleagues provided an important way to interpret these motions of the past.

References

Achermann, D. 2020. Vertical glaciology: The second discovery of the third dimension in climate research. Centaurus 62:720–743.

Emiliani, C. 1955. Pleistocene temperatures. The Journal of Geology 63:538–578.

Epstein, S., Buchsbaum, R., Lowenstam, H., and Urey, H.C. 1951. Carbonate-water isotopic temperature scale. Geological Society of America Bulletin 62:417–426.

Hays, J. D., Imbrie, J., and Shackleton, N.J. 1976. Variations in the Earth's orbit: pacemaker of the ice ages. Science 194:1121–1132.

Mari, L., Carbone, P., Giordani, A., and Petri, D. 2017. A structural interpretation of measurement and some related epistemological issues. Studies in History and Philosophy of Science, Part A 65:46–56.

Norton, S., and Suppe, F. 2001. Why atmospheric modeling is good science. In C.A. Miller and P.N. Edwards, eds., Changing the atmosphere: Expert knowledge and environmental governance, 67–105.

Parker, W.S. 2017. Computer simulation, measurement, and data assimilation. The British Journal for the Philosophy of Science 68:273–304.

Shackleton, N. 1967. Oxygen isotope analyses and Pleistocene temperatures re-assessed. Nature 215:15–17.

Shindell, M. 2019. The Life and Science of Harold C. Urey. Chicago: University of Chicago Press.

Tal, E. 2017. Calibration: modelling the measurement process. Studies in History and Philosophy of Science, Part A 65:33–45.

Urey, H.C. 1947. The thermodynamic properties of isotopic substances. Journal of the Chemical Society (Resumed) 562–581.

Vasiliev, N.I., Talalay, P.G., Bobin, N.E., Chistyakov, V.K., Zubkov, V.M., et al. 2007. Deep drilling at Vostok station, Antarctica: history and recent events. Annals of Glaciology 47:10–23.

Wilson, J., and Boudinot, F.G. 2022. Proxy measurement in paleoclimatology. European Journal for Philosophy of Science 12:1–20.