Sequences are History

The rate-independent pattern of a sequence can describe or guide some rate-dependent behavior in the world, but this is not the only way in which sequences have a peculiar relationship with time. Sequences of DNA are essential to biological evolution, and evolution implies change over time. As Hull notes, “There is something about the structure of a particular molecule of DNA which depends on the sequences of selection processes which gave rise to it.”17 In other words, sequences contain a record of their own history.

Linguists Morten Christiansen and Simon Kirby call the study of the origin of language “the hardest problem in science,”18 although they might get some pushback on that claim from those biologists who study the origin of life.19 Explaining the origin of complex systems of sequences, whether genomes or texts, is a hard problem indeed. Fortunately, researchers studying the origin of life and those studying the origin of language are studying what is tantamount to the same problem.

This leads us to the second temporal difference between sequences and the physical world. Rate independence also allows sequences to display unusual properties in historical time. “In its evolutionary role the gene [sequence] inhabits eternity, or at least geological time,” writes Richard Dawkins. “Its companions in the river of evolutionary time are other genes, and the fact that in any one generation they inhabit individual bodies can almost be forgotten.”20 The one-dimensional patterns of sequences are more than rate-independent. They are, in principle, eternal.

What Dawkins says about genes is also true of the sequences of language. As individuals, you and I have learned to listen, to speak, to read, and to write, but the systems of linguistic sequences we have mastered were here before we were born and will be here after we die. In the river of time, the fact that these sequences have inhabited our individual bodies can almost be forgotten.

What, then, is the difference between the history of a system of sequences and the history of an ordinary physical system? The chief distinction is that the dynamic laws of physics are reversible in time. With precise enough measurements, we can not only predict the state of a physical system at any time in the future but also infer its state at any time in the past. Astronomers can tell you when solar eclipses will take place in the future, as well as when and where they took place years, centuries, or millennia ago.
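The reversibility described above can be illustrated with a minimal sketch. It assumes a toy system that does not appear in the text, a mass on an ideal spring, and a standard time-symmetric integration scheme (velocity Verlet); the function names and parameter values are illustrative choices only. Running the same rule with a negative time step retraces the trajectory back to its starting state.

```python
def verlet_step(x, v, dt, k=1.0, m=1.0):
    """One velocity-Verlet step for a mass on a spring (force = -k * x)."""
    a = -k * x / m
    x_new = x + v * dt + 0.5 * a * dt * dt
    a_new = -k * x_new / m
    v_new = v + 0.5 * (a + a_new) * dt
    return x_new, v_new


def evolve(x, v, dt, steps):
    """Apply the same dynamical rule repeatedly; dt may be negative."""
    for _ in range(steps):
        x, v = verlet_step(x, v, dt)
    return x, v


# Assumed initial state: displaced by one unit, at rest.
x0, v0 = 1.0, 0.0

# Predict the future state from the present one.
x_future, v_future = evolve(x0, v0, dt=0.01, steps=1000)

# Infer the past: run the same law backward from the future state.
x_past, v_past = evolve(x_future, v_future, dt=-0.01, steps=1000)

# The recovered state matches the original up to floating-point roundoff.
print(abs(x_past - x0), abs(v_past - v0))
```

Nothing about the system's history is stored anywhere in this sketch; the earlier state is recovered from the present state and the law alone.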

Not so with sequences. In 1965, biologist Emile Zuckerkandl and Nobel laureate biochemist Linus Pauling wrote a paper called “Molecules as Documents of Evolutionary History.” “Of all natural systems,” they say, “living matter is the one which, in the face of great transformations, preserves inscribed in its organization the largest amount of its own past history.”21 In other words, the one-dimensional patterns of sequences provide a partial record of how they evolved. “The genealogical history of an organism,” says Carl Woese, “is written to one extent or another into the sequences of each of its genes.”22

There is a great gap between the complex systems of linguistic and genetic sequences we observe today and everything else in the physical world.23 Contemporary systems of sequences must have evolved from less elaborate precursors, but there is little evidence remaining of what those precursors were like, or of the specific steps involved in getting from those precursors to where we are today. In English, and in the genome, it appears that every part is necessary for the proper functioning of the system, and yet we know that at some earlier time not all parts were present. As linguist Derek Bickerton says, “Language must have evolved out of some prior system, and yet there does not seem to be any such system out of which it could have evolved.”24

Systems of sequences are complex and interdependent. Complexity makes modeling difficult, and interdependence magnifies the difficulty because we cannot determine which elements are necessary and which are contingent. Also, no intermediate forms survive. No proto-languages are spoken anywhere and no proto-organisms have been collected by scientists. The languages of contemporary hunter-gatherers and the genetics of ancient lineages of bacteria are already complex.

Researchers have developed software to analyze patterns in sequences that yield clues to their history.25 These computational tools demonstrate the convergence of two independent strains of sequence studies, historical linguistics and molecular evolutionary genetics. Etymologists studying English analyze patterns in the language to tell us which words have a Germanic pedigree and which a Romance, and molecular biologists analyze patterns of DNA to tell us how much of our genome originated with Neanderthals. Linguists Quentin Atkinson and Russell Gray put it this way: “Researchers using computational methods in evolutionary biology and historical linguistics aim to answer similar questions and hence face similar challenges.”26 This is because they are studying equivalent problems.27

Molecular biologists developed their computational tools to probe the masses of data originating from genome sequencing projects. Evolutionary biologists adapted these tools to address problems like the relatedness of different species or the nature of the common ancestor of all life.28 Linguists adapted the same tools to address questions like the relatedness of modern languages or the nature of extinct languages like Proto-Indo-European.29
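To give a rough sense of how such tools estimate relatedness, here is a minimal sketch of one simple distance-based approach (average-linkage clustering, often called UPGMA). It is not the method used in the studies cited here, which rely on far more sophisticated statistical models. The sequences below are invented for illustration and are not real DNA data; the names `hamming`, `upgma`, and `toy` are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def upgma(seqs):
    """Repeatedly merge the two closest clusters into a nested-tuple tree.

    seqs maps a label to an aligned sequence. The distance between two
    clusters is the average Hamming distance over all pairs of members.
    """
    # Each key is a subtree; each value lists the labels it contains.
    clusters = {label: [label] for label in seqs}
    while len(clusters) > 1:
        def avg_dist(pair):
            left, right = clusters[pair[0]], clusters[pair[1]]
            total = sum(hamming(seqs[p], seqs[q]) for p in left for q in right)
            return total / (len(left) * len(right))

        t1, t2 = min(combinations(clusters, 2), key=avg_dist)
        merged = clusters.pop(t1) + clusters.pop(t2)
        clusters[(t1, t2)] = merged
    return next(iter(clusters))


# Invented toy sequences (NOT real data), already aligned.
toy = {
    "dog":    "ACGTACGTACGT",
    "wolf":   "ACGTACGTACGA",
    "coyote": "ACGTTCCTACGA",
    "fox":    "TCGATCCTTCGA",
}

tree = upgma(toy)
print(tree)  # ('fox', ('coyote', ('dog', 'wolf')))
```

The same procedure could in principle be run on any set of aligned symbol strings, for instance word lists coded for shared cognates, which is one way to picture why the two fields were able to share tools.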

FIGURE 1.1 Two phylogenetic trees show the convergence of sequence analysis methods in evolutionary biology and historical linguistics. The tree on the left shows the evolutionary relationships among the DNA sequences of the domestic dog and its relatives, including which are most closely related and how far back in time they diverged.30 The tree on the right shows the relationships among the sequences of Indo-European languages, including which are most closely related and how far back in time they diverged.31

One thing we have learned from these analyses is that, in both linguistic and biological evolution, the patterns of commonly used sequences tend to be conserved; in Dawkins’s river of evolutionary time they are unusually persistent. The cell contains common genes and short DNA sequences for which the fundamental arrangement has remained largely unchanged for hundreds of millions of years. By the same token, the most frequently used words in a language retain their forms longer than those that are less common. In English, for example, only three percent of modern verbs are irregular, but of the ten most common verbs, all are irregular.32

Using the historical record to describe the previous states of a system of sequences is not a lawful process but rather a statistical one. The tools of computational biology cannot provide precise answers to historical questions, but only estimates within a range of probabilities. Further, none of these software tools is of use in trying to explain the history of an ordinary physical system because physical systems are governed only by the laws of nature. Physical systems may retain no explicit record of their history, but the reversibility of the laws that govern them allows us to calculate their previous states if we so desire.

Predicting the future of sequential systems is also probabilistic. The trajectories of physical systems can be predicted with great precision; this is how we land space probes on comets. But no matter how much behavioral detail we accumulate, we cannot forecast the direction of either biological or cultural evolution.