Summary
There have recently been a number of efforts to model
language as a complex adaptive system. A few successful projects have
explicitly modeled evolving language using evolutionary game theory. When
carefully applied, this technique has shown itself able to account for aspects
of language change, and the approach deserves far more testing. In the simplest kind of model, the speakers
play “language games” in which the objective is to imitate each other. The game
strategies are related by a payoff matrix, and the “imitation dynamics” governs
the evolution in a fashion analogous to the replicator dynamics in biological
models.
Learnability theory (of
language) is the established mathematical study of the learning capabilities of
inductive language learning algorithms.
Such theoretical analysis covers a wide range of learning models, and
can be very helpful in evaluating the effectiveness of postulated language
learning algorithms. But in reality, of
course, language learners are themselves the speakers in the evolving
linguistic community. Language learners
are busy learning an evolving language; in fact, they are part of the cause of its evolution. Mutation
in an evolutionary language game can be modeled as imperfect learning. The
program of research I am putting forward here is to forge a greater connection
between the evolutionary modeling of language and the formal learnability
theory of language. There is precious little existing research on this topic,
but as a practicing linguist I believe we must seek to understand language as
something which is at once inductively learned and evolving under a host of
systemic pressures and functional adaptations.
Only then can we hope to achieve significant understanding of the most
important human cognitive ability.
Language as a complex adaptive system
At present, the efforts to model human language as a “complex
adaptive system” have been developing for at least two decades. Beginning in
the 1990s with some pioneering early work on models of language evolution by
Steels, Croft and other researchers, the idea of modeling language as an
emergent property of a system of agents who are trying to communicate gradually
became more popular, though it is still far from being a mainstream topic in
Linguistics. The currency and modern development of the approach are evinced by
such books as The Origin of Vowel Systems [1],
Self-Organization in the Evolution of Speech [2],
and Language as a Complex Adaptive System [3].
De Boer [1] describes how
language meets the criteria of a complex system: the interacting elements are
the speakers, and the local interactions are the speakers talking to each
other, and also learning language from the speech community. De Boer goes on to
explain how language is also adaptive: it changes under the influence of
cognitive and social forces which seek to optimize a variety of attributes,
viz. communicative efficiency, communicative effectiveness, and ease of
learning.
Steels [4] introduced the
simulation of the complex system of language, consisting of a large number of
agents interacting and playing a “language game” designed to foster increased
communicative effectiveness. The
approach is clearly based upon Maynard Smith’s [5] evolutionary game
theory. A number of the elements of
natural language have been modeled as emerging from a population playing games:
De Boer (op. cit.) showed how agents playing an “imitation game” with vowel
sounds could spontaneously develop a vowel system bearing similarities to
natural vowel systems; Steels [6] simulated the
emergence of conceptual categories and linguistic syntax, while Steels and
Kaplan [7] simulated the
emergence of a lexicon—which is to say, a set of form-meaning relations shared
among the population.
Yet, something is missing from much of this development: a theory
which can both describe and constrain how such complex systems of language learners might function while
nevertheless changing their established language. It seems that the notion of
complex adaptive system has been adopted in much functionally motivated
linguistic research almost as a leitmotif rather than as a serious mathematical
theory which can be used to study the interconnected processes of language
learning and evolution. There are
apparent conundrums in this interconnection; chief among them is that while
human children learn language successfully, language continues to evolve
through the generations.
Evolutionary game theory
Evolutionary game theory was developed by J. Maynard Smith [5] and is well known
as a model of evolving complex systems. In the study of language evolution, it has
been used to formalize the notion of functional adaptation of a language to
meet certain communicative needs [8]. Jäger (op. cit.)
used a stochastic evolutionary game simulation to demonstrate that certain
common features of grammar are stable states, while other unattested conditions
are evolutionarily unstable, using just a few uncontroversial facts about
linguistic communication. Despite this
interesting and successful research, there has been little or no follow-up. My feeling is that much more work
in this area deserves to be pursued. Jäger only
dealt with functional adaptation affecting a couple of grammatical features;
there are numerous other evolutionary forces which drive language change in
other varied ways.
By way of example, let me consider the evolution of vowel
systems. Some pioneering modeling was done by de Boer on the initial evolution of vowel systems (at
the dawn of language), but there has been little or no complex systems modeling
of the continuing evolution of the sounds of language in response to systemic
pressures. There is indeed scarce agreement about what the systemic pressures
affecting vowel systems actually are. For whatever reason, natural languages
most frequently evolve to a point where they have approximately 5 vowel
qualities (usually /i, e, a, o, u/ as in Spanish), and this seems to be an
evolutionarily stable state of our systems: inventories of approximately 5
vowels are often evolutionary targets, and languages that reach them can keep them unchanged for many
centuries (to wit, Spanish). On the
other hand, many languages have for unknown reasons developed vowel systems
with more than 10 distinct vowel qualities—this is characteristic of the
Germanic languages including English. These larger vowel inventories, however,
are usually unstable; English dialects are constantly going through “vowel
shifts” which threaten to render the many global varieties of English mutually
unintelligible. A further fact of interest, however, is that the state in which
a language has a large number of constantly shifting vowels itself appears to
be evolutionarily stable. English has
had at least 10 vowels and diphthongs since Anglo-Saxon times (leaving
long/short distinctions aside) and now has about 13, so rather than reducing
the number of vowels we have cycled through many different qualities of these
vowels in the intervening centuries. This type of vowel shifting is reminiscent of
the stable oscillatory states which have been demonstrated in evolutionary
models of cooperative behavior which include mutation [9].
In language evolution, successful imitation plays the part
of the replication found in evolution models (so Maynard Smith’s replicator
dynamics are now imitation dynamics) [8], while imperfect
learning by the next generation is the mutation. The above-mentioned vowel
shifting is likely to be caused in part by a failure to accurately imitate so
many vowels because of production/perception failures (this is a fitness failure in evolutionary terms),
but also by quasi-random changes in the lexicon that affect the functional load of the vowel contrasts.
Functional load (briefly: the amount of important distinctive work done by a
distinction in a given language) [10] is likely to be an
important force in the evolution of language, although along with many other
such forces it has never been modeled in a full-fledged evolutionary game
simulation.
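The analogy in this paragraph can be made concrete with a small sketch. The Python code below iterates a discrete-time replicator-mutator update, in which strategies spread in proportion to their payoff (imitation) and a mutation matrix stands in for imperfect learning. The payoff values, the mutation rate, and all function and variable names are assumptions chosen for illustration; they are not taken from Jäger, Chatterjee et al., or any other cited work.

```python
def replicator_mutator_step(x, payoff, mutation):
    """One discrete-time replicator-mutator update.

    x        -- list of strategy frequencies (sums to 1)
    payoff   -- payoff[i][j]: payoff to strategy i when meeting strategy j
                (assumed positive so that mean fitness is never zero)
    mutation -- mutation[i][j]: probability that a learner imitating i ends
                up with j (each row sums to 1); models imperfect learning
    """
    n = len(x)
    # Expected payoff (fitness) of each strategy against the current population.
    fitness = [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]
    mean_fitness = sum(x[i] * fitness[i] for i in range(n))
    # Imitation in proportion to success, followed by imperfect learning.
    return [
        sum(x[i] * fitness[i] * mutation[i][j] for i in range(n)) / mean_fitness
        for j in range(n)
    ]


def run(x, payoff, mutation, steps):
    """Apply the update repeatedly and return the final frequencies."""
    for _ in range(steps):
        x = replicator_mutator_step(x, payoff, mutation)
    return x


# Hypothetical two-variant coordination game: speakers are rewarded for
# matching each other, and variant 0 is slightly more effective.
PAYOFF = [[1.2, 0.1],
          [0.1, 1.0]]
# Hypothetical learning error: 2% of learners acquire the other variant.
MUTATION = [[0.98, 0.02],
            [0.02, 0.98]]

final = run([0.5, 0.5], PAYOFF, MUTATION, steps=200)
```

Under these assumed numbers the slightly more effective variant comes to dominate, but the learning error keeps the other variant from disappearing entirely; `final` holds the resulting frequencies.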
Many other evolutionary phenomena have been postulated for
natural language but have never been subjected to detailed study through
dynamical simulation. One further example is the apparent tendency for the
separate words of sentences to gradually “agglutinate,” so that sequences of
words often become prefix-root or root-suffix combinations (witness the
modern-day creation of English items like gonna
and coulda). The reverse process,
while easy to imagine, is all but unknown in reality [11]. So we see that
there are subtle questions of linguistic fitness that need to be carefully
considered to achieve explanatory models.
Formal learning theory
Formal learning or “learnability” theory has recently been
reviewed by Fulop and Chater [12], who cover a
number of distinct approaches to the mathematical modeling of how functions or
languages are learned. A standard model of learning can be used in nearly all
formal learning algorithms: we suppose that the learner receives a data
sequence D (the example set, learning sample, or training set) one item at a time. The
data sequence consists of examples. The learner then proposes a hypothesis to
characterize what it has learned after each example. One natural goal for the language learner is to recover (or
perhaps to approximate) the “true” language from which the data D has been generated. In this setting,
an example might consist of a sequence of symbols, plus the information that
this expression is within the target language. A learning sample would then be
a sequence of such sentences.
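The standard model just described can be sketched in a few lines of Python. This sketch assumes, purely for simplicity, that the target belongs to the class of finite languages, for which a learner that conjectures exactly the set of sentences seen so far converges on the target once every sentence has appeared. The target language, the data sequence `D`, and the function name are hypothetical.

```python
from typing import FrozenSet, Iterable, Iterator


def finite_language_learner(data: Iterable[str]) -> Iterator[FrozenSet[str]]:
    """Yield one hypothesis after each positive example.

    The hypothesis is simply the set of sentences seen so far. It converges
    to the target whenever the target language is finite and every sentence
    of the target eventually appears in the data sequence.
    """
    seen = set()
    for sentence in data:
        seen.add(sentence)
        yield frozenset(seen)


# Hypothetical target language and data sequence D (repeats are allowed).
TARGET = frozenset({"ab", "aabb", "aaabbb"})
D = ["ab", "aabb", "ab", "aaabbb", "aabb", "ab"]

hypotheses = list(finite_language_learner(D))
# Index of the first hypothesis that equals the target language.
convergence_point = next(i for i, h in enumerate(hypotheses) if h == TARGET)
```

Here the learner's hypothesis first matches the target after the fourth example and does not change afterwards, which is the notion of convergence used in identification in the limit. Natural languages are of course not finite, so this is only a toy instance of the setup.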
Learnability theory is traditionally concerned chiefly with
how to set up a problem so that the learnability of the “true” function, concept or language
can be assessed by mathematical analysis. This kind of learning
theory also usually focuses on learning as a process in which the learner’s
hypothesis approaches the target language as more data is analyzed. Key
differences among theoretical frameworks within learning theory center on the
specific way to model the notion of “approaching the target.” For the research
program I envision, it is not essential to select one particular learning
theoretic approach. One could imagine useful results being connected to a
variety of methodologies including Bayesian learning, formalized inductive
inference, and probably approximately correct inference, given that each of
these disparate approaches has yielded important results pertaining to language
learning.
An important finding to emerge from language learnability
studies is that various elements of natural languages can be successfully
learned by a variety of specific algorithms, but only if one allows either
unrealistic computing power [13] or tighter
restrictions on the class of languages which can be learned [14, 15] (i.e. some form of Universal
Grammar or universal learning bias). The form of innate learning bias that will
serve to permit language learning is not as extensive as that originally sought
within the Chomskyan program of Principles and Parameters, however. The latter
program called for such a rich innate component that
the credo “most of language is innate” has sometimes been attributed to the
Chomskyan paradigm [16].
Learning in evolving systems
Seemingly the first source to combine learnability theory
with the study of language as an evolving dynamical system is Niyogi (2006) [17]. While impressive and mathematically
sophisticated, this work should be viewed as only a starting point for the
research program I envision. There are
many assumptions made in Niyogi’s approach that deserve reconsideration,
including the language learning paradigm and the dynamical system model. For the former, Niyogi stuck to the linguistic
paradigm of Principles and Parameters, a model that has since fallen out of
favor, in part due to Niyogi’s own proofs [18] pointing out that
the learning algorithms did not have the expected nice properties. For
the latter, Niyogi’s dynamical models are not complex systems; rather, they are deterministic simple systems with
tractable analytic solutions. While this
enabled the calculation of handy mathematical results instead of messy
simulations, we must now move into the complex systems regime; indeed, the models
should be not only complex but adaptive
systems as well.
In what is apparently the only literature to add
substantially to Niyogi’s approach, Chatterjee et al. [9] provide some
useful methods for combining the study of evolutionary dynamical systems with
the study of learning theory. While
mentioning linguistic applications, their work is focused on populations
learning Prisoner’s Dilemma strategies. My plan is, in essence, to combine the methods of
Jäger and those of Chatterjee et al., in the search for novel substantive
connections between the complex adaptive system model of language change and
learning-theoretic results about language.
The main theoretical area which requires development is, as pointed out
to me by Nick Chater (p.c.), the scenario in which inductive learners (i.e.
children) are using the “results” of previous learners (i.e. adult speakers)
who presumably have identical language learning biases, and all are together in
the same evolutionary dynamical system.
This should make it possible to learn more easily from finite (and thus
partial) language data. But as
mentioned, there seems to be no published literature which addresses these
points or which does what I’m envisioning for my research program.
Research plans
My plans for carrying out the research involve a number of
interrelated activities and phases. I plan to construct evolutionary language
simulations which model a variety of linguistic forces beyond functional
adaptation, such as cognitive and speech mechanistic constraints. Such simulations will progress to involve
multiple generations of speakers, in which the younger speakers learn language
from the older speakers by imitation. The next phase would add mutation to the
simulation, in the form of imperfect learning.
My hope is eventually to be able to derive language learnability results
in the specific setting of a multigenerational adaptive speech community with
homogeneous learning biases.
To take a specific example, I plan to invoke an existing
method for learning about the morphology of words [19] to develop an
evolutionary game simulation in which each succeeding generation of learners
applies the method to the output of the previous generation. Constructing the simulation will involve
carefully considered parameters and entries in the payoff matrix which
determines the outcomes of “games” played by the participants. The overall goal of the game is not only successful
imitation but correct word structure in relation to other similar words, which
can be gauged by a number of possible measures. This will use a stochastic
version of evolutionary game theory, as in Jäger (op. cit.). Once the basic evolving system is set up,
mutation can be introduced and the dynamics examined under different
assignments to fitness parameters.
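A minimal sketch of the kind of multigenerational simulation outlined above might look as follows. It is not the morphology-learning method cited in the text: variants are abstract integers, the payoff matrix simply rewards matching variants, and imperfect learning is a fixed probability of acquiring a different variant. All names and numerical values are hypothetical.

```python
import random


def next_generation(adults, payoff, mutation_rate, rng):
    """Produce a new generation of the same size as `adults`.

    adults        -- list of variant indices, one per adult speaker
    payoff        -- payoff[i][j]: payoff to variant i when paired with j
                     (assumed positive so it can serve as a sampling weight)
    mutation_rate -- probability that a learner mislearns its model's variant
    rng           -- random.Random instance
    """
    n_variants = len(payoff)
    # Each adult plays one game with a partner drawn at random from the
    # population (for simplicity, the draw may return the adult itself).
    scores = [payoff[a][rng.choice(adults)] for a in adults]
    learners = []
    for _ in adults:
        # Learners imitate adults in proportion to communicative success.
        variant = rng.choices(adults, weights=scores, k=1)[0]
        if rng.random() < mutation_rate:
            # Imperfect learning: acquire some other variant at random.
            variant = rng.choice([v for v in range(n_variants) if v != variant])
        learners.append(variant)
    return learners


def simulate(n_agents, payoff, mutation_rate, generations, seed=0):
    """Run the stochastic game and return per-generation variant shares."""
    rng = random.Random(seed)
    n_variants = len(payoff)
    population = [rng.randrange(n_variants) for _ in range(n_agents)]
    history = []
    for _ in range(generations):
        population = next_generation(population, payoff, mutation_rate, rng)
        history.append([population.count(v) / n_agents for v in range(n_variants)])
    return history


# Hypothetical payoffs: matching variants communicate well, mismatches poorly.
PAYOFF = [[1.0, 0.1, 0.1],
          [0.1, 1.0, 0.1],
          [0.1, 0.1, 1.0]]

history = simulate(n_agents=200, payoff=PAYOFF, mutation_rate=0.01, generations=100)
```

`history` records the share of each variant per generation, so the dynamics can be inspected under different payoff entries and mutation rates.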
The basic learning results about this approach to morphology
are quite straightforward [19] and should be
applicable within the dynamic systems approach; I imagine that the complexity
of the learning is an interesting object of study in addition to the
learnability per se. Mutation in its various forms will surely
affect the learning results; it may become impossible to learn adequately if
imperfection is too great, because a degree of homogeneity needs to be found
among the adult speakers. In general I
am hoping to find some unforeseen results.
References
2. Oudeyer P-Y. Self-Organization in the Evolution of Speech. Oxford: Oxford University Press; 2006.
This is an exciting area! Can I suggest you also look at the work on iterated learning? This examines the relationship between inductive bias and stationary distributions mediated by cultural evolution of language. The motivation for this work, since the late nineties, has been the idea that because language persists through cultural transmission, we should expect the data for learners to be optimised for learning - transmission by iterated learning results in adaptation of behaviours to maximise learnability.
Early work looked at simple diffusion chains, but we are increasingly examining more interesting assumptions, for example more complex populations and co-evolution of learning bias and language structure. Most recently, we have been matching simulation models with human experiments. Here's a representative bibliography:
Kirby, S., Dowman, M. and Griffiths, T. (2007) Innateness and culture in the evolution of language. Proceedings of the National Academy of Sciences, 104(12):5241-5245.
Kirby, S. (2001). Spontaneous evolution of linguistic structure-an iterated learning model of the emergence of regularity and irregularity. Evolutionary Computation, IEEE Transactions on, 5(2), 102-110.
Brighton, H., Smith, K., & Kirby, S. (2005). Language as an evolutionary system. Physics of Life Reviews, 2(3), 177-226.
Griffiths, T. L., & Kalish, M. L. (2007). Language evolution by iterated learning with Bayesian agents. Cognitive Science, 31(3), 441-480.
Smith, K., and Kirby, S. (2008). Cultural evolution: implications for understanding the human language faculty and its evolution. Philosophical Transactions of the Royal Society B, 363(1509):3591-3603.
Smith, K. (2009). Iterated learning in populations of Bayesian agents. In N.A. Taatgen & H. van Rijn (Eds.), Proceedings of the 31st Annual Conference of the Cognitive Science Society (pp. 697-702). Austin, TX: Cognitive Science Society.
Burkett, D., & Griffiths, T. L. (2010). Iterated learning of multiple languages from multiple teachers. In The Evolution of Language: Proceedings of the 8th International Conference (EVOLANG8) (pp. 58-65).
Kirby, S., Cornish, H., & Smith, K. (2008). Cumulative cultural evolution in the laboratory: An experimental approach to the origins of structure in human language. Proceedings of the National Academy of Sciences, 105(31), 10681-10686.
Cheers,
Simon
Iterated learning is a fascinating paradigm, but I feel the bottleneck is the learning algorithms that are used, and I am not sure that the generic Bayesian approach will work, since it doesn't seem to work in the non-iterative setting.
Is there more recent work that looks at the evolution of more linguistically interesting formalisms like context-free grammars?
The earliest work in this area focussed almost exclusively on CFGs, and there are a range of approaches to iterated learning that looked at exemplar learning, MDL, connectionist approaches etc. The Bayesian approach is useful for characterising the behaviour of iterated learning in the most general terms, I think.
Kirby, S. (2000). Syntax without natural selection: How compositionality emerges from vocabulary in a population of learners. In Knight, C., editor, The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form, pages 303-323. Cambridge University Press.
Kirby, S. (2002). Learning, bottlenecks and the evolution of recursive syntax. In Briscoe, T., editor, Linguistic Evolution through Language Acquisition: Formal and Computational Models, chapter 6, pages 173-204. Cambridge University Press.
Zuidema, W. (2003) How the poverty of the stimulus solves the poverty of the stimulus. In Suzanna Becker and Sebastian Thrun and Klaus Obermayer, editors, Advances in Neural Information Processing Systems 15 (Proceedings of NIPS'02). Cambridge, MA: MIT Press.
Thanks for the references; I will refresh my memory. Embarrassingly, I was just in Amsterdam and was talking to Jelle (Zuidema), and I was saying how much I liked that paper.
OK, great to get all these additional references collected on this fascinating topic, and I'm glad that this forum is on the radar of some leading contributors. Thanks Simon for your comments.