Thursday, March 14, 2013

Toward the learnability theory of language as a complex adaptive system

Summary

There have recently been a number of efforts to model language as a complex adaptive system. A few successful projects have explicitly modeled evolving language using evolutionary game theory. When carefully applied, this technique has shown itself able to account for aspects of language change, and the approach deserves far more testing. In the simplest kind of model, the speakers play “language games” in which the objective is to imitate each other. The game strategies are related by a payoff matrix, and the “imitation dynamics” govern the evolution in a fashion analogous to the replicator dynamics in biological models.
Learnability theory (of language) is the established mathematical study of the learning capabilities of inductive language learning algorithms. Such theoretical analysis covers a wide range of learning models, and can be very helpful in evaluating the effectiveness of postulated language learning algorithms. But in reality, of course, language learners are themselves the speakers in the evolving linguistic community. Language learners are busy learning an evolving language; in fact, they are part of the cause of its evolution. Mutation in an evolutionary language game can be modeled as imperfect learning. The program of research I am putting forward here is to forge a greater connection between the evolutionary modeling of language and the formal learnability theory of language. There is precious little existing research on this topic, but as a practicing linguist I believe we must seek to understand language as something which is at once inductively learned and evolving under a host of systemic pressures and functional adaptations. Only then can we hope to achieve significant understanding of the most important human cognitive ability.

Language as a complex adaptive system

Efforts to model human language as a “complex adaptive system” have been developing for at least two decades. Beginning in the 1990s with pioneering work on models of language evolution by Steels, Croft and other researchers, the idea of modeling language as an emergent property of a system of agents who are trying to communicate has gradually become more popular, though it is still far from being a mainstream topic in linguistics. The currency and modern development of the approach is evinced by such books as The Origin of Vowel Systems [1], Self-Organization in the Evolution of Speech [2], and Language as a Complex Adaptive System [3].
De Boer [1] describes how language meets the criteria of a complex system: the interacting elements are the speakers, and the local interactions are the speakers talking to each other and learning language from the speech community. De Boer goes on to explain how language is also adaptive: it changes under the influence of cognitive and social forces which seek to optimize a variety of attributes, viz. communicative efficiency, communicative effectiveness, and ease of learning.
Steels [4] introduced the simulation of the complex system of language, consisting of a large number of agents interacting and playing a “language game” designed to foster increased communicative effectiveness. The approach is clearly based upon Maynard Smith’s [5] evolutionary game theory. A number of the elements of natural language have been modeled as emerging from a population playing games: De Boer (op. cit.) showed how agents playing an “imitation game” with vowel sounds could spontaneously develop a vowel system bearing similarities to natural vowel systems; Steels [6] simulated emergence of conceptual categories and linguistic syntax, while Steels and Kaplan [7] simulated emergence of a lexicon, which is to say, a set of form-meaning relations shared among the population.
Yet something is missing from much of this development: a theory which can both describe and constrain how such complex systems of language learners might function while nevertheless changing their established language. It seems that the notion of a complex adaptive system has been adopted in much functionally motivated linguistic research almost as a leitmotif rather than as a serious mathematical theory which can be used to study the interconnected processes of language learning and evolution. There are apparent conundrums in this interconnection; chief among them is that while human children learn language successfully, language continues to evolve through the generations.

Evolutionary game theory

Evolutionary game theory was developed by J. Maynard Smith [5], and is well known as a model of evolving complex systems. In the study of language evolution, it has been used to formalize the notion of functional adaptation of a language to meet certain communicative needs [8]. Using just a few uncontroversial facts about linguistic communication, Jäger (op. cit.) used a stochastic evolutionary game simulation to demonstrate that certain common features of grammar are stable states, while other unattested conditions are evolutionarily unstable. Despite this interesting and successful research, there has been little or no follow-up. I believe a lot more work deserves to be pursued in this area. Jäger only dealt with functional adaptation affecting a couple of grammatical features; there are numerous other evolutionary forces which drive language change in other varied ways.
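To make the notion of an evolutionarily stable state concrete, here is a minimal sketch of deterministic replicator dynamics for a two-variant imitation game. It is not Jäger's stochastic model: the identity payoff matrix (payoff 1 when two speakers use the same variant, 0 otherwise), the Euler step size, and the names `replicator_step`, `simulate` and `PAYOFF` are all assumptions chosen for illustration.

```python
import numpy as np

def replicator_step(x, payoff, dt=0.01):
    """One Euler step of dx_i/dt = x_i * (f_i - mean_f)."""
    fitness = payoff @ x            # expected payoff of each variant
    mean_fitness = x @ fitness      # population-average payoff
    x_new = x + dt * x * (fitness - mean_fitness)
    return x_new / x_new.sum()      # guard against numerical drift

def simulate(x0, payoff, steps=5000, dt=0.01):
    """Iterate the replicator step from the initial frequencies x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = replicator_step(x, payoff, dt)
    return x

# Hypothetical imitation game: speakers are rewarded only for matching.
PAYOFF = np.eye(2)

if __name__ == "__main__":
    print(simulate([0.6, 0.4], PAYOFF))  # the initial majority variant takes over
    print(simulate([0.4, 0.6], PAYOFF))  # and likewise from the other side
```

Under these assumptions each uniform convention is a stable state, while the even mixture is an unstable equilibrium: whichever variant starts with a slight majority drives out the other.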
By way of example, let me consider the evolution of vowel systems. Some pioneering modeling was done by de Boer on the initial evolution of vowel systems (at the dawn of language), but there has been little or no complex systems modeling of the continuing evolution of the sounds of language in response to systemic pressures. There is indeed scarce agreement about what the systemic pressures affecting vowel systems actually are. For whatever reason, natural languages most frequently evolve to a point where they have approximately 5 vowel qualities (usually /i, e, a, o, u/ as in Spanish), and this seems to be an evolutionarily stable state of our systems: languages often evolve toward approximately 5 vowels and then keep them unchanged for many centuries (to wit, Spanish). On the other hand, many languages have for unknown reasons developed vowel systems with more than 10 distinct vowel qualities; this is characteristic of the Germanic languages, including English. These larger vowel inventories, however, are usually unstable; English dialects are constantly going through “vowel shifts” which threaten to render the many global varieties of English mutually unintelligible. A further fact of interest, however, is that the state in which a language has a large number of constantly shifting vowels itself appears to be evolutionarily stable. English has had at least 10 vowels and diphthongs since Anglo-Saxon times (leaving long/short distinctions aside) and now has about 13, so rather than reducing the number of vowels we have cycled through many different qualities of these vowels in the intervening centuries. This type of vowel shifting is reminiscent of the stable oscillatory states which have been demonstrated in evolutionary models of cooperative behavior which include mutation [9].
In language evolution, successful imitation plays the part of the replication found in evolution models (so Maynard Smith’s replicator dynamics are now imitation dynamics) [8], while imperfect learning by the next generation is the mutation. The above-mentioned vowel shifting is likely to be caused in part by a failure to accurately imitate so many vowels because of production/perception failures (this is a fitness failure in evolutionary terms), but also by quasi-random changes in the lexicon that affect the functional load of the vowel contrasts. Functional load (briefly: the amount of important distinctive work done by a distinction in a given language) [10] is likely to be an important force in the evolution of language, although along with many other such forces it has never been modeled in a full-fledged evolutionary game simulation.
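One standard way to write down imitation with imperfect learning is the replicator-mutator equation. The notation here is my own choice for illustration and is not drawn from the works cited above: x_i is the fraction of speakers using variant i, a_{jk} is the payoff to a speaker of variant j interacting with a speaker of variant k, and Q_{ji} is the assumed probability that a learner exposed to variant j ends up with variant i.

```latex
\dot{x}_i \;=\; \sum_{j=1}^{n} x_j\, f_j(x)\, Q_{ji} \;-\; \phi(x)\, x_i,
\qquad f_j(x) = \sum_{k=1}^{n} a_{jk}\, x_k,
\qquad \phi(x) = \sum_{j=1}^{n} x_j\, f_j(x)
```

When learning is perfect, Q is the identity matrix and the equation reduces to the ordinary replicator dynamics; the off-diagonal entries of Q play the role of mutation.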
Many other evolutionary phenomena have been postulated for natural language but have never been subjected to detailed study through dynamical simulation. One further example is the apparent tendency for the separate words of sentences to gradually “agglutinate,” so that sequences of words often become prefix-root or root-suffix combinations (witness the modern-day creation of English items like gonna and coulda). The reverse process, while easy to imagine, is all but unknown in reality [11]. So we see that there are subtle questions of linguistic fitness that need to be carefully considered to achieve explanatory models.
 

Formal learning theory

Formal learning or “learnability” theory has recently been reviewed by Fulop and Chater [12], who cover a number of distinct approaches to the mathematical modeling of learning functions or languages. A standard model of learning can be used in nearly all formal learning algorithms: we suppose that the learner receives a data sequence D (the example set, learning sample, or training set) one item at a time. The data sequence consists of examples. The learner then proposes a hypothesis to characterize what it has learned after each example. One natural goal for the language learner is to recover (or perhaps to approximate) the “true” language from which the data D has been generated. In this setting, an example might consist of a sequence of symbols, plus the information that this expression is within the target language. A learning sample would then be a sequence of such sentences.
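The learning model just described can be sketched in a few lines. This is a toy illustration of learning by enumeration from positive examples, under assumptions of my own: the hypothesis class is a hypothetical family of five languages L_n = { a, aa, ..., a^n }, listed from smallest to largest, and the learner always guesses the first hypothesis consistent with everything seen so far. The names `make_bounded_language`, `learn_by_enumeration` and `HYPOTHESES` are invented for this sketch.

```python
from typing import Callable, Iterable, List, Sequence

Language = Callable[[str], bool]  # membership test for a candidate language

def make_bounded_language(n: int) -> Language:
    """Hypothetical toy language L_n = { 'a' * k : 1 <= k <= n }."""
    def contains(s: str) -> bool:
        return 1 <= len(s) <= n and set(s) == {"a"}
    return contains

def learn_by_enumeration(data: Iterable[str],
                         hypotheses: Sequence[Language]) -> List[int]:
    """After each example, guess the first hypothesis consistent with all data so far."""
    seen: List[str] = []
    guesses: List[int] = []
    for example in data:
        seen.append(example)
        guess = next(
            (i for i, h in enumerate(hypotheses) if all(h(s) for s in seen)),
            -1,  # -1 means no hypothesis in the class fits the data
        )
        guesses.append(guess)
    return guesses

HYPOTHESES = [make_bounded_language(n) for n in range(1, 6)]

if __name__ == "__main__":
    # Positive examples drawn from the target language L_3 (index 2).
    print(learn_by_enumeration(["a", "aa", "a", "aaa", "aa", "a"], HYPOTHESES))
    # prints [0, 1, 1, 2, 2, 2]
```

The learner's guess settles on the target once the longest sentence of the target language has appeared, and never changes afterwards. The ordering of the hypotheses matters: if a larger language came first in the list, the learner would never abandon it on positive data alone.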
Learnability theory is traditionally concerned chiefly with how to set up a problem so that the learnability of the “true” function, concept or language can be assessed by mathematical analysis. This kind of learning theory also usually focuses on learning as a process in which the learner’s hypothesis approaches the target language as more data is analyzed. Key differences among theoretical frameworks within learning theory center on the specific way to model the notion of “approaching the target.” For the research program I envision, it is not essential to select one particular learning theoretic approach. One could imagine useful results being connected to a variety of methodologies including Bayesian learning, formalized inductive inference, and probably approximately correct inference, given that each of these disparate approaches has yielded important results pertaining to language learning.
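For orientation, the three frameworks named above formalize “approaching the target” in roughly the following textbook ways. The symbols are my own notation and are assumptions of this sketch: h_n is the learner's hypothesis after n examples, L^{*} the target language, \mathcal{P} the distribution generating a sample D of size m, and \varepsilon, \delta the accuracy and confidence parameters.

```latex
% Formalized inductive inference (identification in the limit):
% the guesses eventually stop changing and the final guess is correct.
\exists n_0 \;\forall n \ge n_0:\quad h_n = h_{n_0} \ \text{and}\ L(h_{n_0}) = L^{*}

% Probably approximately correct inference:
% with high probability the hypothesis has small error.
\Pr_{D \sim \mathcal{P}^{m}}\bigl[\operatorname{err}_{\mathcal{P}}(h_D) \le \varepsilon\bigr] \;\ge\; 1 - \delta

% Bayesian learning:
% belief in each hypothesis is updated in proportion to how well it predicts the data.
P(h \mid D) \;=\; \frac{P(D \mid h)\,P(h)}{\sum_{h'} P(D \mid h')\,P(h')}
```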
An important finding to emerge from language learnability studies is that various elements of natural languages can be successfully learned by a variety of specific algorithms, but only if one allows either unrealistic computing power [13] or tighter restrictions on the class of languages which can be learned [14, 15] (i.e. some form of Universal Grammar or universal learning bias). The form of innate learning bias that will serve to permit language learning is not as extensive as that originally sought within the Chomskyan program of Principles and Parameters, however. The latter program called for such a rich innate component that the credo “most of language is innate” has sometimes been attributed to the Chomskyan paradigm [16].

Learning in evolving systems

Seemingly the first source to combine learnability theory with the study of language as an evolving dynamical system is Niyogi 2006 [17]. While impressive and mathematically sophisticated, this work should be viewed as only a starting point for the research program I envision. There are many assumptions made in Niyogi’s approach that deserve reconsideration, including the language learning paradigm and the dynamical system model. For the former, Niyogi stuck to the linguistic paradigm of Principles and Parameters, a model that has since fallen out of favor, in part due to Niyogi’s own proofs [18] pointing out that the learning algorithms did not have the expected nice properties. For the latter, Niyogi’s dynamical models are not complex systems; rather, they are deterministic simple systems with tractable analytic solutions. While this enabled the calculation of handy mathematical results instead of messy simulations, we must now move into the complex systems regime; indeed, the models should be not only complex but adaptive systems as well.
In what is apparently the only literature to add substantially to Niyogi’s approach, Chatterjee et al. [9] provide some useful methods for combining the study of evolutionary dynamical systems with the study of learning theory. While mentioning linguistic applications, their work is focused on populations learning Prisoner’s Dilemma strategies. My plan is, in essence, to combine the methods of Jäger and those of Chatterjee et al., in the search for novel substantive connections between the complex adaptive system model of language change and learning-theoretic results about language. The main theoretical area which requires development is, as pointed out to me by Nick Chater (p.c.), the scenario in which inductive learners (i.e. children) are using the “results” of previous learners (i.e. adult speakers) who presumably have identical language learning biases, and all are together in the same evolutionary dynamical system. This should make it possible to learn more easily from finite (and thus partial) language data. But as mentioned, there seems to be no published literature which addresses these points or which does what I’m envisioning for my research program.

Research plans

My plans for carrying out the research involve a number of interrelated activities and phases. I plan to construct evolutionary language simulations which model a variety of linguistic forces beyond functional adaptation, such as cognitive and speech mechanistic constraints.   Such simulations will progress to involve multiple generations of speakers, in which the younger speakers learn language from the older speakers by imitation. The next phase would add mutation to the simulation, in the form of imperfect learning.  My hope is eventually to be able to derive language learnability results in the specific setting of a multigenerational adaptive speech community with homogeneous learning biases.
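As a rough indication of what the generational phases might look like in code, here is a deliberately stripped-down sketch. It is an assumption-laden toy, not the planned simulation: there is no payoff matrix or fitness, each speaker carries a single integer-coded variant, each child imitates one randomly chosen adult, and with probability `mutation_rate` the child mislearns and acquires a random variant. The function names and parameters are hypothetical.

```python
import random
from typing import List, Sequence

def next_generation(adults: Sequence[int], n_variants: int,
                    mutation_rate: float, rng: random.Random) -> List[int]:
    """Produce one child per adult by imitation with imperfect learning."""
    children = []
    for _ in adults:
        variant = rng.choice(adults)           # imitate a randomly chosen adult
        if rng.random() < mutation_rate:       # imperfect learning acts as mutation
            variant = rng.randrange(n_variants)
        children.append(variant)
    return children

def simulate_generations(initial: Sequence[int], n_variants: int,
                         mutation_rate: float, generations: int,
                         seed: int = 0) -> List[float]:
    """Return the frequency of variant 0 in each generation, starting with the first."""
    rng = random.Random(seed)
    population = list(initial)
    history = [population.count(0) / len(population)]
    for _ in range(generations):
        population = next_generation(population, n_variants, mutation_rate, rng)
        history.append(population.count(0) / len(population))
    return history

if __name__ == "__main__":
    # A mixed community of 50 speakers, two variants, 5% mislearning per child.
    print(simulate_generations([0] * 25 + [1] * 25, 2, 0.05, 20))
```

With `mutation_rate` set to 0 a uniform community stays uniform forever, while any positive rate keeps reintroducing variation; a fuller model would add payoff-dependent choice of whom to imitate.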
To take a specific example, I plan to invoke an existing method for learning about the morphology of words [19] to develop an evolutionary game simulation in which each succeeding generation of learners applies the method to the output of the previous generation. Constructing the simulation will involve carefully considered parameters and entries in the payoff matrix which determines the outcomes of “games” played by the participants. The overall goal of the game is not only successful imitation but correct word structure in relation to other similar words, which can be gauged by a number of possible measures. This will use a stochastic version of evolutionary game theory, as in Jäger (op. cit.). Once the basic evolving system is set up, mutation can be introduced and the dynamics examined under different assignments to fitness parameters.
The basic learning results about this approach to morphology are quite straightforward [19] and should be applicable within the dynamic systems approach; I imagine that the complexity of the learning is an interesting object of study in addition to the learnability per se. Mutation in its various forms will surely affect the learning results; it may become impossible to learn adequately if imperfection is too great, because a degree of homogeneity needs to be found among the adult speakers. In general I am hoping to find some unforeseen results.

References


5 comments:

  1. This is an exciting area! Can I suggest you also look at the work on iterated learning? This examines the relationship between inductive bias and stationary distributions mediated by cultural evolution of language. The motivation for this work, since the late nineties, has been the idea that because language persists through cultural transmission, we should expect the data for learners to be optimised for learning - transmission by iterated learning results in adaptation of behaviours to maximise learnability.

    Early work looked at simple diffusion chains, but we are increasingly examining more interesting assumptions, for example more complex populations and co-evolution of learning bias and language structure. Most recently, we have been matching simulation models with human experiments. Here's a representative bibliography:


    Kirby, S., Dowman, M. and Griffiths, T. (2007) Innateness and culture in the evolution of language. Proceedings of the National Academy of Sciences, 104(12):5241-5245.

    Kirby, S. (2001). Spontaneous evolution of linguistic structure-an iterated learning model of the emergence of regularity and irregularity. Evolutionary Computation, IEEE Transactions on, 5(2), 102-110.

    Brighton, H., Smith, K., & Kirby, S. (2005). Language as an evolutionary system. Physics of Life Reviews, 2(3), 177-226.

    Griffiths, T. L., & Kalish, M. L. (2007). Language evolution by iterated learning with Bayesian agents. Cognitive Science, 31(3), 441-480.

    Smith, K., and Kirby, S. (2008). Cultural evolution: implications for understanding the human language faculty and its evolution. Philosophical Transactions of the Royal Society B, 363(1509):3591-3603.

    Smith, K. (2009). Iterated learning in populations of Bayesian agents. In N.A. Taatgen & H. van Rijn (Eds.), Proceedings of the 31st Annual Conference of the Cognitive Science Society (pp. 697-702). Austin, TX: Cognitive Science Society.

    Burkett, D., & Griffiths, T. L. (2010). Iterated learning of multiple languages from multiple teachers. In The Evolution of Language: Proceedings of the 8th International Conference (EVOLANG8) (pp. 58-65).

    Kirby, S., Cornish, H., & Smith, K. (2008). Cumulative cultural evolution in the laboratory: An experimental approach to the origins of structure in human language. Proceedings of the National Academy of Sciences, 105(31), 10681-10686.

    Cheers,
    Simon

  2. Iterated learning is a fascinating paradigm, but I feel the bottleneck is the learning algorithms that are used, and I am not sure that the generic Bayesian approach will work, since it doesn't seem to work in the non-iterative setting.
    Is there more recent work that looks at the evolution of more linguistically interesting formalisms like context-free grammars?

  3. The earliest work in this area focussed almost exclusively on CFGs, and there are a range of approaches to iterated learning that looked at exemplar learning, MDL, connectionist approaches etc. The Bayesian approach is useful for characterising the behaviour of iterated learning in the most general terms, I think.

    Kirby, S. (2000). Syntax without natural selection: How compositionality emerges from vocabulary in a population of learners. In Knight, C., editor, The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form, pages 303-323. Cambridge University Press.

    Kirby, S. (2002). Learning, bottlenecks and the evolution of recursive syntax. In Briscoe, T., editor, Linguistic Evolution through Language Acquisition: Formal and Computational Models, chapter 6, pages 173-204. Cambridge University Press.

    Zuidema, W. (2003) How the poverty of the stimulus solves the poverty of the stimulus. In Suzanna Becker and Sebastian Thrun and Klaus Obermayer, editors, Advances in Neural Information Processing Systems 15 (Proceedings of NIPS'02). Cambridge, MA: MIT Press.

  4. Thanks for the references; I will refresh my memory. Embarrassingly, I was just in Amsterdam and was talking to Jelle (Zuidema), and I was saying how much I liked that paper.

  5. OK, great to get all these additional references collected on this fascinating topic, and I'm glad that this forum is on the radar of some leading contributors. Thanks Simon for your comments.
