Friday, December 20, 2013
At some point in this forum I think I posted about my old work on Whole Word Morphology. Next month I am attending the International Symposium on Artificial Intelligence and Mathematics, and speaking about this approach in a special session on Mathematics of Natural Language Processing. I think that it may be useful for NLP in a variety of languages. Here is the abstract:
Whole Word Morphology does away with morphemes, instead representing all morphology as relations among sets of words, which we call lexical correspondences. This paper presents a more formal treatment of Whole Word Morphology than has been previously published, demonstrating how the morphological relations are mediated by unification with sequence variables. Examples from English are presented, as well as from Eskimo, the latter providing an example of a highly complex polysynthetic lexicon. The lexical correspondences of Eskimo operate through their interconnection in a network using a symmetric and an asymmetric relation. Finally, a learning algorithm for deriving lexical correspondences from an annotated lexicon is presented.
Link to the paper at the ISAIM website
This research program fits with my general theme of learning language through unification procedures, which I think is both computationally useful and cognitively relevant. It seems to me that the cognitive version of unification is "analogical learning."
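As a tiny illustration of the idea (my own simplified sketch, not the formalism in the paper), a lexical correspondence can be written as a pair of patterns sharing a sequence variable X, so that unifying a word with one pattern licenses the related word, with no morpheme posited:

```python
def match(word, pattern):
    """Unify word with a (prefix, suffix) pattern; return the string bound to X."""
    pre, suf = pattern
    if word.startswith(pre) and word.endswith(suf) and len(word) > len(pre) + len(suf):
        return word[len(pre):len(word) - len(suf)] if suf else word[len(pre):]
    return None

def correspond(word, src, tgt):
    """If word unifies with the source pattern as X, build the related word."""
    x = match(word, src)
    return tgt[0] + x + tgt[1] if x is not None else None

# Two illustrative correspondences: /X/ ~ /Xs/ and /X/ ~ /Xed/
print(correspond("cat", ("", ""), ("", "s")))      # -> cats
print(correspond("walked", ("", "ed"), ("", "")))  # -> walk
```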
Friday, October 4, 2013
Contextual grammars and term-labeled trees
I've recently discovered the work of Solomon Marcus, a brilliant mathematician who has been publishing in mathematical linguistics since the 1950s. He worked to develop "contextual grammars," which derive from the American Structuralist idea that the "set of contexts" determined by a word in a given language is an important characteristic of the word, perhaps the most important. In a survey paper that appeared in the Handbook of Formal Languages (Rozenberg and Salomaa 1997), Marcus explains how the sets of contexts determined by the words in a language and the sets of words appearing in the contexts are related by a Galois connection. He cites Sestier (1960) for first showing this.
My problem with this whole framework is that the contexts are defined as string contexts only--the strings that occur before and after a word in a sentence form the context. I have worked to get beyond this string-based model of language syntax. In my own work I proposed term-labeled trees, which are sentences provided with an immediate constituent analysis (a bare tree) together with a semantic term label, usually a lambda term. Then I proposed, ignorant of contextual grammars at the time, the notion of a term-labeled tree context. This is the term-labeled sentence tree with two holes in it, corresponding to the locations of the meaning term and the linked syntactic item (possibly a word, maybe a subtree) that would fill the context.
Drawing on the result of Sestier, it appears that in a term-labeled tree language the term-labeled tree contexts are still related to the words and other possible subtrees by a Galois connection. I would have to take some more steps to prove it, but it seems right at a glance. This is a nice mathematical connection, and it would be great if it continues to hold in what I regard as the improved variation on contextual grammars.
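To make the connection concrete, here is a toy computation of the two polar maps for a small finite string language (a sketch of the classical string-context case, not yet the term-labeled tree version; the language and all names are my own construction):

```python
# A toy version of the Sestier/Marcus polarity for a tiny finite language.
# Contexts are (left, right) pairs of token tuples; a context accepts a word
# w iff left + w + right is a sentence of L.

L = {"the cat sleeps", "the dog sleeps", "the cat eats", "a cat sleeps"}
WORDS = {w for s in L for w in s.split()}

def all_contexts():
    ctxs = set()
    for s in L:
        toks = s.split()
        for i in range(len(toks)):
            ctxs.add((tuple(toks[:i]), tuple(toks[i + 1:])))
    return ctxs

def contexts_of(words):
    """Contexts accepting every word in `words` (one polar map)."""
    return {c for c in all_contexts()
            if all(" ".join(c[0] + (w,) + c[1]) in L for w in words)}

def words_of(ctxs):
    """Words accepted by every context in `ctxs` (the other polar map)."""
    return {w for w in WORDS
            if all(" ".join(c[0] + (w,) + c[1]) in L for c in ctxs)}

# The maps are antitone (bigger word sets determine smaller context sets),
# and their composite is a closure operator on word sets:
print(contexts_of({"cat", "dog"}) <= contexts_of({"cat"}))  # True
print(words_of(contexts_of({"dog"})))  # {'cat', 'dog'}: dog's distributional closure
```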
Friday, August 30, 2013
Learning biases in constraint-based grammar learning
In my previous post I highlighted a new topic in TopiCS. Here I will offer a few remarks on one of the papers, "Cognitive Biases, Linguistic Universals, and Constraint-Based Grammar Learning" by Culbertson, Smolensky, and Wilson.
The broad goals of this paper are (i) to exemplify the argument that human language learning is facilitated by learning biases, and (ii) to model some specific biases probabilistically in a Bayesian fashion. Let me say first that I am very sympathetic to both of these general ideas. But I think this project is very narrowly applicable, covering only the framework of optimality-theoretic syntax, which is in no way fleshed out enough to generate an entire language.
So, without going into too many details, I think the paper's result applies only to a tiny corner of a grammar, in particular the part that derives "nominal word order" within noun phrases involving a numeral + noun on one hand, and an adjective + noun on the other. I'm not sure how to react to a result that only derives a couple of expressions from a whole language. I agree there might be Bayesian probability at work in grammar learning, but a project like this really needs to be worked out on a grammatical system capable, at least in theory, of deriving an entire language. I don't know if that capability has been shown for the kind of optimality-theoretic syntax under discussion here. I do know there are about 50 frameworks waiting in the wings that are known to be able to generate real languages, at least in theory if not in practice. Maybe we should try applying some of the ideas from this paper to fully fledged grammatical frameworks, instead of a half-baked one (sorry if that is a mixed metaphor!).
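To be fair to the general idea, a prior bias can be modeled very simply. The sketch below is my own toy construction, not the constraint-based model of the paper; the grammar space, bias strengths, and noise rate are all invented for illustration:

```python
# A toy Bayesian learner with a bias over noun-phrase orders: grammars say
# whether Adj and Num are prenominal, and the prior favors "harmonic"
# grammars (Adj and Num on the same side of N). All numbers are assumed.

from itertools import product

prior = {}
for adj, num in product([True, False], repeat=2):
    prior[(adj, num)] = 0.4 if adj == num else 0.1   # bias toward harmony

def posterior(data, noise=0.1):
    """data: list of ('adj'|'num', is_prenominal) observations."""
    post = dict(prior)
    for cat, pre in data:
        for g in post:
            predicted = g[0] if cat == 'adj' else g[1]
            post[g] *= (1 - noise) if predicted == pre else noise
    z = sum(post.values())
    return {g: p / z for g, p in post.items()}

# Mostly prenominal input with a few inconsistent NPs: the harmonic grammar
# (True, True) wins; the prior matters most when the data are sparse or mixed.
data = [('adj', True)] * 8 + [('num', True)] * 7 + [('num', False)] * 3
for g, p in sorted(posterior(data).items(), key=lambda kv: -kv[1]):
    print(g, round(p, 3))
```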
Sunday, August 18, 2013
Cognitive Modeling and Computational Linguistics
It has been some years since I quit my membership in the Association for Computational Linguistics. I quit because, in simplest terms, I wasn't getting much out of the membership and I was not encouraged by the direction of the field of Comp Ling. My impressions from about 2000 through 2008 were that Comp Ling was getting more and more "engineering" oriented, and more and more hostile to any other purpose for computational modeling. I have a few anecdotes I could tell about my own papers on that score; one appeared in JoLLI after referees for Computational Linguistics suggested it be sent there, since it had no "practical application." (Being naive at the time, I did not realize that every paper in Computational Linguistics had to have a practical application.)
A new topic which appeared in the July issue of Topics in Cognitive Science gives some hope for a different future. Here one finds 11 papers under Computational Models of Natural Language, edited by John Hale and David Reitter. The overarching theme is basically computational psycholinguistics relaunched. The papers include many which I would like to comment on here in later posts. They were presented at the first workshop on Cognitive Modeling and Computational Linguistics, held at the ACL meeting in 2010. This workshop has since been reprised in the succeeding years, so it seems that this is not a one-time aberration. The notion of using computational linguistics to investigate linguistic theory was purged from the ACL (especially the North American chapter) before I finally quit. I'm glad to see this research avenue explored under the auspices of this Association once again.
Sunday, May 12, 2013
The Eurasiatic sharpshooter fallacy?
A paper by Mark Pagel, Quentin Atkinson, Andreea Calude and Andrew Meade has the linguistic blogosphere buzzing, so I thought I'd contribute my own entry. Their paper "Ultraconserved words point to deep language ancestry across Eurasia" was published in Proceedings of the National Academy of Sciences ahead of print, and has already been the subject of fierce criticism from the ultra-doctrinaire community of comparative historical linguistics. The paper applies an intriguing statistical procedure to the LWED database of reconstructed proto-words in seven established language families, and purports to uncover 23 lexical items which unite the families into a Eurasiatic superfamily, much as was proposed many years ago by Joe Greenberg (among others).
I'm not going to contribute a detailed critique of the paper here; I will note that the critique posted on Language Log by Sally Thomason includes the caveat that she is not qualified to judge the statistical procedures. I think that if one is going to critique a scholarly paper, it should really be critiqued in its entirety and not just in bits and pieces, but some may differ on that score.
I think two major criticisms have emerged from the various comments, which are "garbage in, garbage out" and the "Texas sharpshooter fallacy." The second one (raised by Andrew McKenzie of the University of Kansas) is more interesting to me, since it actually involves the statistical interpretation. This statistical fallacy involves "discovering hidden structure" or clusters in data where there is really no evidence for anything. It takes its name from the tale of a Texas gun for hire who was not a very good shot. Being clever, he took out his two revolvers and fired 12 shots as best he could at the side of a barn, and then painted a target centered on the tightest cluster of bullet holes. He then showed the target to potential clients, claiming to be a sharpshooter.
In the Eurasiatic data, I guess the problem could be that the 23 "ultraconserved" lexical items which were found to unite the families could just be randomly like each other, but it is hard for me to draw this analogy with the Texas sharpshooter because the statistical results in the paper are so significant that they seem to mitigate problems of this kind. For one thing, there are 7 language families involved and not just two. For another, the 23 lexical items emerge from a comparison based on the typical 200-word Swadesh list, of which 188 meanings were compared. Without any rigorous argument, it seems to me that there is a very low chance of 23 items out of 188 (about 12%) randomly being similar across 7 language families. A commonly cited real instance of a scientific study waylaid by the Texas sharpshooter was a Swedish epidemiological study of 800 medical conditions. It found a significant difference in the incidence of one ailment out of 800 among people who lived near electric transmission lines (this is cited on the Wikipedia page about the Texas sharpshooter). This result is now regarded as not reproducible, an instance of the Texas sharpshooter. But let's take note that 1 ailment out of 800 is quite different from 23 words out of 188.
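For what it's worth, a crude simulation can put rough numbers on this intuition. The following is entirely my own back-of-the-envelope null model, nothing like the paper's actual inference, and the chance-similarity probability p is a pure assumption:

```python
# Null model: each of 188 compared items independently "looks cognate"
# across all seven families with some small chance probability p. How often
# would 23 or more items do so by accident?

import random

def trial(n_items=188, p=0.05):
    """Count items that look cognate across all families by chance alone."""
    return sum(random.random() < p for _ in range(n_items))

def tail_prob(k=23, runs=20_000, **kw):
    """Monte Carlo estimate of P(at least k chance hits)."""
    return sum(trial(**kw) >= k for _ in range(runs)) / runs

random.seed(0)
for p in (0.02, 0.05, 0.08):
    print(f"p={p}: P(>=23 of 188) ~ {tail_prob(p=p):.4f}")
# The tail shrinks fast as p drops, but everything hinges on the assumed p.
```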
Quentin Atkinson assured me that he can stand behind this paper, and he may yet have to defend it in the pages of PNAS or some similar platform. These authors are not going to make a clean getaway with such a provocative proposal, now that the anti-mass-comparison folks in comparative linguistics have gotten wind of it. My own view in general is that we should embrace new sources of evidence in linguistics, rather than closing ranks and insisting that methods developed over a century ago are the only way. Let's not forget that the standard comparative method is strict enough that it can be carried out "by a trained eye," without any statistical processing. Surely there must be some kind of computational analysis that can go beyond this.
Thursday, March 14, 2013
Toward the learnability theory of language as a complex adaptive system
Summary
There have recently been a number of efforts to model language as a complex adaptive system. A few successful projects have explicitly modeled evolving language using evolutionary game theory. When carefully applied, this technique has shown itself able to account for aspects of language change, and the approach deserves far more testing. In the simplest kind of model, the speakers play "language games" in which the objective is to imitate each other. The game strategies are related by a payoff matrix, and the "imitation dynamics" governs the evolution in a fashion analogous to the replicator dynamics in biological models.
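To illustrate the kind of model I have in mind, here is a minimal sketch of the replicator/imitation dynamics for two competing linguistic variants; the payoff matrix is invented, rewarding successful imitation of like by like:

```python
# Discrete-time replicator (imitation) dynamics for two variants.
# A[i][j]: payoff to a speaker of variant i meeting a speaker of variant j.
A = [[1.0, 0.2],
     [0.2, 0.8]]   # assumed numbers: matching variants imitate best

def step(x, dt=0.1):
    """One Euler step of the replicator dynamics on shares x."""
    fitness = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
    avg = sum(x[i] * fitness[i] for i in range(2))
    return [x[i] + dt * x[i] * (fitness[i] - avg) for i in range(2)]

x = [0.4, 0.6]               # initial shares of the two variants
for t in range(200):
    x = step(x)
print([round(v, 3) for v in x])  # the population settles on one variant
```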
Learnability theory (of language) is the established mathematical study of the learning capabilities of inductive language learning algorithms. Such theoretical analysis covers a wide range of learning models, and can be very helpful in evaluating the effectiveness of postulated language learning algorithms. But in reality, of course, language learners are themselves the speakers in the evolving linguistic community. Language learners are busy learning an evolving language—in fact they are part of the cause. Mutation in an evolutionary language game can be modeled as imperfect learning. The program of research I am putting forward here is to forge a greater connection between the evolutionary modeling of language and the formal learnability theory of language. There is precious little existing research on this topic, but as a practicing linguist I believe we must seek to understand language as something which is at once inductively learned and evolving under a host of systemic pressures and functional adaptations. Only then can we hope to achieve significant understanding of the most important human cognitive ability.
Language as a complex adaptive system
Efforts to model human language as a "complex adaptive system" have by now been developing for at least two decades. Beginning in the 1990s with some pioneering early work on models of language evolution by Steels, Croft and other researchers, the idea of modeling language as an emergent property of a system of agents who are trying to communicate gradually became more popular, though it is still far from being a mainstream topic in Linguistics. The currency and modern development of the approach is evinced by such books as The Origin of Vowel Systems [1], Self-Organization in the Evolution of Speech [2], and Language as a Complex Adaptive System [3].
De Boer [1] describes how language meets the criteria of a complex system: the interacting elements are the speakers, and the local interactions are the speakers talking to each other, and also learning language from the speech community. De Boer goes on to explain how language is also adaptive: it changes under the influence of cognitive and social forces which seek to optimize a variety of attributes, viz. communicative efficiency, communicative effectiveness, and ease of learning.
Steels [4] introduced the simulation of the complex system of language, consisting of a large number of agents interacting and playing a "language game" designed to foster increased communicative effectiveness. The approach is clearly based upon Maynard Smith's [5] evolutionary game theory. A number of the elements of natural language have been modeled as emerging from a population playing games: De Boer (op. cit.) showed how agents playing an "imitation game" with vowel sounds could spontaneously develop a vowel system bearing similarities to natural vowel systems; Steels [6] simulated the emergence of conceptual categories and linguistic syntax, while Steels and Kaplan [7] simulated the emergence of a lexicon—which is to say, a set of form-meaning relations shared among the population.
Yet something is missing from much of this development: a theory which can both describe and constrain how such complex systems of language learners might function while nevertheless changing their established language. It seems that the notion of a complex adaptive system has been adopted in much functionally motivated linguistic research almost as a leitmotif, rather than as a serious mathematical theory which can be used to study the interconnected processes of language learning and evolution. There are apparent conundrums in this interconnection; chief among them is that while human children learn language successfully, language continues to evolve through the generations.
Evolutionary game theory
Evolutionary game theory was developed by J. Maynard Smith [5], and is well known as a model of evolving complex systems. In the study of language evolution, it has been used to formalize the notion of functional adaptation of a language to meet certain communicative needs [8]. Jäger (op. cit.) used a stochastic evolutionary game simulation to demonstrate, from just a few uncontroversial facts about linguistic communication, that certain common features of grammar are stable states while other unattested conditions are evolutionarily unstable. Despite this interesting and successful research, there has been little or no follow-up. My feeling is that a lot more work deserves to be pursued in this area. Jäger only dealt with functional adaptation affecting a couple of grammatical features; there are numerous other evolutionary forces which drive language change in other varied ways.
By way of example, let me consider the evolution of vowel systems. Some pioneering modeling was done by de Boer on the initial evolution of vowel systems (at the dawn of language), but there has been little or no complex-systems modeling of the continuing evolution of the sounds of language in response to systemic pressures. There is indeed scarce agreement about what the systemic pressures affecting vowel systems actually are. For whatever reason, natural languages most frequently evolve to a point where they have approximately 5 vowel qualities (usually /i, e, a, o, u/, as in Spanish), and this seems to be an evolutionarily stable state of such systems—languages with approximately 5 vowels are often evolutionary targets, and keep those vowels unchanged for many centuries (witness Spanish). On the other hand, many languages have for unknown reasons developed vowel systems with more than 10 distinct vowel qualities—this is characteristic of the Germanic languages, including English. These larger vowel inventories, however, are usually unstable; English dialects are constantly going through "vowel shifts" which threaten to render the many global varieties of English mutually unintelligible. A further fact of interest, however, is that the state in which a language has a large number of constantly shifting vowels itself appears to be evolutionarily stable. English has had at least 10 vowels and diphthongs since Anglo-Saxon times (leaving long/short distinctions aside) and now has about 13, so rather than reducing the number of vowels we have cycled through many different qualities of these vowels in the intervening centuries. This type of vowel shifting is reminiscent of the stable oscillatory states which have been demonstrated in evolutionary models of cooperative behavior which include mutation [9].
In language evolution, successful imitation plays the part of the replication found in evolution models (so Maynard Smith's replicator dynamics become imitation dynamics) [8], while imperfect learning by the next generation is the mutation. The above-mentioned vowel shifting is likely to be caused in part by a failure to accurately imitate so many vowels because of production/perception failures (a fitness failure in evolutionary terms), but also by quasi-random changes in the lexicon that affect the functional load of the vowel contrasts. Functional load (briefly: the amount of important distinctive work done by a distinction in a given language) [10] is likely to be an important force in the evolution of language, although along with many other such forces it has never been modeled in a full-fledged evolutionary game simulation.
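Imperfect learning can be added to the earlier imitation-dynamics sketch as a mutation term, yielding the replicator-mutator dynamics; again the payoff matrix and mutation rate are purely illustrative:

```python
# Replicator-mutator dynamics: same imitation dynamics as before, but each
# step a fraction MU of learners mis-acquire their model's variant.
A = [[1.0, 0.2],
     [0.2, 0.8]]
MU = 0.05  # assumed rate of imperfect learning

def step(x, dt=0.1):
    f = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
    avg = sum(x[i] * f[i] for i in range(2))
    x_new = [x[i] + dt * x[i] * (f[i] - avg) for i in range(2)]
    # mutation: a fraction MU of each variant's learners flip to the other
    return [x_new[0] * (1 - MU) + x_new[1] * MU,
            x_new[1] * (1 - MU) + x_new[0] * MU]

x = [0.9, 0.1]
for t in range(500):
    x = step(x)
print([round(v, 3) for v in x])  # mutation keeps the minority variant alive
```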
Many other evolutionary phenomena have been postulated for natural language but have never been subjected to detailed study through dynamical simulation. One further example is the apparent tendency for the separate words of sentences to gradually "agglutinate," so that sequences of words often become prefix-root or root-suffix combinations (witness the modern-day creation of English items like gonna and coulda). The reverse process, while easy to imagine, is all but unknown in reality [11]. So we see that there are subtle questions of linguistic fitness that need to be carefully considered to achieve explanatory models.
Formal learning theory
Formal learning or "learnability" theory has recently been reviewed by Fulop and Chater [12], who cover a number of distinct approaches to the mathematical modeling of learning functions or languages. A standard model of learning is used by nearly all formal learning algorithms: we suppose that the learner receives a data sequence D—the example set, learning sample, or training set—one item at a time. The learner then proposes a hypothesis to characterize what it has learned after each example. One natural goal for the language learner is to recover (or perhaps to approximate) the "true" language from which the data D has been generated. In this setting, an example might consist of a sequence of symbols, plus the information that this expression is within the target language. A learning sample would then be a sequence of such sentences.
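As a minimal concrete instance of this standard model, consider the textbook learner for the class of finite languages, which conjectures exactly the set of strings seen so far and identifies any finite language in the limit from positive data:

```python
# A Gold-style learner: it receives positive examples one at a time and
# emits a hypothesis after each. For finite target languages, guessing
# "exactly the strings seen so far" converges: after some finite point the
# hypothesis never changes and equals the target.

def learner(data_sequence):
    hypothesis = set()
    for example in data_sequence:
        hypothesis = hypothesis | {example}   # conjecture: the sample itself
        yield frozenset(hypothesis)

target = {"aa", "ab", "ba"}                   # an assumed finite target language
text = ["aa", "ab", "aa", "ba", "ab", "ba"]   # a text: every sentence appears
for t, h in enumerate(learner(text), 1):
    print(t, sorted(h))
# From item 4 onward the hypothesis equals the target and stays fixed.
```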
Learnability theory is traditionally concerned chiefly with how to set up a problem so that whether the "true" function, concept, or language is learnable can be assessed by mathematical analysis. This kind of learning theory also usually treats learning as a process in which the learner's hypothesis approaches the target language as more data is analyzed. Key differences among theoretical frameworks within learning theory center on the specific way to model the notion of "approaching the target." For the research program I envision, it is not essential to select one particular learning-theoretic approach. One could imagine useful results being connected to a variety of methodologies including Bayesian learning, formalized inductive inference, and probably approximately correct (PAC) inference, given that each of these disparate approaches has yielded important results pertaining to language learning.
An important finding to emerge from language learnability studies is that various elements of natural languages can be successfully learned by a variety of specific algorithms, but only if one allows either unrealistic computing power [13] or tighter restrictions on the class of languages which can be learned [14, 15] (i.e. some form of Universal Grammar or universal learning bias). The form of innate learning bias that will serve to permit language learning is not as extensive as that originally sought within the Chomskyan program of Principles and Parameters, however. The latter program called for such a rich innate component that the credo "most of language is innate" has sometimes been attributed to the Chomskyan paradigm [16].
Learning in evolving systems
Seemingly the first source to combine learnability theory with the study of language as an evolving dynamical system is Niyogi 2006 [17]. While impressive and mathematically sophisticated, this work should be viewed as only a starting point for the research program I envision. There are many assumptions made in Niyogi's approach that deserve reconsideration, including the language learning paradigm and the dynamical system model. For the former, Niyogi stuck to the linguistic paradigm of Principles and Parameters, a model that has since fallen out of favor, in part due to Niyogi's own proofs [18] showing that the learning algorithms did not have the expected nice properties. For the latter, Niyogi's dynamical models are not complex systems; rather, they are deterministic simple systems with tractable analytic solutions. While this enabled the calculation of handy mathematical results instead of messy simulations, we must now move into the complex systems regime—indeed the systems should be not only complex but adaptive as well.
In what is apparently the only literature to add substantially to Niyogi's approach, Chatterjee et al. [9] provide some useful methods for combining the study of evolutionary dynamical systems with the study of learning theory. While mentioning linguistic applications, their work is focused on populations learning Prisoner's Dilemma strategies. My plan is in essence to combine the methods of Jäger and those of Chatterjee et al., in search of novel substantive connections between the complex adaptive system model of language change and learning-theoretic results about language. The main theoretical area which requires development is, as pointed out to me by Nick Chater (p.c.), the scenario in which inductive learners (i.e. children) are using the "results" of previous learners (i.e. adult speakers) who presumably have identical language learning biases, and all are together in the same evolutionary dynamical system. This should make it possible to learn more easily from finite (and thus partial) language data. But as mentioned, there seems to be no published literature which addresses these points or which does what I'm envisioning for my research program.
Research plans
My plans for carrying out the research involve a number of interrelated activities and phases. I plan to construct evolutionary language simulations which model a variety of linguistic forces beyond functional adaptation, such as cognitive and speech-mechanistic constraints. Such simulations will progress to involve multiple generations of speakers, in which the younger speakers learn language from the older speakers by imitation. The next phase would add mutation to the simulation, in the form of imperfect learning. My hope is eventually to be able to derive language learnability results in the specific setting of a multigenerational adaptive speech community with homogeneous learning biases.
To take a specific example, I plan to invoke an existing method for learning the morphology of words [19] to develop an evolutionary game simulation in which each succeeding generation of learners applies the method to the output of the previous generation. Constructing the simulation will involve carefully considered parameters and entries in the payoff matrix which determines the outcomes of "games" played by the participants. The overall goal of the game is not only successful imitation but also correct word structure in relation to other similar words, which can be gauged by a number of possible measures. This will use a stochastic version of evolutionary game theory, as in Jäger (op. cit.). Once the basic evolving system is set up, mutation can be introduced and the dynamics examined under different assignments of the fitness parameters.
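I have not built this system yet, but its skeleton might look something like the following sketch, with a deliberately trivial "morphology" (a single plural marker), a majority-vote learner standing in for the real learning method, and imperfect learning as transmission noise:

```python
# Iterated learning across generations: each generation learns from samples
# of the previous generation's output. Marker inventory and noise rate are
# assumptions for illustration only.

import random

MARKERS = ["-s", "-en"]

def speak(grammar, n=50, noise=0.1):
    """Produce n plural forms; noise swaps in a random marker (imperfect imitation)."""
    return [grammar if random.random() > noise else
            random.choice(MARKERS) for _ in range(n)]

def learn(sample):
    """Learner regularizes: adopt the majority marker in its input."""
    return max(MARKERS, key=sample.count)

random.seed(1)
grammar = "-en"
for generation in range(20):
    grammar = learn(speak(grammar))
    print(generation, grammar)
# Most runs stay put; raising the noise rate lets the grammar flip
# occasionally, which is the mutation-driven change to be studied.
```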
The basic learning results about this approach to morphology are quite straightforward [19] and should be applicable within the dynamical systems approach; I imagine that the complexity of the learning is an interesting object of study in addition to the learnability per se. The effects of mutation in various forms will surely affect the learning results; it may become impossible to learn adequately if imperfection is too great, because a degree of homogeneity needs to be found among the adult speakers. In general I am hoping to find some unforeseen results.
References
2. Oudeyer P-Y. Self-Organization in the Evolution of Speech. Oxford: Oxford University Press; 2006.
Wednesday, February 6, 2013
Why formal learning theory matters for cognitive science
A new special topic bearing this title, edited by myself and Nick Chater, has just been published in Topics in Cognitive Science. The topic includes numerous papers on formal learning theory of languages, and a couple of others addressing Bayesian and semisupervised learning.
I won't bother to link to this journal, since you either have subscription access or you don't, and a direct link is not likely to work either way.
Athabaskan languages
Sometimes I get the distinct feeling that mathematical linguistics, and linguistic theory in general, has absolutely nothing to say about some languages. I've been renewing my interest in Navajo lately, which is a typical representative of the Athabaskan (Na-Dene) family in general.
My big Analytical Lexicon of Navajo (Young, Morgan and Midgette 1992) organizes the verbs according to their roots, each of which is expressed by several stems in conjugated verbs. Verbs are conjugated for two different kinds of aspect in a kind of two-dimensional aspect matrix, and some verbs may be conjugated in 8 or more different aspect combinations. The lexicon lists 550 roots, expressed using 2100 stems. And it's all irregular. All of this verbal morphology, for the entire language, is irregular. There are no rules which would yield the pronounced forms, as far as I can see.
Now, there is in fact some regular inflection on the verbs, such as subject and object agreement, and some of the aspectual prefixes are sort of regular and are sorted into multiple verbal classes, but the stems expressing the various aspect combinations are irregular. A theory of aspectual meanings and their possible combinations would be greatly desirable. My cross-linguistic surveys of aspectual systems tell me that the study of aspect is a total mess; a different terminology is used every time you turn around and look at a new language family.
When I look at Navajo I'm reminded that a major gap in mathematical linguistics is a theory of morphosyntax. These Navajo verbs are sufficiently expressive that they can serve as complete sentences, so long as you're happy to speak using pronouns. The pronouns themselves are the agreement morphemes on the verb. There are hundreds of other verbal prefixes that serve to add specific characters to the action, like "going on and on", "descending from a height", "shape of a circle", and so forth.
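For concreteness, here is one way such a lexicon could be organized computationally; the forms are schematic placeholders of my own, not actual Navajo, and only the agreement prefixes are treated as rule-governed:

```python
# A lexicon organized the way the Analytical Lexicon suggests: stems are
# simply listed cell by cell in a two-dimensional aspect matrix per root,
# with no rule deriving the stem shapes.

# stem_table[root][(mode, aspect)] = stem  (suppletive, listed per cell)
stem_table = {
    "ROOT-1": {
        ("imperfective", "momentaneous"): "stem-1a",
        ("perfective",   "momentaneous"): "stem-1b",
        ("future",       "continuative"): "stem-1c",
    },
    "ROOT-2": {
        ("imperfective", "continuative"): "stem-2a",
        ("perfective",   "continuative"): "stem-2b",
    },
}

def conjugate(root, mode, aspect, agreement):
    """Regular agreement prefix + an irregular, listed stem."""
    stem = stem_table[root].get((mode, aspect))
    if stem is None:
        raise KeyError(f"{root} is not conjugated for {mode}/{aspect}")
    return agreement + "-" + stem

print(conjugate("ROOT-1", "perfective", "momentaneous", "ni"))
```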
If I could develop a theory of anything that would work for Navajo, I'd know I'd accomplished something important.