In mathematical approaches to linguistics, the results are usually solid because they are derived mathematically. But there always remains the question of whether the chosen mathematical model actually serves as a satisfactory model of something. Carnap pointed out long ago that this question has no formal answer; it is really quite subjective. He used the term explicatum to refer to the formal model of an informally expressed explicandum.

When it comes to the learning of grammar, even when restricted to the syntax of natural language, it is not clear what the explicandum is. The only obvious fact is that children do indeed learn a language. Suppose it has a grammar---then what? Researchers disagree on how to model the learning of grammar. There is a canon of literature in algorithmic learning theory that models grammatical inference using only text strings as input data. Alex Clark is a contributor to this literature, and he has derived interesting results concerning the identifiability of substitutable context-free languages. I noticed in the 2009 Algorithmic Learning Theory proceedings that this result has recently been extended to substitutable mildly context-sensitive languages, the latter being a class that turns up again and again in mathematical linguistics.

A substitutable language is, roughly, one in which any two substrings that share a single context (construed linearly, not structurally) share all their contexts; that is, the language allows them to substitute for one another everywhere. It seems to me that this is too strong a property to hold of natural languages.
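To make the property concrete, here is a toy Python sketch that checks it over a finite sample of strings rather than an actual (typically infinite) language; the function names and the sample languages are my own illustration, not drawn from Clark's papers.

```python
from itertools import product

def contexts(word, language):
    """All (left, right) pairs such that left + word + right
    is a string in the finite language sample."""
    ctxs = set()
    for s in language:
        for i in range(len(s) + 1):
            for j in range(i, len(s) + 1):
                if s[i:j] == word:
                    ctxs.add((s[:i], s[j:]))
    return ctxs

def is_substitutable(language):
    """Check the substitutability property on a finite sample:
    any two (non-empty) substrings that share even one context
    must share all their contexts."""
    # every non-empty substring occurring somewhere in the sample
    subs = {s[i:j] for s in language
                   for i in range(len(s) + 1)
                   for j in range(i + 1, len(s) + 1)}
    for u, v in product(subs, repeat=2):
        cu, cv = contexts(u, language), contexts(v, language)
        if cu & cv and cu != cv:  # share one context but not all
            return False
    return True

# "a" and "c" both occur in the context ("", "b") and nowhere else:
print(is_substitutable({"ab", "cb"}))        # True
# "a" and "c" share the context ("", "b"), but only "a" also
# occurs in ("", "d") -- so the property fails:
print(is_substitutable({"ab", "cb", "ad"}))  # False
```

The brute-force check is exponential in the sample size, but it makes the strength of the assumption visible: a single shared context forces total interchangeability.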

Aside from that, it is quite unlikely that children learn language from word sequences alone, divorced from meaning. For these reasons, I have pursued a different modeling approach in my work on type-logical grammar induction (see my new paper in the Journal of Logic, Language and Information for a full account). I hold that learning should be modeled using sentences annotated with structural information. I think it is possible that children could use a kind of corpus analysis, together with their meaningful experience, to glean this information without the benefit of a complete grammar.

But you see, the matter of modeling is subjective. I had a (friendly) argument about these very points with Alex Clark a few years ago. He was equally committed to pursuing the pure string-learning model. He said my assumptions were too rich and the extra information too plentiful. We weren't arguing about mathematical points, but about what the best explicatum is for the right explicandum. We can't even agree on the explicandum.

Mathematical linguistics will benefit, I believe, from the parallel pursuit of different explicata for different explicanda. Then when we know more, due to advances in cognitive science or whatever, a wide range of different mathematical results will already be known.

## Sunday, May 16, 2010

