Mathematical Linguistics etc.: October 2011

In my previous post, I suggested that syntax researchers put forth too many new theories without good reason. One good reason for a new theory would be to improve the way aspects of language are modeled, and also to improve the interfaces between syntax and other realms. In this connection, some recent work by Mehrnoosh Sadrzadeh, Bob Coecke, Anne Preller, and other collaborators seems quite interesting.

This research program promotes the "pregroup grammar" framework that Jim Lambek proposed over ten years ago, which has been gaining popularity in mathematical linguistics and appears to have overtaken the type-logical framework in momentum. In some earlier posts here I said that I did not understand the motivation for pregroup grammars and saw no reason to pursue them. As far as syntax per se is concerned, I stand by that position. The research program of Sadrzadeh et al., however, is leading me to reconsider.
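For readers who have not seen pregroup grammars, the core mechanism is small enough to sketch. What follows is a minimal Python sketch under my own assumptions, not code from Lambek or from Sadrzadeh et al.; the toy lexicon `LEXICON` and all function names are hypothetical. Each word is assigned a string of simple types, and a word sequence counts as a sentence if its concatenated types reduce to s using only the contractions x^l x -> 1 and x x^r -> 1. The check relies on the fact that a string X reduces to s exactly when X s^r reduces to 1, and it searches for non-crossing contraction links by dynamic programming.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

# A simple type is a pair (basic_type, z): z = 0 is the plain type x,
# z = -1 the left adjoint x^l, z = +1 the right adjoint x^r, z = -2 x^ll, etc.
SimpleType = Tuple[str, int]

# Hypothetical toy lexicon: names have type n, a transitive verb n^r s n^l.
LEXICON: Dict[str, List[SimpleType]] = {
    "John": [("n", 0)],
    "Mary": [("n", 0)],
    "likes": [("n", 1), ("s", 0), ("n", -1)],
}

def contracts(left: SimpleType, right: SimpleType) -> bool:
    """x^(z) x^(z+1) -> 1; covers both x^l x -> 1 and x x^r -> 1."""
    return left[0] == right[0] and right[1] == left[1] + 1

def reduces_to_unit(types: List[SimpleType]) -> bool:
    """True if the whole string cancels to 1 via non-crossing contractions."""
    seq = tuple(types)

    @lru_cache(maxsize=None)
    def vanishes(i: int, j: int) -> bool:
        if i == j:
            return True  # the empty segment is already 1
        # seq[i] must contract with some seq[k]; the part strictly between
        # them must cancel on its own, so contraction links never cross.
        for k in range(i + 1, j, 2):
            if contracts(seq[i], seq[k]) and vanishes(i + 1, k) and vanishes(k + 1, j):
                return True
        return False

    return vanishes(0, len(seq))

def reduces_to(types: List[SimpleType], target: str) -> bool:
    # In a pregroup, X <= t iff X t^r <= 1, so append the right adjoint of t.
    return reduces_to_unit(types + [(target, 1)])

def sentence_type(sentence: str) -> List[SimpleType]:
    return [t for word in sentence.split() for t in LEXICON[word]]

print(reduces_to(sentence_type("John likes Mary"), "s"))  # True
print(reduces_to(sentence_type("likes John Mary"), "s"))  # False
```

Only contractions are searched for; with no ordering on basic types, Lambek's switching lemma means expansions are not needed when the target is 1, which keeps the check this simple.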

One of the purported advantages of the type-logical approach over, say, generative syntax is its simple interface with semantics coded in a typed lambda calculus or one of its intensional variants, such as Montague logic. That said, Montague logic is hardly the perfect system for natural language semantics. A major issue is that word meanings themselves are often simply treated as "primitives." As I've explained to bemused linguists during colloquia on occasion, the meaning of a word is represented by the word set in boldface! I've jokingly referred to this as "the boldface theory of lexical semantics."
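To make the boldface point concrete, here is a minimal Python sketch of a Montague-style fragment; the entries and names are hypothetical illustrations, not taken from any particular textbook. Composition is ordinary function application, mirroring the syntax, but the lexical meanings bottom out in uninterpreted constants that are just the words in capitals.

```python
from typing import Callable

Entity = str   # stand-in for individuals (type e)
Formula = str  # stand-in for truth-valued expressions (type t)

# Lexical meanings as primitive constants: the "boldface" convention.
john: Entity = "JOHN"
mary: Entity = "MARY"

def likes(obj: Entity) -> Callable[[Entity], Formula]:
    """Transitive verb of type e -> (e -> t): takes the object, then the subject."""
    return lambda subj: f"LIKES({subj}, {obj})"

# Function application follows the syntactic combination of verb and arguments.
sentence = likes(mary)(john)
print(sentence)  # LIKES(JOHN, MARY)
```

The compositional machinery does real work here, but nothing in it says anything about what LIKES means beyond its name.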

Now, Sadrzadeh and collaborators take up the mantle of vector space semantics, known mostly from information retrieval, to represent word meanings in a data-driven, usage-based fashion. I am sympathetic to this approach; it is cognitively plausible, and it is certainly an improvement on the boldface theory.
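The basic idea of usage-based word vectors can be sketched in a few lines. This is a minimal Python illustration assuming a made-up six-sentence corpus and the simplest possible context (other words in the same sentence); actual distributional models use large corpora, context windows, and usually reweight raw counts. A word's meaning is its vector of co-occurrence counts, and similarity of meaning is the cosine of the angle between vectors.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical toy corpus, for illustration only.
CORPUS: List[str] = [
    "dogs chase cats",
    "cats chase mice",
    "dogs eat meat",
    "cats eat fish",
    "people eat bread",
    "people read books",
]

def cooccurrence_vectors(corpus: List[str]) -> Dict[str, Counter]:
    """Word vector = counts of the other words occurring in the same sentence."""
    vectors: Dict[str, Counter] = {}
    for sentence in corpus:
        words = sentence.split()
        for i, w in enumerate(words):
            ctx = vectors.setdefault(w, Counter())
            for j, c in enumerate(words):
                if i != j:
                    ctx[c] += 1
    return vectors

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vecs = cooccurrence_vectors(CORPUS)
print(round(cosine(vecs["dogs"], vecs["cats"]), 3))  # 0.53
print(cosine(vecs["dogs"], vecs["books"]))           # 0.0
```

"Dogs" and "cats" come out similar because they share contexts (chase, eat), while "dogs" and "books" share none.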

The real interest of the research program, however, lies neither in pregroup grammars nor in vector space semantics, but in the key insight that the two are tightly connected through deep mathematical commonalities. In this way, a pregroup grammar can essentially be provided with a lexicon whose vector space semantics is coherently connected to the grammar. What is more, you even get a vector space semantics of sentences, which takes the older word-level scheme a step further. The specific mathematical connection between these frameworks is provided by category theory: pregroups are examples of compact closed categories, as is the category of finite-dimensional vector spaces with linear maps and the tensor product (Coecke et al. 2010).
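Here is how a sentence meaning can fall out of the grammar, in the spirit of Coecke et al. (2010). In this minimal Python sketch, a transitive verb is assumed to be a tensor in N ⊗ S ⊗ N, where N is the noun space and S a sentence space; the 2-dimensional spaces and the numbers in `VERB` are invented for illustration. The pregroup reduction n (n^r s n^l) n -> s is read as a linear map: each cancellation of n against n^r or n^l becomes an inner product over N, leaving a vector in S.

```python
from typing import List

Vector = List[float]
Tensor3 = List[List[List[float]]]  # indexed [subject_dim][sentence_dim][object_dim]

def transitive_sentence(subj: Vector, verb: Tensor3, obj: Vector) -> Vector:
    """Sentence vector s_k = sum over i, j of subj[i] * verb[i][k][j] * obj[j].

    The sum over i plays the role of the cancellation n n^r (subject against
    the verb's left slot); the sum over j plays the role of n^l n (object
    against the verb's right slot).
    """
    dim_s = len(verb[0])
    return [
        sum(subj[i] * verb[i][k][j] * obj[j]
            for i in range(len(subj)) for j in range(len(obj)))
        for k in range(dim_s)
    ]

# Hypothetical verb tensor with dim N = dim S = 2.
VERB: Tensor3 = [
    [[1.0, 0.0], [0.0, 2.0]],  # subject basis vector 0
    [[0.0, 3.0], [0.0, 0.0]],  # subject basis vector 1
]

print(transitive_sentence([1.0, 0.0], VERB, [0.0, 1.0]))  # [0.0, 2.0]
print(transitive_sentence([0.5, 0.5], VERB, [0.0, 1.0]))  # [1.5, 1.0]
```

Because the map is linear in each argument, a noun vector that mixes two basis meanings yields the corresponding mixture of sentence meanings, as the second call shows.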

The diagrammatic "calculus" of such categories can be used to simplify meaning computations, and it has also been applied to expose the "flow of information" within quantum information protocols (Coecke et al. 2010). Other work by Sadrzadeh has also highlighted the connections with quantum probability logic. This is very interesting stuff that most linguists are sorely unprepared to follow. Theoretical linguistics is currently at risk of falling into irrelevance, while scientists trained in other fields pick up the slack and do the really interesting research without us.
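The equations that make the diagrams work are the "yanking" (snake) identities of compact closed categories. The block below is a sketch in standard notation, assuming finite-dimensional real vector spaces with a chosen orthonormal basis, where each space is its own dual; the pregroup line shows the same pattern with expansion followed by contraction, and the last line writes the transitive-sentence meaning as such a composite.

```latex
% Cup and cap for a real vector space V with orthonormal basis \{e_i\}
\varepsilon_V : V \otimes V \to \mathbb{R}, \quad \varepsilon_V(v \otimes w) = \langle v, w \rangle
\qquad
\eta_V : \mathbb{R} \to V \otimes V, \quad \eta_V(1) = \textstyle\sum_i e_i \otimes e_i

% Yanking: the snake-shaped wire straightens out
(\varepsilon_V \otimes 1_V) \circ (1_V \otimes \eta_V) = 1_V,
\quad\text{since}\quad
v \mapsto \textstyle\sum_i v \otimes e_i \otimes e_i \mapsto \sum_i \langle v, e_i \rangle\, e_i = v

% Pregroup counterpart: expansion 1 \le a^r a, then contraction a\, a^r \le 1
a = a \cdot 1 \;\le\; a\,(a^{r} a) = (a\, a^{r})\, a \;\le\; 1 \cdot a = a

% Transitive sentence: the reduction n (n^r s n^l) n \to s read as a linear map
\overrightarrow{\text{subj verb obj}}
  = (\varepsilon_N \otimes 1_S \otimes \varepsilon_N)
    \big(\overrightarrow{\text{subj}} \otimes \overrightarrow{\text{verb}} \otimes \overrightarrow{\text{obj}}\big),
\qquad \overrightarrow{\text{verb}} \in N \otimes S \otimes N
```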

References:

Coecke, B., M. Sadrzadeh and S. Clark. "Mathematical foundations for a compositional distributional model of meaning," arXiv.org (2010).

Preller, A. and M. Sadrzadeh. "Semantic vector models and functional models for pregroup grammars," J. Logic, Lang. Inf. 20 (2011): 419-443.


Thursday, October 20, 2011

Quantum semantics
