The title of this post alludes to a very deep and interesting paper from Marcus Kracht, which was published in Linguistics and Philosophy (2007). It is sort of embarrassing that it took me this long to notice the paper, but I am not always in the loop, and better late than never, I suppose.

In this paper, Kracht applies his considerable intellect to fundamental problems of compositional semantic representations and the syntax-semantics interface. I dearly love fundamental-problems papers, and since they are so rare and hard to get published, this one is really a treat. Kracht begins by defining what a compositional semantics should involve, and more importantly, what it should not involve. It should not involve indices, in the sense that variables in logic can have indices to tell them apart. I think it is more than reasonable from a cognitive perspective to say that if human semantic representations use "variables" in any way, it is highly unlikely that any sort of named or indexed variables are used. Without this commonplace crutch, Kracht has to design a new kind of semantic representational system. He proposes further that, if this can be done properly, then semantic representations provide motivation for evident syntactic constituents in sentences. Basically, I think the idea is that the semantics is too weak to get the right meaning without some assistance from syntactic constituency.

In the rest of the paper, Kracht dismantles modern syntactic theory and Montague grammar, and puts the pieces back together in a new and interesting way. As a kind of large aside, he proves an interesting result about Dutch context-freeness (or lack thereof). I can't bring myself to try to summarize the technical details, so just take a look at Kracht's paper if you want to see a very useful alternative perspective on many things that are frequently taken for granted without much worry.

## Thursday, November 17, 2011

## Thursday, October 20, 2011

### Quantum semantics

In my previous post, I suggested that syntax researchers put forth too many new theories without good reason. One good reason for a new theory would be to improve the way in which the aspects of language are modeled, and also to improve the interfaces between syntax and other realms. In this connection, some recent work by Mehrnoosh Sadrzadeh, Bob Coecke, Anne Preller, and other collaborators seems quite interesting.

This research program is promoting the "pregroup grammars" framework proposed by Jim Lambek over ten years ago, which has been gaining popularity in mathematical linguistics and appears to have overtaken the momentum of the type-logical framework. In some earlier posts here I suggested that I did not understand the motivation for pregroup grammars and saw no reason to pursue them. Considering syntax per se, I stand by that position. The research program of Sadrzadeh et al., however, is getting me to reconsider.

One of the purported advantages of the type-logical approach over, say, generative syntax, is the simple interface with semantics coded into a typed lambda calculus or its intensional variants such as Montague logic. That said, Montague logic is hardly the perfect system for natural language semantics. A major issue is that word meanings themselves are often just treated as "primitives." As I've explained to bemused linguists during colloquia on occasion, the meaning of a word is represented by the word set in boldface! I've jokingly referred to this as "the boldface theory of lexical semantics."

Now, Sadrzadeh and collaborators take up the mantle of vector space semantics, known mostly from information retrieval, for representing word meanings in a data-driven, usage-based fashion. I am sympathetic to this prospect; it is cognitively plausible, and it is certainly an improvement on the boldface theory.

The real interest of the research program, however, is neither pregroup grammars nor vector space semantics, but the key insight that the two are tightly connected through deep mathematical commonalities. In this way, a pregroup grammar can essentially be provided with a lexicon that has a vector space semantics in a coherently connected way. What is more, you even get a vector space semantics of sentences, which takes the old scheme a step further. The specific mathematical connection between these frameworks is provided by category theory. Pregroups are examples of compact closed categories, as is the category of vector spaces with linear maps and a tensor product (Coecke et al. 2010).
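To make the connection a little more concrete, here is a minimal sketch of how a pregroup reduction can be mirrored by a vector space computation. It is not code from Coecke et al.; the function name `contract`, the dimensions, and all the numbers are invented for illustration. The assumption is that a noun of type n is a vector in a noun space N, an intransitive verb of type n^r s is a tensor in N ⊗ S, and the reduction n · n^r → 1 is carried out as an inner product over the noun index, leaving a vector in the sentence space S.

```python
# Toy compositional distributional semantics (all numbers are made up).
# Noun space N has 3 context dimensions; sentence space S has 2 dimensions.

def contract(noun, verb):
    """Mirror the reduction n . n^r -> 1 as an inner product over the noun index.

    noun: vector in N, a list of length dim(N)
    verb: tensor in N (x) S, a dim(N) x dim(S) nested list
    returns: vector in S, a list of length dim(S)
    """
    dim_s = len(verb[0])
    return [sum(noun[i] * verb[i][k] for i in range(len(noun)))
            for k in range(dim_s)]

# Hypothetical usage-based vectors for two nouns.
trevor = [1.0, 0.0, 2.0]
nigel = [0.0, 3.0, 1.0]

# Hypothetical intransitive verb of type n^r s, an element of N (x) S.
laughed = [[0.5, 0.0],
           [0.0, 1.0],
           [1.0, 0.5]]

print(contract(trevor, laughed))  # [2.5, 1.0]
print(contract(nigel, laughed))   # [1.0, 3.5]
```

The point of the sketch is only that the grammatical type of the verb dictates the shape of its tensor, so the syntactic reduction and the meaning computation are the same step seen in two categories.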

The diagrammatic "calculus" of such categories can be used to simplify meaning computations, and has also been applied to expose the "flow of information" within quantum information protocols (Coecke et al. 2010). Other work by Sadrzadeh has also highlighted the connections with quantum probability logic. This is very interesting stuff which most linguists are sorely unprepared to follow. Theoretical linguistics is currently at risk of falling into irrelevance, while scientists with training in other fields pick up the slack and do the really interesting research without us.

References:

Coecke, B., M. Sadrzadeh and S. Clark "Mathematical foundations for a compositional distributional model of meaning," arXiv.org (2010).

Preller, A. and M. Sadrzadeh "Semantic vector models and functional models for pregroup grammars," J. Logic, Lang. Inf. (2011) 20:419-443.

## Wednesday, September 28, 2011

### The hunt for new syntactic theories

It seems like syntacticians, of both mathematical and generative stripes, are constantly hunting for a new and improved syntactic theory. But I'm not really sure what we are supposed to be looking for. Surely now, it is established that any theory capable of generating languages of sufficient complexity is capable of "capturing the data" or describing it or whatever. Yet syntacticians still publish papers where they demonstrate that some new notion or theory is capable of deriving some fancy piece of data that somehow eludes the others. Is this really what they are doing? Because this seems like a moot argument.

It seems to me that, if there is to be any rationale for improving syntactic theory, it should be something like cognitive plausibility or computational effectiveness, or perhaps even theoretical elegance. I don't know what people are driving at, because the desiderata of a better syntactic theory are almost never discussed. Should we all know what we are seeking? Because I'm not so sure anymore, and thus I am not convinced we should continue to hunt around. For the moment I'm satisfied that type-logical grammar is capable of deriving everything that needs to be derived. Am I wrong?

## Wednesday, August 31, 2011

### Analog computation

This post is a quick precis of something that I hope to work out at greater length. There has been, over the years, some amount of research on the complexity theory of analog computation (i.e., using analog devices with no quantization or digitization, which operate in continuous time over the real numbers). There is not, as yet, a good and complete theory of this, but some key recent papers that a person could start with are found in the book New Computational Paradigms (Springer 2008).

One question that continues to be debated is whether analog algorithms have the same constraints or lower bounds on complexity as digital algorithms computing the same results. In the most extreme view, it has been suggested that analog methods could be used to solve NP-hard problems quickly, perhaps in polynomial time. Some papers have analyzed this question, and the results so far appear to be negative. For example, a paper posted online by Warren Smith of NEC Corp. (1998) showed that in order for the above suggestion to hold of a particular kind of physical computer (a "plane mechanism"), some other implausible things would have to be the case as well.

OK, so maybe we cannot go all the way from NP-hard to P. But this doesn't in any sense prove that the lower complexity bounds are exactly equal. There are plenty of digital computations which, while not NP-hard, are still crappy in practice because their cost grows as some large power of the input size. What good is polynomial time when the algorithm's average case complexity is O(n^82), for instance?
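To put a number on that complaint, here is a small sketch (the function name `crossover` is my own) that finds the input size at which an exponential cost 2^n finally overtakes a polynomial cost n^k. For k = 82 the arithmetic can be checked by hand at two points: at n = 512 = 2^9 the polynomial is 2^738, still far larger than 2^512, while at n = 1024 = 2^10 it is 2^820, now smaller than 2^1024. So the "tractable" algorithm is the worse one for every input size up to at least 512.

```python
def crossover(k):
    """Smallest n >= 2 with 2**n > n**k, using exact integer arithmetic."""
    n = 2
    while 2 ** n <= n ** k:
        n += 1
    return n

n = crossover(82)
print(n)  # lies somewhere between 513 and 1024, per the hand check above
```

Constant factors are ignored here, so this says nothing about any real algorithm; it only illustrates that "polynomial" and "practical" are different things.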

Why is all this important? Well, cognitive science entertains a wide array of computational models, which are usually proposed as "cognitively plausible" in some vague sense. So now there are debates about how tractable a simulation needs to be in order to seem cognitively plausible. But all the simulations are digital. Meanwhile, the brain is not digital; it is some sort of analog system. I think that debaters need to take care over this mismatch, because there is no clear correspondence between the complexity of a digital computation and the complexity of an analog computation accomplishing the same task.

## Wednesday, July 27, 2011

### Neuroelectrodynamics

One of the million or so things that interest me is the question of how the brain actually accomplishes anything. This is relevant to linguistics because, well, that's sort of obvious. A prevalent model of neural computation tells us that the brain computes by passing information among various neurons, and that these neurons encode messages in sequences of action potential 'spikes' by a mechanism known as "spike timing."

A new book by Aur and Jog challenges this whole paradigm. It carries the strangely ungrammatical title Neuroelectrodynamics: Understanding the Brain Language, and the grammar between the covers is no improvement, but it is very provocative and enticing to those of us who think that the current state of understanding in neuroscience is extremely poor.

The authors first demonstrate that neuron 'spikes' are not uniform or stereotypical, which itself goes against a main tenet of the spike timing model. They then outline a new scheme by which a new quantity of spike 'directivity' is encoded into the charges in movement during an action potential. Empirically, the details of dendritic arbors and axonal branches significantly modulate the extracellular action potential. Axons themselves cannot be approximated by linear cable models.

In the authors' charge movement model, different charges or groups under an electric field have distinct movements. To apply independent component analysis (ICA), the action potential is assumed to be the result of several independent sources, generated by charges that move. For recorded action potentials, blind source separation (a known signal processing technique) can be performed using ICA. The charge localization is obtained from triangulation and a point charge approximation. Singular value decomposition is performed for the matrix of charge coordinates. A 'spike directivity' can be described by a preferred direction of propagation of the electric signal during each action potential, and approximated with a vector. This is shown to be a much more effective means of encoding information and computation than spike timing. Details of the spike directivity are said to be published in other papers by the authors, which did appear in major neuroscience journals.
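For readers who want a feel for the last step, here is a rough sketch of how a single direction vector can be extracted from a set of estimated charge locations. This is not the authors' procedure: the ICA and triangulation stages are skipped, the coordinates are invented, and I am assuming the coordinate matrix is mean-centered first. Instead of a full singular value decomposition, the sketch uses power iteration on the 3x3 scatter matrix, which yields the same dominant direction (up to sign) as the first right singular vector of the centered matrix.

```python
import math

def principal_direction(points, iterations=200):
    """Dominant direction of a cloud of 3-D points.

    Uses power iteration on the 3x3 scatter matrix X^T X, where X is the
    mean-centered coordinate matrix. This stands in for a full SVD.
    """
    n = len(points)
    mean = [sum(p[j] for p in points) / n for j in range(3)]
    centered = [[p[j] - mean[j] for j in range(3)] for p in points]
    scatter = [[sum(row[a] * row[b] for row in centered) for b in range(3)]
               for a in range(3)]
    v = [1.0, 1.0, 1.0]  # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(scatter[a][b] * v[b] for b in range(3)) for a in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical charge locations (arbitrary units), spread mostly along x.
charges = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, -0.1, 0.1),
           (3.0, 0.0, -0.1), (4.0, 0.1, 0.0)]
direction = principal_direction(charges)
print(direction)  # a unit vector pointing almost exactly along the x-axis
```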

It is interesting to see how the current paradigm in neural computation could well be founded on nothing. The authors' ideas may not be proven, but they certainly have some interesting empirical findings which defy explanation otherwise, and it seems clear to me that the brain can't be doing all its work with spike timing.

## Wednesday, July 20, 2011

### Road to Reality

This summer I've been reading one of those few books that can change your life, The Road to Reality by Roger Penrose. This is basically all of modern theoretical physics, including the mathematical fundamentals starting from first principles. I have finally gotten through all the initial mathematical chapters, which include a quick rundown of Riemannian geometry and differential forms on manifolds. Why is it that mathematical physicists make the best math teachers? I also changed my life a few years back by reading Lectures on Mathematical Physics by Robert Geroch.

I suspect that this approach to geometry could be useful for a new approach to semantics and cognitive science (see previous post on the musings of Fenstad).

Geometry is so crucial to our understanding of reality (i.e. physics) that it should not be surprising if it turns out to be crucial to our understanding of language and cognition as well.

## Tuesday, June 28, 2011

### Grammar, Geometry, and Brain

J. E. Fenstad has a provocative little book called Grammar, Geometry, and Brain (CSLI 2010). It is speculative and is sort of like an annotated bibliography, but as such it is very valuable as a discussion of work in cognitive science and brain modeling at the interface with linguistics. Fenstad is a mathematical logician and occasional linguist. The book is filled with his own preferences and experience, but it is very forward thinking, perhaps venturing into the crackpot, but that's OK with me. It is inspiring.

Fenstad states "you can never be more than the mathematics you know," something I heartily agree with. Like him I will continue to seek out new mathematics to apply in linguistics. I'm tired of formal language theory and logic; these tools cannot be everything for linguistic theory. Fenstad's suggestion is that geometry can provide a direct link between linguistic theory and cognitive modeling. He points to the Conceptual Spaces framework of Gärdenfors (MIT Press 2000) for a geometrical theory of semantics which he suggests could be connected to syntax using attribute-value grammar. He also discusses several works in cognitive neuroscience, suggesting how geometry could play a role in a bridging model connecting neuronal assemblies to conceptual spaces. It's an interesting book that pointed me to a lot of other readings.

## Wednesday, June 1, 2011

### What makes languages learnable?

A recent paper by Ed Stabler (in Language Universals, Christiansen et al. eds. 2009) puts the focus on an important question that is rarely formulated in the literature. What are the structural properties of natural languages which guarantee learnability? We know from a variety of negative results going back to the famous Gold theorem that such properties have to go far beyond the defining principles of the Chomsky hierarchy, since none of the traditional language classes except the finite languages are strictly learnable, in the sense of identifiability in the limit. Because we presume to model natural languages as some sort of infinite languages, something else must be going on. There must be some restrictions on the possible forms of natural language that permit learnability in some sense. Stabler says that to address this question, we need a proposal about how human learners generalize from finite data. There is as yet no complete answer to this problem, and indeed very little research seems to be currently motivated by such questions.
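For anyone unfamiliar with identifiability in the limit, the following sketch shows the one easy case mentioned above. In Gold's setup the learner sees an endless stream of sentences from the target language and outputs a guess after each one; it succeeds if its guesses eventually settle on the right language and never change again. For finite languages, the naive strategy "guess exactly the set of sentences seen so far" works. The target language and the data stream below are made up for illustration.

```python
def learner(text):
    """Yield a conjecture after each datum: the set of strings seen so far."""
    seen = set()
    for sentence in text:
        seen.add(sentence)
        yield frozenset(seen)

# Hypothetical finite target language and a presentation of it
# (every sentence of the language shows up at least once).
target = {"a", "ab", "abb"}
presentation = ["a", "ab", "a", "abb", "ab", "a", "abb"]

conjectures = list(learner(presentation))
# From the fourth datum onward the conjecture equals the target and stays put.
print([c == target for c in conjectures])
# [False, False, False, True, True, True, True]
```

The same strategy never converges on an infinite language, since there is always an unseen sentence still to come, and that is the gap a proposal about how learners generalize has to fill.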

While I do not generally use this forum to highlight my own published work, in this case I believe that my 2010 paper in the Journal of Logic, Language and Information (together with an erratum published this year) does address Stabler's question directly. The title of my paper is Grammar Induction by Unification of Type-Logical Lexicons, and therein the basic proposal is given. Human learners are proposed to generalize from finite data by unification of the sets of syntactic categories that are discovered by an initial semantic bootstrapping procedure. The bootstrapping procedure extracts basic category information from semantically annotated sentence structures (this is highly enriched data, to be sure, but I argue for the plausibility of that in general terms). The basic system of categories is then unified by a two-step process that takes into account the distribution of the words (usage patterns, expressed structurally), and is able to generalize to "recursively redundant extensions" of the learning data. This is then a specific proposal of the sort invited by Stabler. The resulting learnable class of languages is highly restricted, including only those that are in some sense closed under recursively redundant extensions. This is in accord with general thinking about human languages, in that it is normally the case that a recognized recursive procedure (such as the appending of prepositional phrases to expand noun phrases, in rough terms) can always be applied indefinitely to yield grammatical sentences of increasing length.

I hope in the future to highlight my findings in a more general journal like Cognitive Science. It is helpful that Stabler conveniently provided the leading question to my purported answer.

## Thursday, May 19, 2011

### Bare Grammar

I was recently reminded (thanks to the comment from Emilio on my previous post) of some great work done by my former professors Ed Keenan and Ed Stabler, which I first learned about when I was a PhD student in the 1990s. The key publication for reference is their book Bare Grammar, published by CSLI in 2003. It is actually rather embarrassing that I had largely forgotten about this work while preoccupied with my own efforts in learning theory the past several years, since I should have used some of their ideas to further my own.

Bare Grammar is a largely atheoretical framework for describing syntactic structure. I won't present any details here while trying to highlight the major points. Keenan and Stabler begin by pointing out that human languages always have lexicons in which most elements are grouped into classes (like parts of speech) whose members are in some sense intersubstitutable in the "same" positions within the "same" structures. A simple example is afforded by the English sentences:

Trevor laughed.

Nigel cried.

It is observed that Trevor can substitute for Nigel (and vice versa) without "changing structure," yielding equally grammatical sentences. The same can be said of the verbs laughed and cried. This means that the two sentences "have the same structure," in that they are each obtainable from the other by a sequence of "structure-preserving transformations." Hmm, this is starting to remind me of my previous post where I ruminated about gauge theory, as was suggested by the comment there.

Being mathematical linguists, Keenan and Stabler formalize all this. A structure-preserving map is defined as an automorphism of the grammar. An automorphism h can have fixed points x, such that h(x) = x. The syntactic invariants of the grammar are fixed points of every automorphism. In the lexicon, these are identified with the function words, that is, words that cannot be substituted without changing structure. By using some fancier mathematical operations (power lifting and product lifting), it is shown how certain properties of higher order can also be fixed points, which correspond to such things as invariant properties of expressions, and invariant relations and functions. Much of the book is engaged in evaluating potential invariants of these kinds.
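Here is a toy illustration of the fixed-point idea, heavily simplified relative to the book. Keenan and Stabler define automorphisms over a grammar with categories and generating rules; in this sketch I assume instead that an "automorphism" is just a permutation of the vocabulary that maps a small finite set of sentences onto itself. The five-word vocabulary and the two sentence patterns are my own invention, built around the Trevor and Nigel example.

```python
from itertools import permutations, product

NAMES = ["Trevor", "Nigel"]
VERBS = ["laughed", "cried"]
VOCAB = NAMES + VERBS + ["and"]

# Toy language with two patterns: "NP V" and "NP and NP V".
LANGUAGE = {(n, v) for n, v in product(NAMES, VERBS)} | {
    (n1, "and", n2, v) for n1, n2, v in product(NAMES, NAMES, VERBS)
}

def automorphisms():
    """All vocabulary permutations that map the language onto itself."""
    result = []
    for image in permutations(VOCAB):
        h = dict(zip(VOCAB, image))
        mapped = {tuple(h[w] for w in s) for s in LANGUAGE}
        if mapped == LANGUAGE:
            result.append(h)
    return result

autos = automorphisms()
# A word is invariant if every automorphism maps it to itself.
invariants = {w for w in VOCAB if all(h[w] == w for h in autos)}
print(len(autos), invariants)  # 4 {'and'}
```

The names can be swapped with each other, and so can the verbs, but nothing can trade places with "and" without wrecking the sentence set, so it comes out as the lone invariant, which is the behavior expected of a function word.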

A major point is that natural languages all have such invariants---an interesting fact, to be sure, which is not at all necessary from the basic considerations of formal language theory. It would be easy to devise a formal language that didn't have anything like a function word, yet natural languages all have such things.

I hope to spend some more quality time with this book in the near future, and may have some more posts about it. In the meantime, I recommend it to anyone interested in a novel theoretical approach to a range of interesting linguistic facts that are not seriously dealt with elsewhere.

## Thursday, April 28, 2011

### A gauge theory of linguistics?

Here is a bit of idle speculation. In my reading on theoretical physics, I have learned something about gauge theories. A concise description is found, naturally, on the Wikipedia page, where it is explained that the root problem addressed by gauge theory is the excess degrees of freedom normally present in specific mathematical models of physical situations. For instance, in Newtonian dynamics, if two configurations are related by a Galilean transformation (change of reference frame), they represent the same physical situation. The transformations form a symmetry group, so then a physical situation is represented by a class of mathematical configurations which are all related by the symmetry group. Usually the symmetry group is some kind of Lie group, but it need not be a commutative (abelian) group. A gauge theory is then a mathematical model that has symmetries (there may be more than one) of this kind. Examples include the Standard Model of elementary particles.
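As a small worked illustration of the Newtonian case (a standard textbook calculation, with the symbols $x_i$, $m_i$, $F_i$ and the boost velocity $v$ introduced here only for the example), consider particles whose forces depend only on relative positions:

```latex
\begin{align*}
  x_i' &= x_i - v t, \qquad t' = t
    && \text{(Galilean boost with constant velocity } v\text{)} \\
  \frac{d^2 x_i'}{dt'^2} &= \frac{d^2 x_i}{dt^2}, \qquad x_i' - x_j' = x_i - x_j
    && \text{(accelerations and relative positions are unchanged)} \\
  m_i \frac{d^2 x_i'}{dt'^2} &= F_i(x_i' - x_j')
    \iff m_i \frac{d^2 x_i}{dt^2} = F_i(x_i - x_j)
    && \text{(the law of motion has the same form in both frames)}
\end{align*}
```

The two descriptions differ in their coordinates but describe the same physical situation, which is the kind of redundancy the paragraph above refers to.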

So far so great, but what does this have to do with linguistics? Well, it seems to me that mathematical models of language are often encumbered by irrelevant detail or an overly rigid dependence on conditions that are in reality never fixed. A simple example would be that a typical generative grammar of a language (in any theory) depends critically on the vocabulary and the categories assigned to it. In reality, different speakers have different vocabularies, and even different usages assigned to vocabulary items, although they may all feel they are speaking a common language. There is a sense in which we could really use a mathematical model of a language that is flexible enough to allow for some "insignificant" differences in the specific configuration assigned to the language. There may even be lurking a useful notion of "Galilean transformation" of a language.

This idea is stated loosely by Edward Sapir in his classic Language. He applies it to phonology, where he explains his conviction that two dialects may be related by a "vowel shift" in which the specific uses or identities of vowels are changed, but (in a sense that is left vague) the "phonological system" of the vowels in the two dialects is not fundamentally different. This idea may help to explain how American English speakers from different parts of the country can understand one another with relative ease even though they may use different sets of specific vowel sounds.

This is all a very general idea, of course. Gauge theory as applied in physics is really quite intricate, and I do not know yet if the specifics of the formalism can be "ported" to the particular problems of linguistic variation in describing a "common system" for a language. But what better place than a blog to write down some half-baked ideas?

## Thursday, April 14, 2011

### Connectionism and emergence

I have not read too much mathematical linguistics lately, but I have been reading a lot of cognitive science and neuroscience, as well as connectionist research. Let me start off with connectionism. This is the approach involving artificial neural networks to employ "distributed processing" for computational purposes. I think that, in principle, such an approach to modeling language as a cognitive phenomenon will ultimately be the right approach. But there is a very large problem with current neural net modeling, chiefly that the neurons are too simple and the networks too small.

Neuroscience studies real neurons and their networks, although at present there are huge gaps in our understanding. While we are able to record signals from single neurons or very small groups, and we can also do "brain imaging" to track activity in huge (order of 10^9) numbers of neurons, we have no way to study activity in a few thousand neurons. It is precisely this "mesoscopic" regime where the phenomena of thought, memory, and knowledge are likely to be emergent from the nonlinear dynamical system known as the brain.

This brings me to the subject of "emergent phenomena," which refers to things that happen in a nonlinear dynamical system as a result of huge numbers of interactions among nonlinear dependencies. An emergent phenomenon on the ocean is a "rogue wave." An emergent phenomenon cannot be directly simulated through deterministic calculation, because it happens at a scale where there is not enough computing power in the world to run the simulation: there are too many interdependent variables.

Meanwhile, connectionism involves running simulations of neural networks that can be deterministically calculated. There are no emergent phenomena (so far as I know) in standard connectionist networks. This means that, even in principle, they cannot manifest the most important thing happening in the brain. So there is no question that such networks fail to model the brain in this most important respect.

Meanwhile in linguistics, a 'hot' idea is that classical linguistic categories like phonemes and parts of speech are "emergent" in a similar sense to an emergent phenomenon. The "emergentist" view of language holds that a phoneme emerges as an element of knowledge only after broad experience with "exemplars" in real speech. I am not exactly clear on the sense in which emergentist linguists think that such categories are emergent; do they mean statistically somehow, or do they mean "emergent" in the nonlinear chaos theory sense?

Conventional mathematical linguistics lags quite far behind these newer developments and directions, but there is no question that better mathematical analysis would really help everyone to understand new ideas like emergent linguistics.

## Monday, March 21, 2011

### Formal cognitive reasoning

As a book selector for my university library, I find out about numerous books that would remain unknown to me otherwise. One interesting book which we recently bought is Cognitive Reasoning: A Formal Approach by Anshakov and Gergely. This book presents a massive effort to systematize a formal logical framework which is seriously intended to model cognitive reasoning. It is, in one sense, an amplification of the typical efforts to devise a "logic of knowledge and belief" within AI, but this description does not really do any justice to the efforts documented.

The book relies on "a long prehistory" of work carried out in Russia and Hungary over many years, founded on the plausible reasoning methods of so-called JSM systems (named after John Stuart Mill because of the close affinity with his logical proposals). The formal logics underlying the technique are versions of "pure J-logics." These are multi-valued logics with two sorts of truth values called "internal" and "external." The external truth values are just the usual two, but the internal truth values can be a very rich (countably infinite) set. The logics contain J-operators, which are characteristic functions of subsets of the internal truth values. All this machinery is used to devise a sophisticated logic of knowledge updating which "permits one to describe the history of cognitive processes." The objective is to model the reasoning process which moves a cognitive agent "from ignorance to knowledge," and so the system is a dynamic logic as well. The formalism also involves a new kind of syntactic entity in the logic, a "modification inference" which can add new formulae by nondeductive rules and thereby modify constituents which are already established in the inference.
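The description of J-operators as characteristic functions can be sketched in a few lines of Python. This is only an illustration of the definition given above, not the book's notation: the four internal truth-value labels and the names `j_operator`, `J_true`, and `J_decided` are assumptions chosen for the example.

```python
# Hypothetical internal truth values (labels chosen only for illustration).
INTERNAL_VALUES = {"true", "false", "contradictory", "uncertain"}


def j_operator(subset):
    """Return the characteristic function of a subset of internal truth values.

    The returned function maps an internal truth value to an external one,
    i.e. an ordinary Python bool.
    """
    members = frozenset(subset)
    if not members <= INTERNAL_VALUES:
        raise ValueError("subset must contain only internal truth values")

    def J(value):
        return value in members

    return J


# J_true holds exactly when the internal value is "true".
J_true = j_operator({"true"})
# J_decided holds when the internal value is either "true" or "false".
J_decided = j_operator({"true", "false"})
```

For example, `J_true("uncertain")` is `False`, while `J_decided("false")` is `True`: each operator turns a many-valued internal judgment into a two-valued external one.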

I have not finished reading this book, but I believe that it is very important for anyone interested in logic-based AI or cognitive science. It provides a rich and technically interesting set of formal methods for seriously trying to model cognitive reasoning.

## Tuesday, March 1, 2011

### Robot language acquisition

I discovered an interesting research program going on currently in the Adaptive Systems Group at the University of Hertfordshire, UK. A representative paper is "An integrated three-stage model towards grammar acquisition" by Yo Sato and colleagues, which appeared in the 2010 IEEE International Conference on Development and Learning. The paper documents an experiment in "cognitive robotics" where a robot is situated in a realistic language-learning environment.

According to the abstract, "the first, phonological stage consists in learning sound patterns that are likely to correspond to words. The second stage concerns word-denotation association. . . The data thus gathered allows us to invoke semantic bootstrapping in the third, grammar induction stage, where sets of words are mapped with simple logical types." This work is especially interesting to me because the grammar induction uses a semantic bootstrapping algorithm related to one which I developed, and published in 2005 (Journal of Logic, Language and Information).

In a discussion following my previous post, I offered the opinion that as computing power increases, we will (I hope) see more efforts to implement theoretically inspired learning algorithms that are quite intractable. This robotics paper represents one such effort, which I am very pleased to see. Yo Sato tells me that they are now looking at incorporating the improvements I have recently made to the original semantic bootstrapping algorithms. It's always gratifying to see an application inspired by my theoretical developments, since this is really why I pursue the work, but I am not sufficiently capable or interested to carry out the applied work that is then called for.

## Monday, February 14, 2011

### Niyogi's analysis of Principles and Parameters learning

In this post I will summarize Chapter 4 of Niyogi (1998) The Informational Complexity of Learning. In it, Niyogi emphasizes the importance of going beyond theoretical learnability when analyzing a grammatical paradigm. "One also needs to quantify the sample complexity of the learning problem, i.e., how many examples does the learning algorithm need to see in order to be able to identify the target grammar with high confidence."

He sets his sights upon the Triggering Learning Algorithm, put forth by Gibson and Wexler as a learning scheme for the grammatical "parameters" within the Chomskyan Principles and Parameters framework. For those unfamiliar with the background, this is a theory of language that posits a Universal Grammar underlying all natural languages (the principles), and then a finite set of variable parameters which account for the differences among languages. The "parameter setting" is really the sole learning task for the developing child on this account.

I think that in the beginning, this theory was put forth in an effort to address the supposed "poverty of the stimulus," with the hope that the resulting learning problem would be tractable, even easy. Niyogi, however, manages to demonstrate that Gibson and Wexler's assumption of the existence of "local triggers," i.e. a path through the parameter-setting space from the initial hypothesis to the target, is not even sufficient to guarantee learnability at all (though it was believed sufficient by Gibson and Wexler), much less tractability. He further demonstrated the surprising theorem that, for all its carefully thought out design, the Triggering Learning Algorithm performs worse than a random walk on the parameter space!
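For readers who have not seen it, a single update of the Triggering Learning Algorithm can be sketched as follows. This is my reading of the usual description (change at most one parameter, and keep the change only if the new grammar accepts the current sentence), not code from Gibson and Wexler or Niyogi. The two-parameter space and the tiny "languages" in `TOY_LANGUAGES` are invented purely so that the function can run.

```python
import random

# Hypothetical toy setup: each grammar is a tuple of binary parameters,
# and each grammar's language is a made-up finite set of sentence labels.
TOY_LANGUAGES = {
    (0, 0): {"a"},
    (0, 1): {"a", "b"},
    (1, 0): {"a", "c"},
    (1, 1): {"b", "c", "d"},
}


def tla_step(grammar, sentence, languages, rng):
    """One update of a TLA-style learner on a single input sentence."""
    # Error-driven: if the current grammar accepts the sentence, do nothing.
    if sentence in languages[grammar]:
        return grammar
    # Single-value constraint: flip exactly one randomly chosen parameter.
    i = rng.randrange(len(grammar))
    candidate = grammar[:i] + (1 - grammar[i],) + grammar[i + 1:]
    # Greediness: adopt the candidate only if it accepts the sentence.
    if sentence in languages[candidate]:
        return candidate
    return grammar


rng = random.Random(0)
stays_on_success = tla_step((0, 0), "a", TOY_LANGUAGES, rng)
blocked_by_greediness = tla_step((0, 0), "d", TOY_LANGUAGES, rng)
```

In the second call the learner at `(0, 0)` cannot move on the sentence `"d"`, because neither neighbouring grammar accepts it. Situations of this kind, where no single-parameter change helps, are what make the existence of local triggers matter for the analysis.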

At the time of Niyogi's writing, he judged that the Triggering Learning Algorithm was a preferred explanation of language learning in psycholinguistics. His results should really have killed it, but as far as I can see they have had no such effect. In fact, Google Scholar finds only 12 literature citations of his entire book, most of which are due to the author himself. This is hardly a flurry of activity; only one other author writing on problems of natural language learning appears to be among the citations.

## Sunday, January 23, 2011

### Pregroup grammars' generative capacity

There has been a movement within mathematical linguistics toward Lambek's pregroup grammars, which were mentioned in one or two earlier posts. The book Computational Algebraic Approaches to Natural Language (Casadio & Lambek eds., 2008) collects a number of papers on this subject, and this is recommended for those who wish to catch up on the trend. The book is also available as a free download directly from the publisher Polimetrica. Myself, I am somewhat ambivalent about the framework, but there does seem to be some confusion in the literature about its generative capacity.

The first paper in the mentioned volume is "Pregroup grammars and context-free grammars" by Buszkowski and Moroz. This paper relies on a result by Buszkowski (published in the Logical Aspects of Computational Linguistics proceedings in 2001) showing the weak equivalence between pregroup grammars and context-free grammars. Yet Greg Kobele and Marcus Kracht did a paper (unpublished) showing that pregroup grammars generate the recursively enumerable languages. I asked Greg about this, and he explained (as does his paper with Kracht) that key elements of the Buszkowski result are excluding the empty string from the context-free languages, and using only free pregroups. Kobele and Kracht showed, on the other hand, that by allowing all pregroups and also allowing the empty string, one achieves Turing-equivalence, generating all r.e. languages.

This is all esoteric stuff which is nevertheless important to have nailed down when one is working with a grammar formalism. Another issue with pregroup grammars is that they deny the existence of syntactic constituents in the normal sense. But that discussion has to wait for another post.

## Thursday, January 13, 2011

### Russell's "No Class" theory

A fine paper by Kevin Klement on Bertrand Russell's "No Class" theory is published in the Dec. 2010 issue of the Review of Symbolic Logic. Klement outlines a sympathetic reading of Russell's attempts to eliminate actual "classes" as objects in his logical theory, showing along the way how the many high-profile criticisms of Russell (which were mostly swallowed whole by the community) fail to dislodge the No Class theory on any philosophical grounds.

Without going into technical details, Russell tried to define "propositional functions" as being open sentences of logic, such as a predicate applied to a variable as in Mortal(x). While a complete logical sentence such as Mortal(Socrates) makes the statement that Socrates is mortal, the open sentence Mortal(x) has usually been construed as equivalent to a function which maps things (substituted for x) to the statement that they are mortal. Because such a function essentially classifies things according as the resulting statement is true or false, it ends up with classes. Russell wanted to see such an expression as not involving a function or class literally, but rather as just an open sentence at face value.
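The contrast between the two readings of Mortal(x) can be sketched in Python. This is only an informal illustration, not Russell's or Klement's formal apparatus: the domain, the set of "facts", and every function name below are assumptions made up for the example.

```python
# Hypothetical domain and facts, invented for illustration.
DOMAIN = {"Socrates", "Plato", "Zeus"}
MORTALS = {"Socrates", "Plato"}

# Face-value reading: the open sentence is just an expression with a free variable.
OPEN_SENTENCE = "Mortal(x)"


def instantiate(open_sentence, name):
    """Substitute a name for the variable x, giving a complete sentence."""
    return open_sentence.replace("(x)", f"({name})")


def is_true(sentence):
    """Evaluate a complete sentence of the form Mortal(name) against the facts."""
    predicate, argument = sentence[:-1].split("(")
    return predicate == "Mortal" and argument in MORTALS


def as_function(open_sentence):
    """Function reading: a map from things to the truth of the resulting statement."""
    return lambda name: is_true(instantiate(open_sentence, name))


def induced_class(open_sentence, domain):
    """The class of things the function classifies as satisfying the open sentence."""
    classify = as_function(open_sentence)
    return {thing for thing in domain if classify(thing)}


SOCRATES_SENTENCE = instantiate(OPEN_SENTENCE, "Socrates")
MORTAL_CLASS = induced_class(OPEN_SENTENCE, DOMAIN)
```

`OPEN_SENTENCE` and `instantiate` involve nothing but expressions and substitution, whereas `as_function` and `induced_class` show how a class appears as soon as the open sentence is treated as a map.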

I wanted to point this out here because the dichotomy between realism and nominalism is central to Klement's discussion of the No Class theory. Klement argues that while a realist philosophy compels one to accept that an open sentence is something, a nominalist position permits the consistent disavowal of that assertion. The realist, being thus compelled, is then forced to identify an open sentence with the function classifying things that could instantiate the variable, since the two are extensionally equivalent. For the realist, then, the No Class theory of open sentences is a chimera because it collapses into the same thing as having classes in the first place. The nominalist, however, permits himself to have an expression like an open sentence that doesn't correspond to something which "exists", and then is not led to an identity between open sentences and functions as maps.

The moral, for linguistic theory, is that it can be very important foundationally whether one construes a theoretical construct as representing something which exists.
