Friday, September 28, 2012

Bloomfield vs. Harris on the road to learnability

For some years I've been aware of work by Alex Clark on learnability of substitutable languages.  It always sort of reminded me of my own work on learning what I've called "syntactically homogeneous" languages.  This week I sat down and analyzed our approaches to figure out the differences, and it turns out that there are some pretty big ones, even if the driving spirits behind the two approaches are very similar.

I'm directly comparing a paper by Clark and Eyraud [ALT 2005 proceedings edited by Jain, Simon and Tomita] to my own paper published in JoLLI in 2010.  Both papers strive to find out what is learnable, and how, by restricting attention to languages that count as "reasonable" from some rather older perspectives.  Both papers, in fact, resort to criteria put forth by giants of the American Structuralist school.

Clark and Eyraud invoke Zellig Harris's notions of substitutability, and define a strongly substitutable context-free language as one in which any two words that are substitutable in some string context are substitutable in every string context where they occur.  They go on to prove that such context-free languages can be efficiently learned (identified in the limit from positive data, in polynomial time) by a specific algorithm.  This condition puts the focus, if you will, on the words themselves in order to define how the language is laid out.
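
To make the condition concrete, here is a minimal Python sketch of a word-level version of it.  Clark and Eyraud actually state the condition for arbitrary substrings, so this is only an illustrative simplification of mine, not their formulation or their algorithm.

    def contexts(word, lang):
        """All (left, right) contexts in which `word` occurs in `lang`."""
        return {(s[:i], s[i + 1:]) for s in lang for i, w in enumerate(s) if w == word}

    def is_substitutable(lang):
        """True if any two words sharing one context share all their contexts."""
        words = {w for s in lang for w in s}
        return all(
            contexts(u, lang) == contexts(v, lang)
            for u in words for v in words
            if u != v and contexts(u, lang) & contexts(v, lang)
        )

    # {"a b", "c b"} passes; adding "a d" fails, because "a" and "c" then share
    # the context (_, b) without sharing the context (_, d).
    print(is_substitutable({("a", "b"), ("c", "b")}))              # True
    print(is_substitutable({("a", "b"), ("c", "b"), ("a", "d")}))  # False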

I, on the other hand, go all the way back to the founder of American Structuralism, Leonard Bloomfield, for my inspiration in trying to formulate a reasonably restricted type of language to learn.  I'm also dealing with term-labeled tree languages consisting of tree-structured sentences annotated by lambda terms, but that's not critical to the points here.  I try to channel Bloomfield's ideas about what "good parts of speech" ought to be, and put the focus on contexts for words, rather than words themselves.  A term-labeled tree context is basically a term-labeled tree with a hole in it where a word might go.  The condition I formulated to encapsulate Bloomfield's ideas was that, roughly, two shape-equivalent contexts should have the same expressions (and words) able to instantiate them in the language.  Such a language has "well-defined parts of speech" and is thus syntactically homogeneous in my terms.
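
For concreteness, here is a toy rendering of a tree context with a hole, and of plugging a word into it.  The real objects in the paper are term-labeled trees carrying lambda terms; this little sketch simply ignores the labels.

    HOLE = "_"   # marks the gap where a word might go; a tree is a word (str) or a tuple of subtrees

    def plug(context, word):
        """Fill the hole in `context` with `word`."""
        if context == HOLE:
            return word
        if isinstance(context, str):
            return context
        return tuple(plug(child, word) for child in context)

    # The context [_ [hit John]] instantiated by "Tom" and by "He":
    ctx = (HOLE, ("hit", "John"))
    print(plug(ctx, "Tom"))   # ('Tom', ('hit', 'John'))
    print(plug(ctx, "He"))    # ('He', ('hit', 'John'))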

My learnability analysis is quite different from Clark's approach, since I did not study any properties of probabilistic learners.  Instead I showed how a certain class of such term-labeled tree languages could be exactly learned from a finite set of "good examples" using a specific type unification procedure.
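
For readers who have never seen unification, the basic operation looks like the following.  To be clear, this is a generic first-order unification routine over categorial types with variables, offered only as a sketch of the kind of step such a procedure is built from; it is not the specific algorithm of the JoLLI paper.

    def is_var(t):
        return isinstance(t, str) and t.startswith("?")

    def unify(a, b, subst=None):
        """A substitution making a and b equal, or None (no occurs check)."""
        subst = dict(subst or {})
        while is_var(a) and a in subst:
            a = subst[a]
        while is_var(b) and b in subst:
            b = subst[b]
        if a == b:
            return subst
        if is_var(a) or is_var(b):
            var, other = (a, b) if is_var(a) else (b, a)
            subst[var] = other
            return subst
        if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
            for x, y in zip(a, b):
                subst = unify(x, y, subst)
                if subst is None:
                    return None
            return subst
        return None

    # Two hypothesized types for a transitive verb: (NP\S)/NP against (?x\S)/?y.
    print(unify(("/", ("\\", "NP", "S"), "NP"),
                ("/", ("\\", "?x", "S"), "?y")))   # {'?x': 'NP', '?y': 'NP'}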

So how do these differences play out in the learnable classes of languages for each approach?  Consider the following set of sentences:
i) Tom hit John.
ii) John hit Tom.
iii) He hit John.
iv) John hit him.
It seems to me that Clark and Eyraud's approach would require dubious members such as "Him hit John" and "John hit he" to be added to the language, or else it isn't strongly substitutable; the little computation below spells this out.  The above set of sentences is, however, already syntactically homogeneous on my account just as it stands.  My feeling is that my notion of syntactic homogeneity is more likely to be a property of natural languages.  But Clark and Eyraud do have their polynomial-time learnability result to beat me up with.
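
To see what the substitutability condition forces here, the following sketch (again a word-level simplification of mine, not Clark and Eyraud's algorithm) closes sentences i)-iv) under the requirement that weakly substitutable words be fully interchangeable:

    corpus = {
        ("Tom", "hit", "John"),
        ("John", "hit", "Tom"),
        ("He", "hit", "John"),
        ("John", "hit", "him"),
    }

    def contexts(word, lang):
        """All (left, right) contexts in which `word` occurs in `lang`."""
        return {(s[:i], s[i + 1:]) for s in lang for i, w in enumerate(s) if w == word}

    def close(lang):
        """Add sentences until weakly substitutable words share all contexts."""
        lang, changed = set(lang), True
        while changed:
            changed = False
            words = {w for s in lang for w in s}
            for u in words:
                for v in words:
                    cu, cv = contexts(u, lang), contexts(v, lang)
                    if u != v and cu & cv:            # weakly substitutable...
                        for left, right in cu | cv:   # ...so force full interchangeability
                            for w in (u, v):
                                s = left + (w,) + right
                                if s not in lang:
                                    lang.add(s)
                                    changed = True
        return lang

    print(close(corpus) - corpus)
    # forces {('him', 'hit', 'John'), ('John', 'hit', 'He')} into the language

The condition collapses "Tom," "He," and "him" into a single distributional class, which is exactly where the dubious sentences come from.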



Friday, September 7, 2012

Bayesian learning theory

Once upon an age, linguists interested in learning theory were obsessed with the "Gold theorem," and its implications were widely discussed and misunderstood.  But the Gold framework, with all its knotty "unlearnable" classes, is not naturally probabilistic.  Within cognitive science, much more recently, a huge faction interested in learning has become very attached to different styles of "Bayesian" learning.  No one discusses learnability issues related to Bayesian learning; it is just assumed to work to whatever extent is permitted by the type and amount of data fed to it.

It turns out that this smug attitude is quite unwarranted.  In fact, the only situation for which a Bayesian learner is guaranteed to converge toward the right result (which is called consistency in this setting) is the finite case, in which the learner is observing from among a finite number of possible events (e.g. the usual cases involving cards and dice).  In the more interesting infinite setting, a nasty set of results due to Freedman (published in the Annals of Mathematical Statistics 1963 & 1965) shows that Bayesian inference is no longer guaranteed to be consistent.  It might be consistent, depending upon the prior used, but now there arises the problem of bad priors which yield inconsistent Bayes estimates.  It seems that cognitive scientists would do well to bone up on some theory, if they are so fixated on Bayesian learning.  Watch your priors carefully!
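
As a small illustration of the well-behaved finite case, here is a sketch (mine, with an arbitrarily chosen true bias of 0.7) of a conjugate Beta-Bernoulli learner whose posterior piles up on the truth as data accumulate.  Freedman's counterexamples live in infinite, nonparametric settings that a toy simulation like this cannot reach.

    import random

    random.seed(0)
    true_p = 0.7               # assumed "true" coin bias for the simulation
    a, b = 1.0, 1.0            # Beta(1, 1) prior, i.e. uniform on [0, 1]

    for n in (10, 100, 1000, 10000):
        heads = sum(random.random() < true_p for _ in range(n))
        post_a, post_b = a + heads, b + (n - heads)          # conjugate update
        mean = post_a / (post_a + post_b)
        var = post_a * post_b / ((post_a + post_b) ** 2 * (post_a + post_b + 1))
        print(f"n={n:6d}  posterior mean={mean:.3f}  posterior sd={var ** 0.5:.4f}")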