Mathematical Linguistics etc.: 2012

Sunday, December 2, 2012

Linguistic junk drawer

I've been too busy to post for a while, which is upsetting, so I decided to come up with something quick.

I'm compiling a laundry list of commonly taught notions in linguistics which are probably best left in the past. Most of these are found in almost any linguistics textbook. I'm asking for help brainstorming things that should be purged from the collective consciousness of modern linguistics.

1. Phonemes and allophones. My phonologist colleague Chris Golston pointed out that this is a distinctly Structuralist construct that is in all the textbooks but is, by most accounts, not a part of current phonological theory. The old notion of phonemes, separate from morphological alternations, is tied to the "systematic phonemic level" of representation, which hasn't been used for decades. Instead of explicitly talking about allophones as if they were discrete items, we could just talk about nondistinctive variations of sound.

2. Morphological typology. Can we please stop talking about Humboldt's scheme of "agglutinating", "isolating", "synthetic" and "polysynthetic"? This is a pointless classification.

3. Function vs. content words. My colleagues still believe in this one, but I think it is pretty hopeless. Somebody told me that a content word is one whose meaning is a "cognitive concept," while a function word is one whose meaning is definable as a usage. Is there some accepted definition of a cognitive concept apart from a linguistic usage? I don't think so. As far as I can tell, there is just a lot of debate and competing claims getting nowhere.

4. Inflectional vs. derivational morphology. I think this is likely a false dichotomy.

That's what I could come up with today. Let's add some more later.

Friday, September 28, 2012

Bloomfield vs. Harris on the road to learnability

For some years I've been aware of work by Alex Clark on learnability of substitutable languages. It always sort of reminded me of my own work on learning what I've called "syntactically homogeneous" languages. This week I sat down and analyzed our approaches to figure out the differences, and it turns out that there are some pretty big ones, even if the driving spirits behind the two approaches are very similar.

I'm directly comparing a paper by Clark and Eyraud [ALT 2005 proceedings edited by Jain, Simon and Tomita] to my own paper published in JoLLI in 2010. Both papers strive to find out what is learnable, and how, by restricting to languages that are somehow "reasonable" from rather older perspectives. Both papers, in fact, resort to criteria put forth by giants of the American Structuralist school.

Clark and Eyraud invoke Zellig Harris's notions of substitutability, and define a strongly substitutable context-free language as one in which two words substitutable in some string context are substitutable in all string contexts where they occur. They go on to prove that such context-free languages are PAC-learnable by a specific algorithm. This condition puts the focus, if you will, on the words themselves in order to define how the language is laid out.

I, on the other hand, go all the way back to the founder of American Structuralism, Leonard Bloomfield, for my inspiration in trying to formulate a reasonably restricted type of language to learn. I'm also dealing with term-labeled tree languages consisting of tree-structured sentences annotated by lambda terms, but that's not critical to the points here. I try to channel Bloomfield's ideas about what "good parts of speech" ought to be, and put the focus on contexts for words, rather than words themselves. A term-labeled tree context is basically a term-labeled tree with a hole in it where a word might go. The condition I formulated to encapsulate Bloomfield's ideas was that, roughly, two shape-equivalent contexts should have the same expressions (and words) able to instantiate them in the language. Such a language has "well-defined parts of speech" and is thus syntactically homogeneous in my terms.

My learnability analysis is quite different from Clark's approach, since I did not study any properties of probabilistic learners. Instead I showed how a certain class of such term-labeled tree languages could be exactly learned from a finite set of "good examples" using a specific type unification procedure.

So how do these differences play out in the learnable classes of languages for each approach? Consider the following set of sentences:
i) Tom hit John.
ii) John hit Tom.
iii) He hit John.
iv) John hit him.
It seems to me that Clark and Eyraud's approach would require "Him hit Tom" and other dubious members to be added to the language; otherwise it isn't strongly substitutable. On my account, however, the above set of sentences is syntactically homogeneous as it stands. My feeling is that my notion of syntactic homogeneity is more likely to be a property of natural languages. But Clark and Eyraud do have their PAC-derived tractability to beat me up with.
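To make the substitutability condition concrete, here is a minimal Python sketch that checks it on the four sentences above. It is not Clark and Eyraud's algorithm, and the function names are hypothetical. It is also a simplification: the definition in the paper ranges over arbitrary substrings, whereas this check looks only at single words and performs one round of closure.

```python
from collections import defaultdict
from itertools import combinations


def word_contexts(sentences):
    """Map each word to the set of (left, right) string contexts it occurs in."""
    contexts = defaultdict(set)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            left = " ".join(words[:i])
            right = " ".join(words[i + 1:])
            contexts[word].add((left, right))
    return contexts


def substitutability_violations(sentences):
    """Pairs of words that share at least one context but not all of them."""
    contexts = word_contexts(sentences)
    return sorted(
        (u, v)
        for u, v in combinations(sorted(contexts), 2)
        if contexts[u] & contexts[v] and contexts[u] != contexts[v]
    )


def missing_sentences(sentences):
    """Strings needed (after one round) to repair the word-level violations."""
    contexts = word_contexts(sentences)
    present = {" ".join(s.split()) for s in sentences}
    missing = set()
    for u, v in substitutability_violations(sentences):
        for left, right in contexts[u] | contexts[v]:
            for word in (u, v):
                candidate = " ".join(p for p in (left, word, right) if p)
                if candidate not in present:
                    missing.add(candidate)
    return sorted(missing)


sample = ["Tom hit John", "John hit Tom", "He hit John", "John hit him"]
violations = substitutability_violations(sample)
needed = missing_sentences(sample)
print(violations)
print(needed)
```

On this sample the check flags the pairs He/Tom and Tom/him, and the strings it asks for are "John hit He" and "him hit John", which illustrates the kind of dubious members that strong substitutability forces into the language.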

Friday, September 7, 2012

Bayesian learning theory

Once upon an age, linguists interested in learning theory were obsessed with the "Gold theorem," and its implications were widely discussed and misunderstood. But the Gold framework, with all its knotty "unlearnable" classes, is not naturally probabilistic. Within cognitive science, much more recently, a huge faction interested in learning has become very attached to different styles of "Bayesian" learning. No one discusses learnability issues related to Bayesian learning; it is just assumed to work to whatever extent is permitted by the type and amount of data fed to it.

It turns out that this smug attitude is quite unwarranted. In fact, the only situation for which a Bayesian learner is guaranteed to converge toward the right result (which is called consistency in this setting) is the finite case, in which the learner is observing from among a finite number of possible events (e.g. the usual cases involving cards and dice). In the more interesting infinite setting, a nasty set of results due to Freedman (published in the Annals of Mathematical Statistics 1963 & 1965) shows that Bayesian inference is no longer guaranteed to be consistent. It might be consistent, depending upon the prior used, but now there arises the problem of bad priors which yield inconsistent Bayes estimates. It seems that cognitive scientists would do well to bone up on some theory, if they are so fixated on Bayesian learning. Watch your priors carefully!
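The well-behaved finite case can be illustrated with a small simulation. The sketch below rests entirely on assumptions chosen for illustration (three candidate coin biases, a uniform prior, a seeded random number generator, hypothetical names). It only shows the posterior piling up on the true hypothesis when the hypothesis set is finite, and it says nothing about the infinite setting where Freedman's results apply.

```python
import math
import random


def posterior(prior, heads_prob, flips):
    """Posterior over a finite set of coin-bias hypotheses.

    prior:      dict mapping hypothesis name -> prior probability
    heads_prob: dict mapping hypothesis name -> probability of heads
    flips:      iterable of booleans (True = heads)
    """
    # Work in log space so long data sequences do not underflow.
    log_post = {h: math.log(p) for h, p in prior.items()}
    for is_heads in flips:
        for h in log_post:
            p = heads_prob[h]
            log_post[h] += math.log(p if is_heads else 1.0 - p)
    top = max(log_post.values())
    weights = {h: math.exp(lp - top) for h, lp in log_post.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


rng = random.Random(0)  # seeded so the run is repeatable
true_bias = 0.7         # assumed "right result" for the illustration
flips = [rng.random() < true_bias for _ in range(2000)]

heads_prob = {"h_0.3": 0.3, "h_0.5": 0.5, "h_0.7": 0.7}
prior = {h: 1.0 / len(heads_prob) for h in heads_prob}

post = posterior(prior, heads_prob, flips)
print(post)
```

With a finite hypothesis set that contains the truth and a prior that gives it nonzero weight, the posterior mass on the true bias approaches 1 as the data accumulate, which is the consistency property described above.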

Monday, July 2, 2012

Special issue of JoLLI

Well, this post is about one year late, but perhaps it is better than never. In July 2011, the Journal of Logic, Language and Information published a special issue devoted to papers from the Mathematics of Language conference that was held at UCLA in 2007. The meeting commemorated the 50th anniversary of Chomsky's Syntactic Structures, and so some of the papers use that as a point of departure. This bunch of papers is really worth looking into. Highlights include Geoff Pullum's paper about Chomsky's famous little book; this paper clarifies what Chomsky did, and (mostly) did not, accomplish in his book. It is pretty amazing to see how so much that has been attributed to SS over the years actually was not shown in it at all. Another excellent paper is by Andras Kornai on probabilistic grammars. This paper clears up a number of poorly understood issues surrounding the history and proper definitions of probabilistic grammars, and is required reading for any further research on the subject.

One thing to complain about is the four-year time span between the conference and the special issue. This works against the timeliness of the papers to some extent, although progress in mathematical linguistics is generally so slow-paced that it almost doesn't matter in the end. I think, pace Gerald Penn's introductory remarks, that it would serve the MOL interest group to just publish their proceedings in a simple way online (as was done for the 2003 meeting), instead of putting so much emphasis on selecting papers for later publication in a journal. Seeing the issue so long after the meeting reminded me of the debacle that swallowed the papers from the 2001 meeting; the official proceedings from that meeting were not released until 2004, and a planned special issue of Research on Language and Computation never materialized, burying some authors' submissions in the collapse. But that's all old news now. I wish the MOL group all the best in their continuing efforts to convene the small community of mathematical linguists, and I wish I could find the time to participate more. On the other hand, the new focus on "green" issues militates against academic conferences in general; all that flying around the world is giving us an outsized carbon footprint that may not be necessary in the current age.

Saturday, May 19, 2012

Production of referring expressions

The April issue of Topics in Cognitive Science is devoted to "Production of Referring Expressions: Bridging the Gap Between Computational and Empirical Approaches to Reference." Referring expressions have a long history in computational linguistics, dating back to the earliest natural language understanding systems like SHRDLU. It is only natural that, if a computer and a human are to have discourse about anything, they will each have to refer to things in the domain of discourse, in such a way that each understands the other.

The guest editors' introductory paper to the topic clarifies the issues dealt with here very well. I'll try to briefly summarize: on the one hand, we have computational linguistics approaches to the generation of referring expressions. A standard methodology for this has been basically a type of referential optimization. The domain of discourse is surveyed and used to figure out the least wordy way of referring to something unambiguously. This turns out to be not all that difficult in many situations, but it is also not very natural. It turns out that humans are frequently neither minimally wordy nor unambiguous with their own referring expressions.
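The "least wordy unambiguous description" idea can be sketched as a brute-force search. This is only an illustration of the optimization described above, not the method of any particular paper in the issue. The toy domain, the property names, and the function name are all invented for the example.

```python
from itertools import combinations


def shortest_description(target, domain):
    """Smallest set of the target's properties that picks out the target alone.

    domain maps each object name to its set of properties.
    Returns a tuple of properties, or None if no description is unambiguous.
    """
    properties = sorted(domain[target])
    for size in range(1, len(properties) + 1):
        for description in combinations(properties, size):
            wanted = set(description)
            matches = [name for name, props in domain.items() if wanted <= props]
            if matches == [target]:
                return description
    return None


# Hypothetical domain of discourse.
domain = {
    "d1": {"dog", "small", "black"},
    "d2": {"dog", "large", "black"},
    "c1": {"cat", "small", "black"},
}

print(shortest_description("d1", domain))  # needs two properties
print(shortest_description("d2", domain))  # one property is enough
```

Here "d1" needs two properties ("dog", "small"), while "d2" can be identified by "large" alone. A human speaker might well say "the large black dog" anyway, which is exactly the mismatch with natural behavior noted above.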

This brings us to the other hand. There is quite a trove of literature in psycholinguistics which investigates human strategies for producing referring expressions. The point of the topic is to bring these two largely separate endeavors together, in two ways. Firstly, the adoption of more realistic constraints and features into computational referring systems is advocated. Secondly, the use of more algorithmically specified methods in explaining human performance in this area is also advocated. It is, on the whole, a very worthwhile topic promoting a worthy goal. In general I concur that computational models of human performance should strive to be more realistic, while cognitive models should be more specific or mathematical.

Saturday, March 31, 2012

Erdös number update

My previous post requires an update, since I received a note reminding me that Ed Keenan also published with Jonathan Stavi, ["A semantic characterization of natural language determiners," Linguistics and Philosophy (1986) 9:253-326]. Stavi has an Erdös number of 2, in virtue of two different collaborations with Erdös coauthors Menachem Magidor and Marcel Herzog. This makes Keenan's number a rare 3 (very low for anyone primarily in linguistics), and my own a more interesting 4. This type of game is what passes for a hobby among mathematicians.

Thursday, March 29, 2012

Erdös number

Scholars with even a passing interest in mathematics usually know what an Erdös number is; it is the number of degrees of separation between a scholar and Paul Erdös, calculated by stepping through collaborative publications. Paul Erdös was a sort of enigmatic freelance mathematician who was very good at proving things, and very prolific with the aid of something like 500 different collaborators. It has become something of a sport in modern times for mathematical scholars to compute their Erdös number, since basically everyone who has published a mathematics article with a collaborator ends up having an Erdös number. I've read that most real mathematicians have an Erdös number of 8 or less.

A more interesting thing is when scholars in neighboring fields, such as mathematical linguistics, end up with Erdös numbers due to cross-fertilization. It turns out that one of my professors, Ed Keenan, appears to have an Erdös number of 4, which is really very low for a scholar outside mathematics. It's so unusual, it bears a stated proof. One chain of collaborative work connecting Keenan with Erdös is the following:

Keenan, E. and Westerståhl, Dag (2011) “Generalized quantifiers in linguistics and logic,” in J. van Benthem and A. ter Meulen (eds.), Handbook of Logic and Language. Second Edition, Amsterdam: Elsevier.

Hella, Lauri; Väänänen, Jouko; Westerståhl, Dag (1997) "Definability of polyadic lifts of generalized quantifiers." J. Logic Lang. Inform. 6:305–335.

Magidor, Menachem; Väänänen, Jouko (2011) "On Löwenheim-Skolem-Tarski numbers for extensions of first order logic." J. Math. Log. 11:87–113.

Erdős, P.; Magidor, M. (1976) "A note on regular methods of summability and the Banach-Saks property." Proc. Amer. Math. Soc. 59:232–234.

And the wonderful conclusion to all this is the theorem that my own Erdös number is 5, thanks to a paper I published with Ed Keenan in a rather obscure special volume of Linguistische Berichte in 2002. Does this increase my mathematical credibility? Not really. But it's nice to know I am somehow closer to the inner circle than I thought. A special bonus will soon come from my upcoming collaboration with Nick Chater, who also has an Erdös number of 4. So then I'll have earned my number 5 in two different ways.
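For the computationally inclined, an Erdős number is just a shortest-path length in the collaboration graph, so it can be found by breadth-first search. The sketch below uses only the coauthorships listed in this post, plus the Stavi link from the update post. The function name and the label "blog author" are placeholders invented for the example.

```python
from collections import deque


def erdos_numbers(coauthor_pairs, source="Erdős"):
    """Breadth-first search over an undirected collaboration graph."""
    graph = {}
    for a, b in coauthor_pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    distance = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for coauthor in graph.get(person, ()):
            if coauthor not in distance:
                distance[coauthor] = distance[person] + 1
                queue.append(coauthor)
    return distance


# The chain of publications cited in this post.
chain = [
    ("Erdős", "Magidor"),
    ("Magidor", "Väänänen"),
    ("Väänänen", "Westerståhl"),
    ("Westerståhl", "Keenan"),
    ("Keenan", "blog author"),
]
numbers = erdos_numbers(chain)
print(numbers["Keenan"], numbers["blog author"])

# Adding the Keenan and Stavi paper from the update post shortens the path.
updated = erdos_numbers(chain + [("Magidor", "Stavi"), ("Stavi", "Keenan")])
print(updated["Keenan"], updated["blog author"])
```

Because breadth-first search visits people in order of distance from the source, the first time a scholar is reached gives the smallest number, so a newly discovered coauthorship can only lower a number, never raise it.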

Thursday, March 22, 2012

Survey of proof nets

This isn't much to be proud of, but I seem to have written an unpublishable paper. It is "A survey of proof nets and matrices for substructural logics," which I intended for the Bulletin of Symbolic Logic. However, it seems that they are not much interested in surveys, or if they are, they require them to be pitched to a highly technical audience already versed in the subject. At least that's what I could glean from the short note I got which rejected the paper at a preliminary stage. The "outside expert" whom the editor found to make a brief assessment said that the paper was nearly devoid of all content. I don't believe I managed to fill 25 pages with precisely nothing (quite an achievement that would be), so I posted it on arXiv anyway. I believe it will benefit anyone wanting to find out about proof nets and how they relate to the logics which underpin type-logical grammar. Fortunately I already have tenure and am not too proud, so I posted the link to my unpublished masterpiece. All comments here will be appreciated---unless you feel it is devoid of all content. That particular comment has been used up.

Wednesday, February 29, 2012

Poverty of the stimulus - ad nauseam

Linguists are still debating the force of "stimulus poverty arguments." This may never end, but a new rejoinder from Robert Berwick et al. (Chomsky is 4th author) appeared in Cognitive Science (2011 issue 7). In this article, the authors argue informally, but relatively correctly it seems to me, that a host of recent efforts to model the learning of language with virtually no prior assumptions about its structure in fact fail to account for the interesting structural constraints found in real languages.

They point out that recent proposals in Bayesian learning (Perfors et al. 2011) and learning of substitutable languages (recent papers by Clark and Eyraud) remain tied to string-based models of sentences, while the interesting syntactic constraints that are observed have to be represented using syntactic structure of some kind. This leaves unanswered the question of where the structural constraints come from, and whether they could be learned. I think that some of my own recent work has addressed the issue of learning structures rather than strings, but then we have to learn from (at least partially) structured data, which presents its own plausibility problems.

An interesting point was brought up by my colleague Chris Golston, who is a linguist very interested in biology and evolution. He reminded me that stimulus poverty and learnability arguments have limited force in reality, because simply proving that a thing does not have to be innate does not show that it is not innate. A simple example comes to us from birdsong; it seems clear to everyone involved that birdsong is not that complicated and could theoretically be entirely learned, yet it is also widely held that birds harbor a considerable amount of innate propensity to sing a certain way. Since I am not a birdsong expert at all, it is possible I have misread the consensus in that field, but that's how I would summarize it for now. It is, I believe, quite likely that human language has this same general property---perhaps more of it is innate than really "needs to be" from a theoretical perspective.

Tuesday, January 31, 2012

Proof-theoretic semantics

After my previous post about the "meaning is use" credo, I got an email pointing out papers by Nissim Francez and Roy Dyckhoff which develop a Proof-Theoretic Semantics for natural language as well as for logic. A good paper to start with is "Proof-theoretic semantics for a natural language fragment," Linguistics & Philosophy 33:447-477 (2010). Other papers appeared in Studia Logica (2010) and Review of Symbolic Logic (2011).

Proof-theoretic semantics is, as the name implies, offered as an alternative to the usual model-theoretic semantics. While I studied formal semantics (which was always model-theoretic) like every other linguist interested in formal approaches, I have to admit I never liked it very much. So I find these new developments extremely encouraging. Here is a quote from the paper in L&P:

For sentences, replace the received approach of taking their meanings as
truth conditions (in arbitrary models) by an approach taking meanings to
consist of canonical derivability conditions (from suitable assumptions).

Arguments against model-theoretic semantics for natural language are certainly out there (e.g. Michael Dummett), but until now no one has done much to develop an alternative approach. I am certainly in favor of this new set of ideas; these authors develop a direct proof system for natural language in which the "rules of use" for linguistic elements are used precisely as their definitions. And they also highlight some interesting arguments in favor of these sorts of meanings in a cognitive system.

Thursday, January 5, 2012

Meaning is use?

My heart sank a little when I learned of Michael Dummett's death this week. I also learned, however, that he was a follower of the latter-day Wittgenstein credo that "meaning is use." In other words, to understand a word's meaning is precisely to be able to use it correctly, to comprehend its uses. I, too, favor this account of things, and indeed it is pretty commonly assumed in many computational applications. Word senses are often identified with usage patterns in one way or another.

When I discussed the idea that "meaning is use" with a colleague a little while ago, however, he was taken aback. "How could any cognitive scientist," he asked, "seriously hold such a view?" His objection seems to be that, cognitively speaking, a word's meaning should be equated with some kind of conceptual space that can be cognized. The problem, I'm told, is that a "usage" is not a cognitively real object.

On the other hand, I think that a cognitive notion of "concept" is actually pretty weak. I'm not acquainted with much empirical literature on this, but it would be worth reading up on how "concepts" have been shown to be cognitively real. And what is the concept behind a word such as "and"? I'm told that cognitive scientists allow a distinction between function words and content words, wherein function words such as "and" are tacitly allowed to have "use-based" meanings but content words are supposed to have conceptual meanings. Hmm, so perhaps meaning is use sometimes, even in cognitive science?

Lastly, let's consider a more modern approach like embodied cognition. This, very briefly, is the stance that studying human cognition must be undertaken by recognizing the complete context of the human. Humans seeking to understand words actually learn them by doing, by using them. So even if there is a cognitive "concept" behind a word, this is attained mostly by learning the conventional usage of the word, which is not purely linguistic but interacts with the world as well. It seems that the usage of a word could actually be used to bootstrap the concept behind it.
