Mathematical Linguistics etc.: 2010

Sunday, December 19, 2010

This sentence is false.

A new publication by Philippe Schlenker ["Super Liars" in Review of Symbolic Logic 3(3)] presents an excellent new treatment of that old chestnut, the Liar paradox (exemplified in my post title). Schlenker's insight is to develop a technical logical semantics which, rather than trying to solve the paradox, tries to account for how human languages cope with the paradox while devising a sane system of truth values for human languages.

Schlenker begins with a now standard assumption that an account of Liars and other paradoxes in natural language requires a third truth value. He then shows in an easily readable fashion how this step leads directly to the need for allowing ordinal-many truth values, with a sort of infinite hierarchy of so-called "super liars" arising as well. The resulting treatment is as expressively complete as can be expected, providing the ability for a language to express that any given "super liar" sentence is something other than true.

The semantics of paradox must eventually be integrated into a functioning semantic theory for natural language; this paper makes a good start.

Tuesday, December 14, 2010

Proof nets for grammatical logic

A "proof net" is a kind of compressed proof that was developed for linear logic in the late 1980s. They have been studied a great deal ever since for a number of reasons, one of which is that linear logic is a close relative of the Lambek logics that are useful in linguistic syntax under the modern-day rubric of type-logical grammar. A good source to learn about type-logical grammar and linguistics is my now passé 2004 book (not in most libraries but available cheaply as an ebook, though try to ignore Ch. 6), but also Glyn Morrill's 1994 book (not available cheaply but found in many libraries). For learning about linear logic, type-logical grammar, and proof nets all under one roof, the one and only source is Richard Moot's PhD dissertation, a real tour-de-force. My own dissertation (which grew up into the book) is quite embarrassing compared to this masterpiece, which is still available here.

I'm currently working on a survey paper about proof nets and other proof "compressions." I've actually been working on this paper for about ten years, but I really feel I might finish it soon. The purpose is to highlight the subject for the broader logic community, and to focus primarily on how the proof compressions become more and more geometrical as the logic gets stricter control over the arrangement of formulae. For classical propositional logic, the most compressed proof object is a "matrix" of atoms derived from the formula tree. The condition ensuring provability of the formula is basically set-theoretical, there is not much geometry to it. For linear logic, one keeps track of multisets of formulae (counting occurrences), and so the proof net provability condition is more graph-theoretical. For the associative Lambek logic we no longer have commutativity of the formulae, and the corresponding proof net condition comes to involve "planarity" of the graph. This is fundamentally a geometric condition that can be expressed using topology. The most restricted logic of all is the nonassociative Lambek system, and extending the proof nets to this case has to involve more specific topological conditions which I am still working out for my paper.

All of this seems important to me because if linguistic grammar is to be modeled type-logically, the "proofs" of sentences should be able to be carried out in a highly compressed fashion if the system is to have cognitive relevance. Generally, the brain's innate computational abilities seem to invoke highly efficient methods that are implemented in the neural "analog computer." But that leads me to a discussion of analog computation in the brain, which should be left to another post.

Tuesday, November 30, 2010

Niyogi: Informational Complexity of Learning

This is the first in a series of posts highlighting the contents of Partha Niyogi's 1998 monograph The Informational Complexity of Learning, and using this as a jumping-off point for some discussion of my own.

I'll begin with some brief notions from Chapter 2. In this chapter, Niyogi emphasizes some very important points of learning theory, while developing a new analysis of learnability by neural nets. In a fairly standard mathematical model of learning championed by Vapnik and others, we view a learner as attempting to formulate the best hypothesis possible within its hypothesis class, which approximates the target concept within the concept class.

Let me just stop right there, in order to discuss the ramifications of this for language learning. Though this framework does not originate with Niyogi, in my view he is one of a few linguists who understood it. I believe that this setup alone may be used to all but prove the necessity of some kind of "Universal Grammar" which is commonly advocated. Universal Grammar should be seen as the limitations on "possible human languages" that in effect makes the hypothesis class of languages used by the human learner sufficiently small. Numerous negative results have shown time and again that overly large classes of languages are not strictly learnable, essentially because they are too big. To me, this speaks loudly against any language learning model which invokes a "general cognitive learning" idea, as if humans could leverage their general abilities to successfully learn whatever kind of language is thrown at them. We already know from experience that humans can only learn human languages. Creolization from simpler fabricated lingua francas and pidgins is easily understood in this way. In that scenario, the target concept is outside the hypothesis class, and the learners settle on the best hypothesis in the hypothesis class, which is in fact a human language.

I presume that Universal Grammar is an innate set of things delineating the required properties of a human language, and which thereby also delineates the hypothesis class which is used by human learners. Beyond that, I do not know what it is exactly. I will continue to use Partha's book to further this discussion in later posts.

Monday, November 8, 2010

Partha Niyogi: a reminiscence

Our small field was dealt a body blow by the death of Partha Niyogi last month. An official obituary can be found at the University of Chicago news site. In this post I will not repeat this information, but will just offer a personal perspective on Partha from my interactions with him while we were colleagues.

I worked as a Visiting Assistant Professor in the Linguistics and Computer Science departments at the University of Chicago for four academic years, spanning from 2000-2004. During the first part of this period, Partha was an Assistant Professor, but he was awarded tenure sometime before I departed. Partha's research breadth was something I always admired about him, especially since I aspired to contribute research in more or less similar areas encompassing mathematical linguistics, learning theory, and speech processing. We had a number of conversations over the years mostly about my research rather than his. He was always interested to talk with other people about their work. I also interacted with Partha frequently in the ways that colleagues do, and I got to learn how he dealt with research through his reactions to various talks given by the many visitors who passed through.

One thing that always struck me about Partha was his incredible mathematical knowledge. I was still really a beginner compared to him, and even now I can only say that I am a little more advanced at math than I was then. His skills and knowledge in math are still a model which I will always aspire to, and probably never achieve. In spite of the obvious difference between our skill levels, he was never condescending when discussing mathematical subjects with me. He just spoke at his level and kind of assumed that I would do my homework if I didn't understand right away.

The work of Partha's with which I am the most familiar is his first book The Informational Complexity of Learning, from which I learned and am still learning many things about Kolmogorov complexity, neural nets, and other topics. It is still a timely piece in my view, and in the future I hope to offer some posts highlighting many parts of this book. I remember when I told him I had bought his book, he was kind of like "what did you do that for? It's quite expensive." He was never proud or arrogant.

I've seen a number of posts and reminiscences around since his death which paint a picture of Partha as a kind of renegade, always willing to go for the brass ring in his research, to go beyond what most people thought was doable or even sane. I would venture to offer some counterweight to this view, however. In my experience, Partha was, while very skilled at making real progress beyond the incremental advance, nevertheless quite conservative scientifically. He regarded much of my best work in speech processing as kind of "crackpot," in fact. He openly despised my work on the reassigned spectrogram, which I was just starting to promote as the next great thing for acoustic phonetics and speech science. I once asked him if he would consider trying reassigned spectrograms for his speech recognition research, and he told me flat out that he didn't believe the representation was meaningful, and that he "wouldn't touch it with a ten-foot pole," an expression that kind of surprised me from a non-native English speaker. I guess every time I do some more research on reassigned spectrograms, I'm still trying to prove myself to Partha. It's sad now that he won't be around to "eat his words," because I'm convinced the project finally worked out to something that would have satisfied him. I know that, confronted with a better understanding and some solid conclusions, he would have happily dined on his former naysaying. I'm very sad to have lost an old adversary and sparring partner.

Wednesday, October 27, 2010

Nominalism reconsidered

A while back I declared myself a "nominalist" when it comes to linguistic theory, as opposed to a realist.

Well, hold on a minute. It's not entirely clear what I meant, because it's not entirely clear what the terms mean. In the philosophy of mathematics, "nominalism" generally means the stance that "there's no such thing as an abstract thing." [E.g. the Oxford Handbook of Philosophy of Mathematics and Logic] So on this view, not even a number really "exists," whatever that means. I think it is a little extreme to claim that there are no abstract things in this world. I'm not sure why I think that, but I do.

In the philosophy of linguistics (one of the most starved for literature), "nominalism" would have to mean something else. But it's complicated to figure out exactly what. I'll take a stab by saying that in linguistics, nominalism is a stance that says "elements of linguistic theory are not necessarily literally elements of the cognitive structure of language, although they might be."

This is quite a bit weaker than mathematical nominalism, whatever it all means.

Thursday, October 14, 2010

Topology of language spaces

Very little has been written, it seems, about the topology of language spaces. Fix a (finite) vocabulary V. Then the Kleene-star set V* represents the (countably infinite) set of all expressions over V. The language space P(V*) is the set of all sets of expressions, so it is the power set of V*, meaning it has the cardinality of the reals. It is well known that most of these languages are uncomputable (non recursively enumerable), but it would be useful to provide a good topology to "metrize" the language space somehow. For an application of this, it would be very interesting to provide a suitable metric of the "distance" between two languages in a language space.

Very little work has been published which attacks this problem. The definitive work appears to be the PhD dissertation by David Kephart (University of South Florida, 2005) "Topology, Morphisms, and Randomness in the Space of Formal Languages." This dissertation first studies the only prior work on the subject, which defined the Bodnarchuk metric as a distance between languages. This is, roughly, the shortest expression length at which differences between the two languages appear. The resulting metric space of languages is in essence the Cantor space. It is an unsatisfying metric for linguistic, or any other, purposes.

Let's think for a moment about the problem to be modeled topologically. Consider two dialects of English, call them AE and BE. Assuming they have the same vocabulary (we can agree, I hope, that vocabulary differences alone are mundane and easy to describe), any differences between them have to be "grammatical," meaning speakers of AE may assent to certain sentences which are rejected by BE speakers as ungrammatical. We might like to know, then, how "far apart" are these dialects grammatically? As soon as any major grammatical difference appears, given the recursive nature of natural languages, it becomes quite likely that any difference between potential construction patterns will give rise to an infinite number of sentences which distinguish AE from BE. But yet, if AE and BE still agree on the majority of construction patterns, they may appear at a random glance to still be very nearly the same. So then, there is no hope for a simplistic metric like the cardinality of the distinguishing set of sentences, because in any interesting case the two dialects will be distinguished by an infinite number of sentences.

With that, let me get back to Kephart's thesis. After analyzing what he calls the "Cantor distance" mentioned above as fairly worthless, he goes on to propose two new metrics, which are in fact pseudo-metrics, because they give the distance 0 between some pairs of distinct languages. The first of these is called the Besicovitch norm when applied to one language, or Besicovitch distance when applied to the symmetric difference between two languages. (The name stems from the originator of the metric, where it was first applied in a treatment of cellular automata). Its definition is complicated because it counts cardinality by sectioning. The Besicovitch norm (or size of a language, by this measure) is defined as the least upper bound on limits of ratios between the cardinality of sections (up to length k) of the language to the cardinality of up to k-length sections of V*. To get a distance between AE and BE, simply use this norm to measure the size of the symmetric difference set between the languages. This quantity is a surjective mapping from P(V*) onto [0, 1], so all distances are a positive fraction at most 1.

The resulting pseudo-metric language space is not very nice topologically because it isn't even T0, but this can be improved by taking the quotient space which identifies all the "equivalent" distinct languages. This quotient space is called the Besicovitch topology. In it, all the finite languages have size 0, and so cannot contribute anything to the distance between two languages. This alone is a positive feature, in my view; for one thing, it implies that two languages which differ by only finitely many sentences are essentially the same. But the Besicovitch distance doesn't quite capture an intuitive feeling of the distance between languages for a number of reasons.

Kephart then proposes a refinement, a differently defined "entropic" pseudo-metric. It is this which I feel is getting close to a linguistically relevant language distance. As he summarizes, "where the Cantor metric hinges on the first apparent distinction between languages and disregards the rest, and the Besicovitch distance relies upon the total proportion of the monoid (V*) exhausted by language distinctions regardless of the expansion rate of this proportion, the entropic distance will take into account both the appearance of distinctions and their rate of increase." Instead of computing a proportion from the size of the up to k-length section of a language (or symmetric difference between languages), the entropic norm of a language computes a normalized topological entropy of the precisely k-length section. This is relevant because the topological entropy of each k-length section measures the exponential growth rate at that point. So, the entropic norm is defined as the "lim sup" (least upper bound on the limits) of this normalized entropy as k increases. The entropic distance between languages AE and BE is once again simply the entropic norm of their symmetric difference set.

I will need to do some further research to see how much I like this measure of the distance between languages, and the resulting topological language space. It certainly seems promising to me at the moment, and a further analysis could provide for a great deal of work for me or whoever was interested.

Wednesday, October 6, 2010

The truth? It's complicated

An interesting new paper by Kentaro Fujimoto in the Bulletin of Symbolic Logic 16(3) discusses logical "theories of truth". For this precis, I will liberally plagiarize Fujimoto's paper, I doubt he'll mind. Start with B, a basic underlying logic which is a first-order classical system of arithmetic such as Peano arithmetic. Then try to introduce a truth predicate T, so that T(x) means "x is true" for a proposition x. This will get you a theory of logical truth, but it turns out there are a million or so ways of doing it, and none of them are currently accepted as terribly good.

Disquotational theories of truth.
Truth is axiomatized by Tarskian T-biconditionals, which are statements of the form:
T("P") is true iff P. If we allow P to range over sentences that may contain the truth predicate, the resulting theory becomes inconsistent due to the Liar paradox. Nice!
So, a disquotational approach must impose a restriction on the class of sentences available for P in the biconditionals. The obvious restriction is to just not allow the truth predicate in P at all. Problem is, the resulting logic TB has lost the ability to deduce anything about truth. It is a theorem that TB is proof-theoretically equivalent to the basic underlying logic without any truth statements.

Hierarchical theories
Tarski provided the most widely known truth theory, the original hierarchical theory. His definition entails the T-biconditionals of the above form for the target language; however, the definition cannot be carried out within the target language. This is Tarski's theorem on the "undefinability of truth" in any one language. To incorporate this into a working theory of truth, he proposed a hierarchy of languages in which truth in one language could be defined in another language at a higher level.

Iterative compositional theories
As described in Feferman 1991 [J. Symb. Logic 56:1-49] "truth or falsity is grounded in atomic facts from the base language, i.e. can be determined from such facts by evaluation according to the rules of truth for the connectives and quantifiers, and where statements of the form T("A") are evaluated to be true (false) only when A itself has already been verified (falsified)." Such an approach is iterative in that a truth statement T("A") is true only if A is true, and it is compositional because a compositional sentence is determined only by its components according to the logical evaluation rules for connectives and quantifiers.

Feferman's Determinate truth theory
In this approach we limit the truth predicate, so there are two predicates T, and D its domain of significance. Feferman wants D to consist of just the meaningful and determinate sentences, ones which are true or false, in other words. Feferman further requires that D is "strongly compositional," so that a compound sentence is D iff all the substitution instances of its subformulae by meaningful terms belong to D.

The last two kinds of theories have the advantage of being "type-free," since they allow the proof of of the truth of sentences which contain the truth predicate. But, they have consistency problems remaining. Fujimoto's paper jumps off from here, and proposes a new improved truth theory which is also type-free, and then shows how to compare disparate truth theories using the new notion of "relative truth-definability."

It's heady stuff. I had only read Tarski's famous truth theory, and hadn't realized there were all these other proposals and complicated issues.

I think all this relates to linguistics, because let's face it, how can we have a theory of semantics without a theory of truth? How do people decide what's true? It's kind of mind-boggling.

Wednesday, September 22, 2010

Complexity of context-sensitive languages

It is often thought that possible human languages could be context-sensitive, referring to the Chomsky hierarchy. Many frameworks of formal grammar have converged on the restricted subclass of the "mildly" context-sensitive languages as a useful compromise, because the context-sensitive class itself presents problems for parsing complexity. Mildly context-sensitive languages are, by definition, parsable in polynomial time. A referee just pointed out to me something I should have remembered, which is that the universal recognition problem for the full context-sensitive class is PSPACE-complete (hugely intractable, in other words), and that Garey and Johnson's book (1979) presented an example of a single context-sensitive language that was PSPACE-complete. This means that it is less likely that a human language could be drawn from the class of context-sensitive languages beyond the "mild" ones, from a cognitive perspective.

On the other hand, I believe that the brain has quick solutions to intractable problems. More on that in a later post.

Saturday, September 18, 2010

Compositionality without type theory

The principle of compositionality is often held up as fundamental to natural language semantics. This principle has been discussed and fine-tuned in plenty of papers in the literature (including one of my own, see Fulop and Keenan 2002 in the book Semantics edited by Hamm and Zimmermann). In a rough form, it is usually stated something like "the meaning of an expression is a function of the meanings of its parts and the way they are syntactically combined." This gets technically messy when one has to specify what a "meaning" is supposed to be, but there is pretty much a consensus view that some form of this principle is behind the functioning and learnability of natural language. In a language completely devoid of compositionality, any given sentence could take on any arbitrary meaning, so that "I painted a picture" might mean "the moon is a green cheese."

I guess that most linguists working on formal semantics use some kind of type-theoretic system, but the importance of type theory is much more questionable. Some authors dislike type theory as a semantic model. An interesting paper by Erdelyi-Szabo, Kalman, and Kurucz "Towards a natural language semantics without functors and operands" was published in the Journal of Logic, Language and Information (2008). This paper proposes a new scheme of formal semantics that is compositional but does not employ the typical type-theoretic system wherein a verb is viewed as a "functor" requiring an argument, for example. In their view, "compositionality presupposes that we are able to compare arbitrary meanings in terms of information content." They sketch a formal system in which the basic entities are "pieces of evidence" which might engender a belief in something. The formal language involves just a single binary non-associative construction operator used for creating structured complex terms representing pretty much anything. Maybe such a scheme should be seriously considered, given the many difficulties with the "idealization" of semantics as type-theoretic in nature.

But we have to keep it compositional.

Wednesday, September 8, 2010

Fixing intensional logic

One of my favorite hobbies is reading about intensional logic. This refers to any form of logic that allows us to refer somehow to the "sense" of a logical entity, in addition to the extension. I have never contributed any of my own research to this topic, but I believe that we can never have a useful computational theory of natural semantics without some kind of decent intensional logic.

It seems like every proposed form of intensional logic through the years has got something wrong with it. Quite a bit of recent literature is still devoted to the seemingly endless task of "fixing" some variety of intensional logic.

As far as I can see, there are two dichotomies that can help classify various approaches to intensional logic. You can do things with type theory, or without it. You can do things with possible worlds, or without them. I have already posted about my dislike for possible worlds. In fact, I doubt there is a possible world in which I like them! But kidding aside, for a working intensional logic I favor Church's formalization of Frege, the Logic of Sense and Denotation---type theory, but no worlds. This logic was published in several incarnations over many decades. In fact, Church's own publications on the subject span from 1946 to 1993! This might qualify as the longest period spent by one scholar publishing papers on something that remained unsatisfactory, even to the scholar himself. I certainly hope to spend 47 years publishing, pretty well anything, if you want to know the truth.

But there is hope for Church's method, apparently. Quite a few scholars followed up on Church's flawed logic in the past decade (including some leading names like Terence Parsons), and the most recent entry in the game to fix Church is a paper by Kevin Klement in the June issue of the Bulletin of Symbolic Logic. I'll fill in the details in another post in the near future, but I direct your attention to Klement's fine paper in the meantime.

Friday, August 20, 2010

Quantum mechanics and the spectrogram

For this post, I wanted to show a little mathematics, so I used html with MathML.
I think this blogger is not enabled for that, so I'm redirecting to this link for the post:

Quantum mechanics and the spectrogram

This should display seamlessly in Firefox; not sure about IE yet.

Thursday, August 5, 2010

Philosophy of phonology

OK, so I failed to post numerous things last month. Evidently the summer is a challenging time for my blogging abilities. But, perhaps less is more.

Getting to the topic, I've been reading Anderson's (1985) Phonology in the Twentieth Century, and I'd like to highlight his description of the philosophical conflict between J. R. Firth's conception of phonology and that offered in the American generative paradigm spearheaded by Jakobson, and later Chomsky and Halle. Anderson characterizes the conflict as one between a realist philosophy of the American school, against a nominalist philosophy of the Firthian school.

Now, I don't want to specifically advocate for the Firthian paradigm in phonology, but when it comes to distinctive features in phonology, consider me a nominalist. Anderson describes nominalism as regarding the "idealizations of scientific theories simply as names for the analytic categories of the scientist," and to the extent that distinctive features are idealizations dreamt up by phonologists, I think the nominalist view is plainly correct.

On the other hand, according to Anderson "most empirical scientists, including linguists, tend to take a strongly realist attitude toward the essential elements of their theories." I don't know how uncontroversial such a claim about scientists is, but I personally feel it would be ludicrous to assert without specific evidence that a particular distinctive feature was in any sense "real." As a matter of fact, first you would have to specify in what way it was supposedly real, and then you would have to somehow prove that it was real in that sense (let's say, in a cognitive sense). Try it sometime! I don't believe there is any result that shows that any particular distinctive feature (not a phonetic feature mind you, but an abstract distinctive feature, let's say [+consonantal]) is cognitively real, and binary-valued, in the human mind.

So let's face it, when we discuss phonology, it makes more sense to be nominalists instead of realists. Realism is a kind of science-cum-religion that requires suspension of disbelief. Nominalism is purely analytical, without the extra baggage offered by realism. Why should a linguist care? Because whether you are a nominalist or a realist can affect what you would say about a variety of research projects. A nominalist would wholeheartedly support a project to, e.g., formulate a (possibly mathematical) theory of how binary or privative (i.e. unary) distinctive features appear to emerge from more specified phonetic features in the phonological systems of languages. Such a project would specifically examine how our analytical categories are founded in the detailed reality (as they ultimately must be). A realist, on the other hand, may be inclined to dismiss such a project because, after all, distinctive features aren't emergent properties, they're real, and so exist independently in a Platonic realm or something. Householder and Firth both caricatured this position as the "God's truth" view, that somehow if we could magically hit upon the right system of distinctive features, we would unearth God's truth.

But I also suggest that linguists keep the philosophizing to a bare minimum, just enough to sort out one's methodological position before deciding which projects might be worthwhile.

Thursday, July 8, 2010

Features in phonology and phonetics

OK, first let me apologize for the long absence. This was owing to many factors within a busy summer including poor internet access, but I am planning several posts for this month.

I'm going to discuss phonology and phonetics in the next few posts. I'll begin with the problem of features.

The theory of features has never been worked out to anyone's satisfaction, and there are many competing accounts. I am engaged in a project on "phonetic features" with my colleague Darin Flynn of the University of Calgary, and we are trying to sort things out. First of all, since Jakobson we have had "distinctive features" in phonology. These are generally assumed to be binary or unary-valued, either present or absent in some sense. They operate at the abstract level of phonological or phonemic representation (whatever you take that to be). They are not precisely indicative of phonetic properties, but are in some way derived from the phonetic properties of speech sounds. Their purpose is to include every feature whose presence/absence "can serve to differentiate one utterance from another in any language" (Anderson 1974). This is why they are called distinctive features.

So far, so good. The more problematic topic concerns whatever is "below" the abstract phonology, in the phonetic realm. Some models essentially propose that there are such things as "phonetic features" which specify sounds at some phonetic level. This dates back at least to Chomsky and Halle's SPE model. It is a pretty good idea, but there are problems with their proposal that the phonetic features are merely the "other side of the same coin", wherein phonetic features are scalar-valued objects that are linked with or somehow identical to the distinctive features of the phonology. Historically this subject has become terribly confused, because much literature has called the binary distinctive features of phonology by the name of "phonetic features" and has implied that these provide phonetic specifications.

Anderson (1974) gives some nice reasons why the phonetic features should really be different from the distinctive features. He states that phonetic features should include "every feature in terms of which one language differs systematically from another," and these may include features which are never distinctive, such as whether final stops are released. Anderson proposes that going from phonology to phonetics one uses "detail rules" which specify the scalar values of the phonetic features on a language-specific basis. This is really the phonetic component of the grammar, as against "universal phonetic implementation" which applies to all languages because it is derived from the human speech production mechanism.

Another reason (to allow phonetic features different from distinctive features) that Flynn and I have come up with comes from models of sound change in diachronic linguistics. We noticed that numerous sound changes are well-described as resulting from auditory confusions among sounds with similar values of acoustic phonetic features such as [grave] and [flat], while these features construed as binary-valued are not useful in distinctive feature theory for phonology.

A remaining question is whether the set of phonetic features should be disjoint from the set of distinctive features, or could they be somehow linked when they overlap in a useful way?

Thursday, May 27, 2010

Type theory in cosmology?

I ran across something interesting in the book Cosmology by Liebscher (2005). A conundrum known as Olbers's paradox has provided motivation for cosmological considerations. This paradox demonstrates that, if there were an infinity of potentially observable stars, the sky should have infinite brightness. Modern cosmology takes this as a proof that the universe has a history in which there were no stars, and thus the number of stars we can see by looking far enough away from Earth is effectively limited by time.

There is an alternative attempt to resolve this paradox, credited to Lambert (1761), which assumes the universe consists of systems of different order. As Liebscher describes, "a system of order n contains g_n systems of order n-1, where g_n is the multiplicity. . . now every integral over the universe may converge." This naturally reminded me of type theory being invoked to resolve paradoxes of semantics etc. Lambert's "ordered universe" cosmology is known to be inadequate because it predicts an inhomogeneous universe at every distance scale, which is contrary to observation. Many would also say that type theory is an inadequate solution for our problems in logic and linguistics.

Sunday, May 16, 2010

Modeling the learning of grammar

In mathematical approaches to linguistics, the results are usually solid because they are derived mathematically. But there always remains the question whether the mathematical model chosen does actually serve as a satisfactory model of something. Carnap pointed out a long time ago that this question does not have a formal answer, it is really quite subjective. He used the term explicatum to refer to the formal model of an informally expressed explicandum.

When it comes to the learning of grammar, even restricted to syntax of natural language, it is not even so clear what the explicandum is. The only obvious fact is that children do indeed learn a language. Suppose it has a grammar---then what? Researchers disagree on how to model the learning of grammar. There is a canon of literature in algorithmic learning theory that models grammatical inference using only text strings as input data. Alex Clark is a contributor to this literature, and he has derived interesting results concerning the identifiability of substitutable context-free languages. I noticed in the 2009 Algorithmic Learning Theory proceedings that this result has recently been extended to substitutable mildly context-sensitive languages, the latter being a ubiquitous class that turns up again and again in mathematical linguistics.

A substitutable language is, roughly, one where any two substrings sharing the same context (construed linearly, not structurally) also share all their contexts. I.e. the language allows them to substitute for one another. It seems to me that this is too strong a property to be true of natural languages.

Aside from that, it is quite unlikely that children learn language from word sequences only, divorced from meanings. For these reasons, I have pursued a different modeling approach in my work on type-logical grammar induction (see my new paper in the Journal of Logic, Language and Information for a full account). I hold that learning should be modeled using sentences annotated with structural information. I think it is possible that children could use a kind of corpus analysis, together with their meaningful experience, to glean this information without benefit of a complete grammar.

But you see, the matter of modeling is subjective. I had a (friendly) argument about these very points with Alex Clark a few years ago. He was equally committed to pursuing the pure string learning model. He said my assumptions were too rich, the extra information was too plentiful. We weren't arguing about mathematical points, but rather about what is the best explicatum for the right explicandum. We can't even agree about the explicandum.

Mathematical linguistics will benefit, I believe, from the parallel pursuit of different explicata for different explicanda. Then when we know more, due to advances in cognitive science or whatever, a wide range of different mathematical results will already be known.

Sunday, April 25, 2010

Why did Gentzen use sequents?

Here's a question that I've long thought to be a puzzle in the history of logic. Perhaps I've been wrong about this, but there doesn't seem to be any reason for Gerhard Gentzen's sequent calculus to make use of sequences of formulae.

Let me try to clarify what I mean here. A sequent, for Gentzen, is a special inference expression connecting a (possibly empty) sequence of antecedent formulae with at least one consequent formula. The system for intuitionistic logic requires just one consequent formula, while the classical system allows a sequence in the consequent. The sequent calculus is then a way of defining classical or intuitionistic logic as a scheme of inferences among sequent expressions rather than the formulae within. Baez and Stay (2009) nicely explain that Gentzen used sequents so he could develop a logical calculus that has the subformula property, which has in turn led to so many useful developments. But that only requires sequent expressions connecting sets of formulae, not sequences of formulae.

I feel there remains the question, why did Gentzen insist that the antecedent and consequent of a sequent be a sequence of formulae? He immediately had to compensate for this by invoking the structural rules, special rules that eliminate the sensitivity of his systems to the sequence structure. He did this, of course, because classical and intuitionistic logic aren't sensitive to the structure of formulae occurring in a sequence. In fact, later authors relying on Gentzen to present classical or intuitionistic logic have frequently just gone ahead and redefined sequents as special expressions connecting sets of formulae, thus doing away with the structural rules. Logicians more recently have played extensively with the logics that result from not using one or more of the structural rules---these are the substructural logics, which are used in type-logical grammar, among other applications. But Gentzen didn't suggest doing away with any structural rules, so far as I know. So then, why did he insist that formulae came in sequences, only to turn around and insert provisions that eliminated this sensitivity? We certainly have to thank him for his odd bit of foresight, given how much fruitful research has been done using substructural logics in recent times.

Friday, April 23, 2010

Against possible worlds

I admit it, I never liked possible worlds semantics. Montague grammar and any common relative uses it as part of the intensional semantics for natural language. The low value of possible worlds as a methodology was made very clear for me when I read Forster's essay "The Modal Aether," published in Reinhard Kahle's volume Intensionality (2005) in the Lecture Notes in Logic series. So I would like to see more linguistic semantics developed without this baroque and unfortunate construct. Yet, I also feel intensional logic is worth having in semantic theory, so I don't want to give up on it.

Doing away with possible worlds within intensional logic has been tried in at least two ways. Church's intensional type theory (published in Nous in 1973, 1974, and 1993) accomplishes the goal by assigning a special type for "intension" of an entity, and another for intension of a proposition. The Montague approach would assign as intension a function from possible worlds (or world-time indices). Church's approach further provides an infinite chain of intensions of intensions, represented as an infinite chain of new types. So, while Montague's scheme relies on an infinite function space, Church's scheme would require an infinite number of "primitive" types, which is a little uncomfortable.

Another way was explored by George Bealer, especially in his book Quality and Concept (1982). Bealer shows how intensional logic can be set up using an operator that he writes with [], so that a formula [A] simply represents the intension of logical object A, which can be any of the usual things in a first-order system. Bealer takes the logical notion of intension as primitive, and does not attempt to define it as a function from such to such or whatever. My problem with this system is that it tries to be first-order, which means that you cannot organize a formula to reflect a decent linguistic composition of the subterms with respect to meaning composition. In other words, Bealer's system applied to natural language cannot be compositional (or so it seems to me), since it inherits the same problems for linguistic compositionality that have always plagued first-order logic.

But surely one of these systems or a relative could be applied to yield a decent intensional logic for natural language semantics that does not use possible worlds.

Friday, April 16, 2010

Morphosyntax: the next frontier

In an earlier post I wondered whether the newer pregroup grammar framework should supersede type-logical grammar for syntax. I have now decided such niceties are not important. The fact is, we have more than enough machinery to handle the pure syntax of language "algebraically" or "logically," however you view it. My own work has contributed to the learning algorithms for this. I think it is time to move on.

What's next? Morphosyntax. Many, perhaps most, languages involve a very significant amount of morphology which contributes essentially syntactic information. Some, such as Navajo, have a very simple syntax and a cripplingly complex morphology does most of the actual work. Yet mathematical linguists rarely treat morphosyntax. I don't really know how to handle it, myself. My basic suggestion is that we need a new class of mathematical models that would be capable of handling a language such as Navajo. These would be able to relate semantics directly to the morphology and syntax at the same time. Preferably without making reference to morphemes. Most morphology is not very well described morphemically, but it all submits rather well to a relational view in which the morphology is modeled as a system of form-meaning relations among words.

This is a challenge I hope to take up one day in the not-too-distant future. I of course invite anyone to tell me what is already out there in the vein of mathematical morphosyntax. My acid test is, does it work for Navajo?

Tuesday, April 6, 2010

Kornai on complexity and learning

Learning theory is a subject I am very close to, so Chapter 7 of Kornai's "Mathematical Linguistics" is one I read with great interest. The first section is a quick run-through of essential concepts from information and coding theory. The second section is an equally quick run-through of informational complexity, so-called Kolmogorov complexity. The third section is closer to my own expertise, and it discusses the three most important topics in learning theory today, namely minimum description length, identification in the limit (the Gold inductive inference paradigm), and probably approximately correct learning.

There are very few, if any, chapters or articles on formal linguistics that address all these important topics. I do feel that Kornai's chapter is too short, and fails to highlight connections between the topics considered. The subsection on identification in the limit does make what I see as the most important point about this paradigm for the linguist, which is that while Gold's theorem famously showed that no language class is identifiable which is "superfinite," a superfinite class by definition contains *all* finite languages over the vocabulary, and there is no reason to suppose that possible human languages include all (or even a single) finite language. For this reason, Gold's notorious result does not impede or even have much relevance to the formal modeling of learning human language by inductive inference. This point cannot be overemphasized, and my feeling is that research in formal modeling of human language learning was set back decades and became extremely unpopular because of widespread misunderstanding of the import of Gold's result.

For lack of time, this entry will be my last about Kornai's book for the moment. There are three further chapters which concern hidden Markov models and related things about speech and handwriting pattern recognition. A highly recommended read, I hope these posts have whet the appetite of some readers here and will send you to look at Kornai's book.

Sunday, March 28, 2010

Kornai on semantics

Chapter 6 of Kornai's "Mathematical Linguistics" considers ways of treating natural semantics formally. While the syntax chapter was in many ways a run-down of other treatments, this chapter appears to make some new proposals. The first section discusses some of the classic problems of the philosophy of language such as the Liar paradox, and largely dispenses with a great deal of the angst that philosophers and logicians have put into their treatments of these matters. K. basically shows how a lot of the technical problems with these paradoxes disappear when one deals with meanings in a way that is more appropriate for natural language modeling. He finishes by proposing that natural semantics has to have a paraconsistent logic as a formal model, which allows for inconsistency and contradiction without a total breakdown of reasoning---seems like a great idea to me.

The second section gives a very nice overview of Montague grammar (of course a must for any book on mathematical semantics). But this overview introduces some new ideas and draws in other methodologies. Prominent is the idea of a "construction," which is a semantically interpretable constituent structure analysis. K. discusses the importance of allowing "defeasible" information into a meaning, which then calls for a paraconsistent logic. The seven-valued logic of Ginsburg (1986) is then alluded to for application, though it is never really explained fully. Since Kornai seems to see this book as a textbook, he might consider really discussing the system of logic he has settled on for applications. K. then lays out in just a few pages a variety of apparently novel suggestions for treating natural language constructions of various kinds using some blend of Montague grammar with a paraconsistent logic. While intriguing, this little precis of how to do semantics in a different way really deserves to be fleshed out into a big paper of some kind that is put forth to the community as a stand-alone publication. This new approach seems like a pretty big deal to me, but here it is kind of hiding in the middle of a purported textbook, without the benefit of a thorough presentation.

The final substantive section connects K's ideas about formal semantics with the sorts of syntactic theories he earlier called "grammatical," which rely on notions of case, valence, and dependency more than traditional syntactic structures. The proposed methodology is still presented as a variant of Montague grammar, but now he is putting forth another novel set of proposals. Do these complement the proposals of the previous section, or do they replace them? I was a bit confused about what K. is really advocating we should do at this point. The brevity of this section, together with the unfamiliarity of the methods, makes it seem almost telegraphically concise.

My feeling about this chapter is that it makes a great many new proposals, but it is too short as a result. We really deserve a whole book that expands Chapter 6, I think. I, for one, would gladly read it. One thing that struck me as a little surprising, too, was the glib way in which Kornai sidesteps the hyperintensionality problem that has been long known to afflict Montague-style intensional logic treatments of semantics. This is widely regarded as a very big deal, and several researchers have spent large portions of their time working on solutions. Witness the recent work of Carl Pollard to fix Montague's IL, as he presented in a course over several days at the European Summer School in Logic, Language and Information (Dublin 2007). There was a contemporaneous publication by Reinhard Muskens in the Journal of Symbolic Logic detailing a different approach to fixing intensional logic. Does Kornai feel these people are wasting time on a non-issue? Or perhaps he would welcome joining forces between his proposals and those just cited.

Tuesday, March 16, 2010

Kornai on syntax

Chapter 5 of "Mathematical Linguistics" discusses a variety of approaches to syntax. Such a wide variety has surely never been discussed in a mathematically sound way in the same piece, I'm sure. He begins by going through relatively familiar theories like phrase structure rules and categorial grammars. He then discusses dependency grammars and case-based approaches, which gets increasingly unfamiliar to me. He then flies off to places unknown, in a fantastically interesting discussion of weighted and probabilistic grammars. The treatment of the "density" of a language is especially new to me.

There a few trouble spots, which include poorly introduced notions that get relied on at certain points. There is a discussion of weighted grammars that hinges on a "Dyck language," which is not a creature I am aware of, and I couldn't see where it had been introduced earlier in the book. This has the effect of making an already dense treatment into something overly cryptic.

The final discussion concerns "external evidence" in syntax, which refers to properties of human languages that are more or less evident by nonlinguistic facts. For instance, it is pretty plain that natural languages are, in some sense, learnable. What is missing is, as Kornai points out, "a realistic theory of grammar induction." This is something dear to me, which I have devoted some years to working on. I invite readers to see my recent paper "Grammar induction by unification of type-logical lexicons," published online in the Journal of Logic, Language, and Information. This brings me to another slight criticism of Kornai, where he (probably without meaning to) appears to propagate a common misunderstanding about the import of Gold's inductive inference results. K. states that "context-free grammars cannot be learned this way" [by exposure to positive examples]. But Gold only showed this to be true if we attempt to learn a class of context-free grammars that generates every finite language and beyond. Results can be much more positive if the finite languages are excluded from the hypothesis space, for instance. In my paper I show how a restricted class including some context-sensitive grammars can still be learned from a finite sample set, which is an even stronger learning criterion than Gold's identifiability in the limit.

On the whole, I was fascinated by this chapter, it contains a wealth of things that have never been treated very well formally, and simply to have so many approaches collected for discussion is an immense contribution to our literature.

Monday, March 8, 2010

Kornai's Chapter 4

Chapter 4 of Mathematical Linguistics deals with morphology. At this point, the book really shows its value not just as a mathematical compendium, but as a survey of linguistics. The treatment of word structure is founded in the notion of the morpheme, which I have spent some energy crusading against. I was swayed by the theory known as Whole Word Morphology, due apparently to Alain Ford and Rajendra Singh. I learned it from a graduate student at the University of Chicago, and we published one short paper stemming from his work in the ACL Workshop on Morphological and Phonological Learning (2002). Oddly enough, according to databases like Google Scholar, this is my most highly cited piece of work.

In any event there a few other things to quarrel with in Kornai's survey of morphology. For one thing, he restates the old yarn that derivational affixes appear closer to the stem than inflectional ones, and this appears to be not the case in the Athabaskan languages such as Navajo. He also considers the parts of speech to have a morphological foundation, but since they need to be used in syntactic derivations of sentences, it seems to me that parts of speech should have a more sound syntactic and semantic foundation as some kind of word-usage classes.

The section discussing Zipf's law is very useful. In sum I know of no survey of morphology that is anything like this chapter, it is a very important piece of literature. I wish Kornai would consider a mathematical treatment of word-based approaches like Whole Word Morphology; this is something I have been planning to work on for a long time now.

Friday, March 5, 2010

Chapter 3 of Kornai's book

In chapter 3 of "Mathematical Linguistics" Kornai deals with phonology. It is very nice to see the mathematical theory of phonology laid out here, formalizing the autosegmental framework. I do have a few questions about this treatment, since it seems to rely heavily on the notion that there are phonemes and sound segments (consonants, vowels etc.) Autosegmental phonology is then formalized into a computational system using biautomata. It is emphasized that context-sensitive generating power is seemingly not needed for phonology, and that finite-state power is sufficient.

Kornai appears to state that we can solve the phonemicization problem for a language (meaning that the phonemic inventory is determined by the phonetic data). I thought Yuen-Ren Chao proved otherwise in 1934, and that this proof was formalized in Kracht 2003. For example, I fail to see how it is possible to prove that English affricates are actually single segments. Why aren't they sequences of two segments (a stop and a fricative)?

Another issue comes from speech perception research, where years of trying have failed to establish that people use consonants and vowels as perceptual units. Syllables appear to have much more traction in this regard. It is of course still desirable to transcribe syllables using segments, but this can be regarded as a convenient fiction, as was suggested already by Trager, I believe, in 1935. On the view just described, each syllable would then be formally treated as a sequence of distinctive features, with a certain ordering relation but without any timing slots.

Saturday, February 27, 2010

Kornai's Chapter 2

Chapter 2 of Kornai's "Mathematical Linguistics" presents the basic elements familiar to many of us, such as the Chomsky hierarchy of languages. But the presentation is quite different, and involves plenty of algebraic perspective on the matter, together with a few factoids I had never really considered.

The most interesting things I learned involve analogies between the language hierarchy and recursion-theoretic views of the numbers. In particular, "the Kleene theorem guarantees that regular languages have the same distinguished status among languages that the rationals in Q have among numbers in R" (i.e. the reals). Moreover, "context-free languages play the same role among languages that algebraic numbers play among the reals." As a reminder, algebraic numbers are those reals that can be roots of polynomials, while the other reals are called "transcendental," with \pi as the canonical example. (one day I am going to have to enable this blog for MathML) It turns out that the transcendental numbers are in a well-defined sense infinitely more numerous than the algebraic numbers, and I guess this fact carries over to the language hierarchy, so that languages beyond context-free are somehow infinitely more numerous? I'm not sure about this whole analogy, this is really something I just learned from the book.

Thursday, February 25, 2010

Mathematical Linguistics by Kornai

I only just realized that Andras Kornai published the book "Mathematical Linguistics" in 2008. I am currently reading it and it is sparking my mind in a number of areas. I highly recommend it, it is a broad view of linguistics from a mathematical perspective.

In this blog I would like to highlight each chapter in turn. We don't get great books to study very often in this field. The first chapter is introductory, but it makes clear the author's rather unique view of things. The most interesting idea put forth there is that mathematical linguistics, viewed as a mathematical theory, is astoundingly complex, and encompasses far more axioms than is typical of well-studied areas of mathematics. This brings Kornai to the analogy with physics, and the idea that mathematical linguistics lies in a "mesoscopic" regime between the microscopic (which can be fully specified and axiomatized) and the macroscopic (which would presumably by typified in mathematics by nonlinear systems that can be chaotic and are impossible to specify axiomatically).

Tuesday, February 23, 2010

Book review: Mathematics of Language

A while ago I reviewed "The Mathematics of Language" by Marcus Kracht.
This is posted in the open access eLanguage, under Book Notices.
Here is the link:
http://elanguage.net/blogs/booknotices/?p=19

I'm putting this on the blog because I believe that very few people actually interested in this book are regular readers of Language or eLanguage.

The review is very general, intended for general linguists, but still offers something that might be of interest to people here.

Thursday, February 18, 2010

Parts of speech -- what are they again?

Syntactic "parts of speech" have bothered me for many years. If you are mathematically minded and not a traditional linguist, they probably bother you, too. There was a nice piece by Shuly Wintner in the recent issue of Computational Linguistics, in which he noted the "theoretical bankruptcy" of this field. How does this relate to parts of speech? Because computational linguists generally don't have a clue when it comes to parts of speech. I mean seriously, have you ever examined the tagging manual for the CLAWS tagset? (Google this and you can). This tagset (whichever alternative you choose) is the most incredibly ad hoc extension of the basic parts of speech which have descended to us from the ancient Greek grammarians. For modern computational linguists to actually use this suggests that perhaps modern physicists should rely more on Aristotle than Einstein.

If the notion of a "part of speech" is to have any valid theoretical foundation, then it must surely be formalized as a kind of "word usage class." It then reasonably follows that a good system of word usage classes should be able to be induced from language data. My hypothesis is that this is what the human learner does. The methodological million dollar question then becomes, can we induce good parts of speech from straight text (string only data), or do we need syntactic structural information to get it right? My work in type-logical grammar, such as a recent paper published in JoLLI, uses structural information. This leads to the bootstrapping problem of where you get that, and what it means to have structures prior to having any parts of speech. Plus the algorithms for it are grotesque, intractable batch unifiers, and generally useless for practical work.

I am wondering how much can be achieved with inducing parts of speech from string-only data. I have a current Master's student who is trying, by modifying some code posted by Alex Clark for a similar task. There are numerous difficulties, including the fact that the only contexts available to determine a word usage in string-only data are the nearby words, and this invokes n-gram models. Hardly a theoretically wonderful approach to language. Another big problem is word ambiguity; the same word-form often has several parts of speech, and really is being used as a different word in each case. Clark's paper in the 2000 CoNLL proceedings tries to address this. Dan Klein's dissertation attacks the problem as well, but he appears to be evaluating against the old grammarian's system as a gold standard. This is a bit backward, to me. Does anyone know of work in this area that is getting anywhere? Part of speech induction seems like an area in which only a few researchers are dabbling, so there is not yet a clear methodology that has got everyone's attention.

In the end, the parts of speech problem is a huge embarrassment for linguistics. It really shows that we are still at the beginning stages of the field's development, if we are tripped up by something so fundamental.

Tuesday, February 16, 2010

Categorial grammar, in some form

Since 1997 or so I have been busy with type-logical grammar, as known from the books by Morrill (1994) or myself (2004), and various other sources such as Moortgat's fine paper in the Handbook of Logic and Language (1997). These systems became quite baroque and complicated with the whole "multimodal" agenda, amply studied by Moot in his PhD thesis (2002).

Competing with the type-logical approach, one finds the combinatory categorial grammar championed by Steedman in a number of sources such as his books of 1996 and 2001. Now there is also the system known as "pregroup grammars" introduced by Lambek in the now-defunct journal Grammars in 2001. He had complained that the multimodal type-logical grammars were too complicated.

Now the question is, which of these frameworks is the most lively? Certainly pregroup grammars have that new-car smell in their first decade, and are being studied. But I am greatly encouraged by type-logical projects such as Jaeger's book on anaphora in the Springer series Trends in Logic. Linguist friends who know of my type-logical affections have often asked me about linguistic problems like the handling of anaphora and ellipsis. I have always demurred, sweeping such matters under the rug in my desire to just work on what I wanted, but also confident that these things could be taken care of. I have reason to believe that type-logical grammars are still a very good framework worthy of continuing study. I think their mathematical connections are quite rich, and my mind is full of things that will wait for future posts.

Monday, February 15, 2010

Opening lines. . .

Welcome to my blog about things linguistic and mathematical, and most frequently both at once.

This is to be a venue mostly aimed at professional researchers in linguistics, computational linguistics, and cognitive science, but mathematicians may also find much of interest. I am always interested in how mathematics can be applied to the study of language, and that often relates to cognition and computation, so here we are.

I've thought about doing a blog for a few years, always saying to myself I might start one when I have more time. Well, I don't have any more time but I am perhaps learning to type faster, so I just decided to give it a shot.

Watch for random musings of mine on math and language and how they relate. I wanted a venue to air out my mind that was less formal than publications, but still could access the broader community. I eagerly hope for serious commentary.

Current linguistics blogs seem to be of three kinds; there are maybe a few serious ones about linguistics, there are some of the usual things that are aimed at laypeople interested in language, and there are a few kind of nonserious blogs by grad students that are not very research oriented. This is the first blog devoted to research topics in the mathematics of language, I think. If it is not, please let me know.

Next post, something important.