A while back I declared myself a "nominalist" when it comes to linguistic theory, as opposed to a realist.

Well, hold on a minute. It's not entirely clear what I meant, because it's not entirely clear what the terms mean. In the philosophy of mathematics, "nominalism" generally means the stance that "there's no such thing as an abstract thing." [E.g. the Oxford Handbook of Philosophy of Mathematics and Logic] So on this view, not even a number really "exists," whatever that means. I think it is a little extreme to claim that there are no abstract things in this world. I'm not sure why I think that, but I do.

In the philosophy of linguistics (one of the fields most starved for literature), "nominalism" would have to mean something else. But it's hard to pin down exactly what. I'll take a stab by saying that in linguistics, nominalism is the stance that "elements of linguistic theory are not necessarily literally elements of the cognitive structure of language, although they might be."

This is quite a bit weaker than mathematical nominalism, whatever it all means.

## Wednesday, October 27, 2010

## Thursday, October 14, 2010

### Topology of language spaces

Very little has been written, it seems, about the topology of language spaces. Fix a finite vocabulary V. Then the Kleene-star set V* is the (countably infinite) set of all expressions over V. The language space P(V*) is the set of all sets of expressions, i.e. the power set of V*, so it has the cardinality of the reals. It is well known that most of these languages are uncomputable (not even recursively enumerable), but it would be useful to find a good topology that "metrizes" the language space somehow. As an application, it would be very interesting to have a suitable metric for the "distance" between two languages in a language space.
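To make the objects concrete, here is a minimal Python sketch enumerating the k-length sections of V* (the two-symbol vocabulary is an arbitrary toy choice):

```python
from itertools import product

# A toy two-symbol vocabulary; any finite V works the same way.
V = ["a", "b"]

def section(k):
    """All expressions over V of length exactly k: the k-length section of V*."""
    return ["".join(w) for w in product(V, repeat=k)]

# |V^k| = |V|**k; V* is the countable union of these finite sections,
# and a "language" is any subset of V*, i.e. an element of P(V*).
assert len(section(3)) == len(V) ** 3
```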

Very little work has been published which attacks this problem. The definitive work appears to be the PhD dissertation by David Kephart (University of South Florida, 2005) "Topology, Morphisms, and Randomness in the Space of Formal Languages." This dissertation first studies the only prior work on the subject, which defined the Bodnarchuk metric as a distance between languages. This is, roughly, the shortest expression length at which differences between the two languages appear. The resulting metric space of languages is in essence the Cantor space. It is an unsatisfying metric for linguistic, or any other, purposes.
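A minimal Python sketch of this idea (finite sets of strings stand in for whole languages here, and the base 2 in the formula is just a conventional choice):

```python
def cantor_distance(A, B):
    """Bodnarchuk-style distance: 2**(-k), where k is the length of the
    shortest expression on which the two languages disagree; 0 if they agree.
    A and B are finite sets of strings standing in for full languages."""
    diff = A ^ B  # symmetric difference: sentences in exactly one language
    if not diff:
        return 0.0
    k = min(len(w) for w in diff)
    return 2.0 ** (-k)

AE = {"a", "ab", "abb"}
BE = {"a", "ab", "aba"}
# The earliest disagreement ("abb" vs. "aba") appears at length 3,
# so the distance is 2**-3 = 0.125.
```

Note how everything beyond the first point of disagreement is ignored, which is exactly why the resulting Cantor-like space is unsatisfying.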

Let's think for a moment about the problem to be modeled topologically. Consider two dialects of English, call them AE and BE. Assuming they have the same vocabulary (we can agree, I hope, that vocabulary differences alone are mundane and easy to describe), any differences between them have to be "grammatical," meaning speakers of AE may assent to certain sentences which speakers of BE reject as ungrammatical. We might like to know, then: how "far apart" are these dialects grammatically? Given the recursive nature of natural languages, as soon as any major grammatical difference appears, it is quite likely that a difference between construction patterns will give rise to an infinite number of sentences distinguishing AE from BE. And yet, if AE and BE still agree on the majority of construction patterns, they may appear at a glance to be very nearly the same. So there is no hope for a simplistic metric like the cardinality of the distinguishing set of sentences, because in any interesting case the two dialects will be distinguished by infinitely many sentences.

With that, let me get back to Kephart's thesis. After dismissing what he calls the "Cantor distance" mentioned above as fairly worthless, he goes on to propose two new metrics, which are in fact pseudo-metrics, because they assign distance 0 to some pairs of distinct languages. The first of these is called the Besicovitch norm when applied to one language, or the Besicovitch distance when applied to the symmetric difference between two languages. (The name honors the originator of the underlying metric, which was first applied in a treatment of cellular automata.) Its definition is complicated because it counts cardinality by sectioning. The Besicovitch norm (the size of a language, by this measure) is defined as the limit superior of the ratios between the cardinality of the up-to-length-k section of the language and the cardinality of the up-to-length-k section of V*. To get a distance between AE and BE, simply apply this norm to the symmetric difference set of the two languages. The norm is a surjective mapping from P(V*) onto [0, 1], so all distances lie between 0 and 1.
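A rough Python approximation of the construction (the lim sup is replaced by a max over k up to a finite bound, and languages are given as finite sets, so this only illustrates the ratio being measured):

```python
from itertools import product

def besicovitch_norm(L, V, k_max):
    """Finite-k approximation of the Besicovitch norm: the largest ratio,
    for k up to k_max, of the size of L's up-to-length-k section to the
    size of the up-to-length-k section of V* (empty string excluded)."""
    best = 0.0
    for k in range(1, k_max + 1):
        star = ["".join(w) for n in range(1, k + 1)
                for w in product(V, repeat=n)]
        in_L = [w for w in star if w in L]
        best = max(best, len(in_L) / len(star))
    return best

def besicovitch_distance(A, B, V, k_max):
    """Distance between two languages: the norm of their symmetric difference."""
    return besicovitch_norm(A ^ B, V, k_max)

# "All strings over {a, b} beginning with 'a'", truncated at length 3:
L = {"a", "aa", "ab", "aaa", "aab", "aba", "abb"}
# Half of every section begins with 'a', so the ratio is 1/2 at each k.
```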

The resulting pseudo-metric language space is not very nice topologically because it isn't even T0, but this can be improved by taking the quotient space which identifies all the "equivalent" distinct languages. This quotient space is called the Besicovitch topology. In it, all the finite languages have size 0, and so cannot contribute anything to the distance between two languages. This alone is a positive feature, in my view; for one thing, it implies that two languages which differ by only finitely many sentences are essentially the same. But the Besicovitch distance doesn't quite capture an intuitive feeling of the distance between languages for a number of reasons.

Kephart then proposes a refinement, a differently defined "entropic" pseudo-metric. It is this which I feel comes close to a linguistically relevant language distance. As he summarizes, "where the Cantor metric hinges on the first apparent distinction between languages and disregards the rest, and the Besicovitch distance relies upon the total proportion of the monoid (V*) exhausted by language distinctions regardless of the expansion rate of this proportion, the entropic distance will take into account both the appearance of distinctions and their rate of increase." Instead of computing a proportion from the size of the up-to-k-length section of a language (or of the symmetric difference between languages), the entropic norm computes a normalized topological entropy of the exactly-k-length section. This is relevant because the topological entropy of each k-length section measures the exponential growth rate at that point. The entropic norm is then defined as the lim sup (limit superior) of this normalized entropy as k increases. The entropic distance between languages AE and BE is once again simply the entropic norm of their symmetric difference set.
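In the same toy Python setting (finite truncations, with a max over k standing in for the lim sup), the normalized entropy of the k-length sections looks like this:

```python
import math
from itertools import product

def entropic_norm(L, V, k_max):
    """Finite-k approximation of the entropic norm: for each k, the log of
    the size of L's exactly-length-k section, normalized by k * log|V| so
    that the full language V* scores 1; the max over k stands in for the
    lim sup as k increases."""
    best = 0.0
    for k in range(1, k_max + 1):
        count = sum(1 for w in product(V, repeat=k) if "".join(w) in L)
        if count > 0:
            best = max(best, math.log(count) / (k * math.log(len(V))))
    return best

# "All strings over {a, b} beginning with 'a'": each length-k section has
# 2**(k-1) members, so the normalized entropy (k-1)/k climbs toward 1.
# The growth *rate* is captured, not just the raw proportion.
L = {"a", "aa", "ab", "aaa", "aab", "aba", "abb"}
```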

I will need to do some further research to see how much I like this measure of the distance between languages, and the resulting topological language space. It certainly seems promising to me at the moment, and a further analysis could provide for a great deal of work for me or whoever was interested.

## Wednesday, October 6, 2010

### The truth? It's complicated

An interesting new paper by Kentaro Fujimoto in the Bulletin of Symbolic Logic 16(3) discusses logical "theories of truth." For this précis I will liberally plagiarize Fujimoto's paper; I doubt he'll mind. Start with B, a basic underlying logic: a first-order classical system of arithmetic such as Peano arithmetic. Then try to introduce a truth predicate T, so that T(x) means "x is true" for a proposition x. This will get you a theory of logical truth, but it turns out there are a million or so ways of doing it, and none of them is currently accepted as terribly good.

#### Disquotational theories of truth

Truth is axiomatized by Tarskian T-biconditionals, which are statements of the form:

T("P") iff P

If we allow P to range over sentences that may themselves contain the truth predicate, the resulting theory becomes inconsistent due to the Liar paradox. Nice!

So, a disquotational approach must impose a restriction on the class of sentences available for P in the biconditionals. The obvious restriction is to just not allow the truth predicate in P at all. Problem is, the resulting logic TB has lost the ability to deduce anything about truth. It is a theorem that TB is proof-theoretically equivalent to the basic underlying logic without any truth statements.
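A toy Python illustration of the TB restriction (the sentence encoding here is my own invention): the truth predicate disquotes only over T-free sentences, and applying it to a T-laden sentence is simply rejected.

```python
# Sentences as nested tuples: ("atom", truth_value), ("not", s), ("T", s).

def contains_T(s):
    """Does the sentence mention the truth predicate anywhere?"""
    return s[0] == "T" or any(
        contains_T(part) for part in s[1:] if isinstance(part, tuple))

def tb_eval(s):
    """Evaluate under the TB restriction: T applies only to T-free sentences."""
    tag = s[0]
    if tag == "atom":
        return s[1]
    if tag == "not":
        return not tb_eval(s[1])
    if tag == "T":
        if contains_T(s[1]):
            raise ValueError("TB: T applied to a sentence containing T")
        return tb_eval(s[1])  # the disquotational step: T("P") iff P
    raise ValueError("unknown sentence form")

snow = ("atom", True)
assert tb_eval(("T", snow)) == tb_eval(snow)  # the T-biconditional holds
```

The price of consistency is visible here: no sentence about truth itself can be evaluated, which is why TB proves nothing about truth beyond what the base theory already proves.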

#### Hierarchical theories

Tarski provided the most widely known truth theory, the original hierarchical theory. His definition entails the T-biconditionals of the above form for the target language; however, the definition cannot be carried out within the target language. This is Tarski's theorem on the "undefinability of truth" in any one language. To incorporate this into a working theory of truth, he proposed a hierarchy of languages in which truth in one language could be defined in another language at a higher level.

#### Iterative compositional theories

As described in Feferman 1991 [J. Symb. Logic 56:1-49] "truth or falsity is grounded in atomic facts from the base language, i.e. can be determined from such facts by evaluation according to the rules of truth for the connectives and quantifiers, and where statements of the form T("A") are evaluated to be true (false) only when A itself has already been verified (falsified)." Such an approach is iterative in that a truth statement T("A") is true only if A is true, and it is compositional because a compositional sentence is determined only by its components according to the logical evaluation rules for connectives and quantifiers.
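A miniature Python sketch of that grounding idea (the sentence encoding and the fixed-point loop are my own illustrative simplification): start with nothing verified, and let T("A") acquire a value only once A itself has one.

```python
def grounded(sentences):
    """Map each named sentence to True, False, or None (ungrounded).
    Sentences: ("atom", value), ("not", s), or ("T", name_of_sentence)."""
    val = {name: None for name in sentences}

    def ev(s):
        tag = s[0]
        if tag == "atom":
            return s[1]
        if tag == "not":
            v = ev(s[1])
            return None if v is None else not v
        if tag == "T":
            return val[s[1]]  # None until the named sentence is grounded

    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for name, s in sentences.items():
            if val[name] is None and (v := ev(s)) is not None:
                val[name] = v
                changed = True
    return val

sents = {
    "snow": ("atom", True),
    "s1": ("T", "snow"),             # grounded once "snow" is verified
    "liar": ("not", ("T", "liar")),  # never grounded: stays None
}
```

The Liar simply never receives a value, rather than exploding the theory; that is the iterative theories' way around the paradox.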

#### Feferman's determinate truth theory

In this approach we limit the truth predicate, so there are two predicates: T, and D, its domain of significance. Feferman wants D to consist of just the meaningful and determinate sentences, in other words those which are true or false. Feferman further requires that D be "strongly compositional," so that a compound sentence is in D iff all the substitution instances of its subformulae by meaningful terms belong to D.

The last two kinds of theories have the advantage of being "type-free," since they allow proof of the truth of sentences which themselves contain the truth predicate. But they still have consistency problems. Fujimoto's paper jumps off from here, proposes a new and improved truth theory which is also type-free, and then shows how to compare disparate truth theories using the new notion of "relative truth-definability."

It's heady stuff. I had only read Tarski's famous truth theory, and hadn't realized there were all these other proposals and complicated issues.

I think all this relates to linguistics, because let's face it, how can we have a theory of semantics without a theory of truth? How do people decide what's true? It's kind of mind-boggling.
