Saturday, February 27, 2010

Kornai's Chapter 2

Chapter 2 of Kornai's "Mathematical Linguistics" presents the basic elements familiar to many of us, such as the Chomsky hierarchy of languages. But the presentation is quite different, and involves plenty of algebraic perspective on the matter, together with a few factoids I had never really considered.

The most interesting things I learned involve analogies between the language hierarchy and recursion-theoretic views of the numbers. In particular, "the Kleene theorem guarantees that regular languages have the same distinguished status among languages that the rationals in Q have among numbers in R" (i.e. the reals). Moreover, "context-free languages play the same role among languages that algebraic numbers play among the reals." As a reminder, algebraic numbers are those reals that can be roots of polynomials, while the other reals are called "transcendental," with \pi as the canonical example. (one day I am going to have to enable this blog for MathML) It turns out that the transcendental numbers are in a well-defined sense infinitely more numerous than the algebraic numbers, and I guess this fact carries over to the language hierarchy, so that languages beyond context-free are somehow infinitely more numerous? I'm not sure about this whole analogy, this is really something I just learned from the book.

1 comment:

  1. (I'm not sure if this is in the book, but) to answer the question of cardinality, recursive languages correspond to "constructive" real numbers, including those transcendentals like \pi which actually have an algorithm that can compute any number of digits. There are only denumerably many of these, while the cardinality of all real numbers is continuum. Indeed the same diagonalization trick that shows that we can't enumerate the real numbers will sow that we can't enumerate the set of all languages. We may not _care_ about non-recursive languages, in fact we are likely not to care about anything beyond (mildly) context sensitive, but those other languages are `out there' and they hugely outnumber the ones we care about.