Thursday, May 28, 2015

Natural languages are SPARSE?

Here's a question for the community.  I was recently told by a referee, very succinctly, that "natural languages are sparse."  I believe he/she must have meant that natural languages are each members of the SPARSE complexity class of languages, defined here:

While this seems reasonable at a glance, I have been at a loss to find any publication that proves, states, or otherwise mentions this putative fact.  Does anybody know how to establish whether natural languages are really sparse?


  1. That is probably too strong. Say we have k proper names n1 to nk, then the set of sentences of the form
    n likes n and n likes n and .... repeated m times,
    is a set of sentences of length 4m -1 which contains k^2m examples.

  2. OK, this conjectured property of natural languages was clarified somewhat at the recent Mathematics of Languages workshop. The idea is that the finite slice L^n of natural language L has a cardinality which is a "slowly growing" function of n, so that the cardinality ratio over the slice \Sigma^n of *all* expressions decreases to zero as n goes to infinity. But again, this is only *conjectured* to be a property of natural languages, and there seem to be no results known in formal language theory which pertain to such a notion. We're going to work it soon, I hope.