Here's a question for the community. I was recently told by a referee, very succinctly, that "natural languages are sparse." I believe he/she must have meant that natural languages are each members of the SPARSE complexity class of languages, defined here:

http://en.wikipedia.org/wiki/Sparse_language

While this seems reasonable at a glance, I have been at a loss to find any publication that proves, states, or otherwise mentions this putative fact. Does anybody know how to establish whether natural languages are really sparse?

## Thursday, May 28, 2015

Subscribe to:
Post Comments (Atom)

That is probably too strong. Say we have k proper names n1 to nk, then the set of sentences of the form

ReplyDeleten likes n and n likes n and .... repeated m times,

is a set of sentences of length 4m -1 which contains k^2m examples.

OK, this conjectured property of natural languages was clarified somewhat at the recent Mathematics of Languages workshop. The idea is that the finite slice L^n of natural language L has a cardinality which is a "slowly growing" function of n, so that the cardinality ratio over the slice \Sigma^n of *all* expressions decreases to zero as n goes to infinity. But again, this is only *conjectured* to be a property of natural languages, and there seem to be no results known in formal language theory which pertain to such a notion. We're going to work it soon, I hope.

ReplyDelete