Thursday, May 28, 2015

Natural languages are SPARSE?

Here's a question for the community.  I was recently told by a referee, very succinctly, that "natural languages are sparse."  I believe he/she must have meant that natural languages are each members of the SPARSE complexity class of languages, that is, languages in which the number of strings of length n is bounded by a polynomial in n.

While this seems reasonable at a glance, I have been at a loss to find any publication that proves, states, or otherwise mentions this putative fact.  Does anybody know how to establish whether natural languages are really sparse?
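To make the counting condition behind SPARSE concrete, here is a minimal Python sketch. It is only an illustration on toy languages, not evidence about natural languages: the helper `count_by_length` and the two membership tests are hypothetical names chosen for this example. It compares the unary language {a^n}, which has one string of each length and so stays under a polynomial bound, with the full language {a,b}*, whose counts grow as 2^n.

```python
from itertools import product

def count_by_length(in_language, alphabet, max_len):
    """Count the strings of each length 0..max_len accepted by in_language."""
    counts = []
    for n in range(max_len + 1):
        counts.append(sum(1 for chars in product(alphabet, repeat=n)
                          if in_language("".join(chars))))
    return counts

def is_unary(s):
    """Toy sparse language: strings made only of the letter 'a'."""
    return all(c == "a" for c in s)

def is_anything(s):
    """Toy non-sparse language: every string over the alphabet."""
    return True

unary_counts = count_by_length(is_unary, "ab", 8)
full_counts = count_by_length(is_anything, "ab", 8)
print(unary_counts)  # one string per length
print(full_counts)   # doubles with every extra symbol
```

A brute-force count like this can only suggest a growth rate for short lengths; it does not prove membership in SPARSE, which is a statement about all lengths.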

Friday, February 27, 2015

Mathematics of Language -- Call for Papers

Probably repetitive by now, but here is a repost of the official call.  I might even be in attendance.

14th Meeting on Mathematics of Language (MoL 2015)

July 25-26, 2015
University of Chicago, USA

# Third and Final Call for Papers

The Association for Mathematics of Language invites submissions to its
biennial conference, MoL, which will be held July 25-26, 2015, at the
University of Chicago.  The meeting takes place on the last weekend of
the Linguistic Summer Institute of the Linguistic Society of America.

## Description

MoL is devoted to the study of mathematical structures and methods
that are of importance to the description of language.  Contributions
to all areas of this field are welcome.  Specific topics within the
scope of the conference include, but are not limited to the following:

* formal and computational analysis of linguistic theories and frameworks
* learnability of formal grammars
* proof-theoretic, model-theoretic and type-theoretic methods in linguistics
* mathematical foundations of statistical approaches to language analysis
* formal models of language use and language change

## Invited Speakers

MoL will feature invited talks by the following distinguished speakers:

* David McAllester (Toyota Technological Institute at Chicago, USA)
* Ryo Yoshinaka (Kyoto University, Japan)

## Submissions

MoL invites the submission of papers on original, substantial,
completed, and unpublished research.  Each submission will be reviewed
by at least three members of the program committee.  Papers presented
at the conference will be included in the conference proceedings,
which will be published in the ACL Anthology.

All submissions must follow the style set out in the conference style
files, which are available at the conference website.  They must not
exceed twelve (12) pages of technical content, but may have an
unlimited number of extra pages for references.

Simultaneous submission to other conferences is allowed, provided that
the authors indicate which other conferences the paper is submitted
to.  A paper is accepted on the condition that it will not be
presented at any other venue.

MoL uses the EasyChair conference management system for submission,
reviewing and preparation of proceedings:

Submissions must be uploaded to this system electronically in PDF
format no later than Friday, March 13, 2015 (end of day, Anywhere on
Earth).

## Important Dates

* Paper submission deadline: March 13, 2015
* Notification of acceptance: April 24, 2015
* Camera-ready copy deadline: May 8, 2015
* Conference: July 25-26, 2015

## Program Committee

Marco Kuhlmann, Program Chair (Linköping University, Sweden)
Makoto Kanazawa, Program Co-Chair (National Institute of Informatics, Japan)
Greg Kobele, Local Chair (University of Chicago, USA)

Henrik Björklund (Umeå University, Sweden)
David Chiang (University of Notre Dame, USA)
Alexander Clark (King's College London, UK)
Shay Cohen (University of Edinburgh, UK)
Carlos Gómez-Rodríguez (University of A Coruña, Spain)
Jeffrey Heinz (University of Delaware, USA)
Gerhard Jäger (University of Tübingen, Germany)
Aravind Joshi (University of Pennsylvania, USA)
András Kornai (Hungarian Academy of Sciences, Hungary)
Giorgio Magri (CNRS, France)
Andreas Maletti (University of Stuttgart, Germany)
Jens Michaelis (Bielefeld University, Germany)
Gerald Penn (University of Toronto, Canada)
Carl Pollard (The Ohio State University, USA)
Jim Rogers (Earlham College, USA)
Mehrnoosh Sadrzadeh (Queen Mary University of London, UK)
Sylvain Salvati (INRIA, France)
Ed Stabler (University of California, Los Angeles, USA)
Mark Steedman (Edinburgh University, UK)
Anssi Yli-Jyrä (University of Helsinki, Finland)

## Contact

For inquiries about the scientific program of the conference, please

For inquiries about the local organization and all practical aspects
of the conference, please email

Modeling the changes of natural vowel systems

Back in 2013, I posted about my plans to write a computer simulation that could model the way vowel systems are learned, transmitted, and changed.  I am happy to report that the first phase of this project was completed, and we presented a poster at the AAAS meeting this month.  I was able to find an excellent student of computer science, Hannah Scott, who is now my coauthor on the project.  She programmed the entire simulation in Python, leveraging methods from the "imitation game" described by Luc Steels.

In our simulation, we begin with a vowel system, which is just a set of F1 and F2 values on a vowel chart. Then some number of agents we call "parents" say the vowels a number of times to each of the "children."  The children then try to repeat the vowels, and we provide feedback based on how accurate the repetition is.  All of this has random noise introduced, of course, and the main parameters are the children's precision and the tolerance of the feedback system.

We had originally conceived of the feedback as coming from the parents, but then colleagues pointed out that in reality, parents' feedback is not used by children learning the sounds of language.  The real feedback comes from the child's own brain, which has already learned and stored an excellent perceptual model of the correct vowels.  The child is perfectly capable of monitoring and critiquing his/her own production efforts.
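The loop described above can be sketched roughly as follows. This is not the project's actual code, only a minimal illustration under stated assumptions: the function name `learn_vowels`, the Gaussian noise model, the token and attempt counts, and the vowel coordinates in `toy_system` are all invented for the example. The child first averages noisy parental tokens into a perceptual target, then keeps only those of its own attempts that fall within the tolerance of that target.

```python
import random

def learn_vowels(parent_vowels, n_tokens=20, precision=0.3, tolerance=1.0,
                 n_attempts=50, rng=None):
    """One generation: a child hears noisy parent tokens, then practices.

    parent_vowels: dict mapping a label to an (F1, F2) pair in ERB units.
    precision: standard deviation of the noise (assumed Gaussian).
    tolerance: largest distance from the perceptual target that the
        child's self-monitoring accepts.
    """
    rng = rng or random.Random()
    child = {}
    for label, (f1, f2) in parent_vowels.items():
        # Perception: average the noisy tokens heard from the parents.
        heard = [(rng.gauss(f1, precision), rng.gauss(f2, precision))
                 for _ in range(n_tokens)]
        target = (sum(h[0] for h in heard) / n_tokens,
                  sum(h[1] for h in heard) / n_tokens)
        # Production: keep only attempts the child's own model accepts.
        accepted = []
        for _ in range(n_attempts):
            try_f1 = rng.gauss(target[0], precision)
            try_f2 = rng.gauss(target[1], precision)
            dist = ((try_f1 - target[0]) ** 2 + (try_f2 - target[1]) ** 2) ** 0.5
            if dist <= tolerance:
                accepted.append((try_f1, try_f2))
        if accepted:  # a vowel with no accepted attempt is lost
            child[label] = (sum(a[0] for a in accepted) / len(accepted),
                            sum(a[1] for a in accepted) / len(accepted))
    return child

# Invented placeholder coordinates, not measured vowel data.
toy_system = {"i": (7.0, 21.0), "a": (13.0, 17.0), "u": (7.5, 14.0)}
rng = random.Random(0)
system = toy_system
for generation in range(5):
    system = learn_vowels(system, rng=rng)  # children become the next parents
print(system)
```

In this sketch a vowel simply disappears when none of the child's attempts is accepted, which is one crude way a system could go extinct when the tolerance is set too small.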

The simulation is, as might be expected, very sensitive to the values of the precision and tolerance parameters.  When either of these is too large or too small, the vowel system goes haywire or is completely extinct in one or two generations.  The sweet spot for these parameters, so far, is around 1 Equivalent Rectangular Bandwidth (an auditory unit of frequency).
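For readers unfamiliar with the unit, one common way to express a frequency on the ERB-rate scale is the Glasberg and Moore (1990) formula, 21.4 · log10(1 + 0.00437 · f) with f in Hz. Whether our simulation uses exactly this conversion is not stated here, so treat the sketch below as an assumption; the function names are hypothetical.

```python
import math

def hz_to_erb_rate(freq_hz):
    """Convert a frequency in Hz to the ERB-rate scale (Glasberg & Moore, 1990)."""
    return 21.4 * math.log10(1.0 + 0.00437 * freq_hz)

def erb_distance(vowel_a, vowel_b):
    """Euclidean distance, in ERB units, between two (F1, F2) pairs given in Hz."""
    d1 = hz_to_erb_rate(vowel_a[0]) - hz_to_erb_rate(vowel_b[0])
    d2 = hz_to_erb_rate(vowel_a[1]) - hz_to_erb_rate(vowel_b[1])
    return math.hypot(d1, d2)

print(hz_to_erb_rate(1000.0))
```

Measuring distances on this scale rather than in raw Hz means that a tolerance of about 1 ERB corresponds to a wider band in Hz at high frequencies than at low ones.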

We have obtained some interesting little results so far.  For example, the simulation preserves the Spanish 5-vowel system perfectly for eternity, but with the same parameters applied to the English vowels there are changes, such as vowels merging.  These changes are then mitigated by introducing vowel duration as an additional attribute.  In my opinion this behavior of the simulation is incredibly realistic for a first try.

And this indeed seems to be *the first* attempt to do this.  I could find nothing in the literature that presented a simulation of sound change, whether for vowels or anything else. There is, of course, the literature by Steels and Bart de Boer which models how vowel systems could have emerged in the first place, at the dawn of language, but that is really a different type of question.

Thursday, October 30, 2014

Recursion and the infinitude of language - another tempest in the teapot

The latest issue of the Journal of Logic, Language and Information contains a marvelous little paper by András Kornai, "Resolving the infinitude controversy." In it, Kornai meets the latest perplexing linguistic discovery head-on, and uses it to show that the generative linguistic fascination with recursion in grammar and the infinitude of language turns out to be just another misplaced fixation on a tempest in a teapot, along with so many other once-cherished notions and so-called problems in linguistics.

More than a generation of linguistics professors, myself included, have harped at our students about the importance of human language building an infinite edifice from finite materials. The hoopla more or less reached its zenith in 2002, when Hauser, Chomsky and Fitch made much of the human capacity for recursion in language--indeed they proposed that it was the main thing which separated us from the other apes.  But beyond the conventional wisdom, as usual, truly interesting things were being discovered.

Over the past ten or more years, Dan Everett, for one, slowly convinced the linguistics community that the Pirahã language of the Amazon in fact has no kind of recursive or iterative grammatical structures. But other less highly advertised cases had long been lurking, among them Dyirbal and Warlpiri. What is linguistic theory to make of finite human languages?  Why would there be any, when we long thought that "infinitude" was a necessary property of a possible human language?

Although some have suggested otherwise, I think by now we all must admit that these descriptions are indeed correct, that there are really finite human languages, and that this is within the scope of possibility. It's OK--Kornai shows that this is no cause for alarm. The argument rests on the important point that even in the infinite languages, such as English, there is a steadily vanishing probability of a sentence being produced as the length increases. This probability distribution over the length of sentences shows that 99.9% of everything English speakers actually say is communicated in relatively short simple sentences. On another note, whatever the infinitude of English is good for, it is not good for saying anything beyond what Pirahã speakers can say--their expressive power is the same, so long as a Pirahã speaker is allowed to use multiple sentences to say what might be said using one English sentence.  In this respect, the information capacity of real English is similar to that of Pirahã. In fact, a mathematical argument shows that a more complicated finite language could easily outstrip the information capacity of infinitary natural languages as they actually are spoken.
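As a toy illustration of the vanishing-probability point, and not Kornai's actual model or data, suppose sentence length followed a geometric distribution with some assumed mean. The hypothetical helper below then computes the length below which a given share of all sentences falls.

```python
import math

def length_cutoff(mean_length, mass=0.999):
    """Smallest n with P(length <= n) >= mass under a geometric length model.

    Assumes mean_length > 1; the geometric shape itself is an assumption
    made only for this illustration.
    """
    p = 1.0 / mean_length  # per-word stopping probability
    # P(length > n) = (1 - p) ** n, so solve (1 - p) ** n <= 1 - mass.
    return math.ceil(math.log(1.0 - mass) / math.log(1.0 - p))

cutoff = length_cutoff(20)  # assumed mean of 20 words per sentence
print(cutoff)
```

Under these assumptions the cutoff is a finite and fairly modest length, which is the sense in which the unbounded tail of an infinite language carries almost none of what speakers actually produce.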

One conclusion to draw is that all the hubbub about recursion and infinity in grammar being the essence of the human condition is seriously misguided. Indeed, the evolutionary pressure, whatever it is, that leads most human languages to have infinitary sentences is a bit of a mystery, since it appears to provide little discernible advantage.  Except perhaps for the inherent value of nifty tales like "This is the House that Jack Built."  For fun, here is its final sentence:

This is the horse and the hound and the horn
That belonged to the farmer sowing his corn
That kept the rooster that crowed in the morn
That woke the judge all shaven and shorn
That married the man all tattered and torn
That kissed the maiden all forlorn
That milked the cow with the crumpled horn
That tossed the dog that worried the cat
That chased the rat that ate the cheese
That lay in the house that Jack built.

Wednesday, September 3, 2014

AAAS meeting in San José

I hope you were aware that the American Association for the Advancement of Science has a section for "Linguistics and language sciences." The annual AAAS meeting is set for February 12-16 in San José, California, and there promises to be some activity from Section Z.

I am currently the liaison from the Association for Symbolic Logic with this section, and I plan to post items from AAAS that may be of interest to the logic and mathematics of language community.
Future posts will focus on language and logic-related items published in AAAS journals, such as Science.

Friday, July 4, 2014

Jim Lambek

Word has traveled around the mathematical linguistics community that we recently lost one of our "godfathers," Joachim Lambek. I met Jim during the 2001 Logical Aspects of Computational Linguistics conference which took place at a seaside retreat outside Nantes. After the end of that meeting, a number of us, including Jim and myself, stayed in Nantes for a workshop on learning theory that was held at the university there. I had the pleasure of going to dinner with Jim and some other colleagues on one of the days, but unfortunately Jim tripped at his hotel and broke his wrist later that evening. I waited with him outside his hotel while a colleague of his tried to get him some medical attention. In spite of the pain in his wrist, Jim recalled that I had asked him about procuring a copy of his book with P. J. Scott, Introduction to Higher Order Categorical Logic, which had fallen out of print. He gave me one of his cards and asked me to write my contact information on it.  I was pretty astounded that he would bother to talk to me about that while nursing a broken wrist and waiting for an ambulance in a foreign country. A few weeks later when I was back at the University of Chicago, a copy of the book arrived in the mail. I'm even more grateful that he signed the inside front page.

Wednesday, June 4, 2014

Recursion in linguistics, ad nauseam

After reading the discussions about the supposed role of recursion in Chomskyan linguistics, both in journals (see previous post) and on Norbert Hornstein's blog, my first thought was that if I see another linguist arguing about recursion I'm going to throw up. And yet, after thinking it over, I now see fit to add my own little tidbit to the mix.

Lobina argues, if I may paraphrase, that the stated or implied reasons for recursion in Chomsky's formalisms are vacuous, because supporters say things like "recursion is needed for a grammar to generate an infinite language."  Lobina correctly points out that this is not in fact true, so a lot of these stated reasons for recursion in linguistic theory turn out to be moot.

On thinking it over, I remembered that I myself had a need for recursion in past work. In my paper of 2010 (erratum published 2011), I demonstrated that a certain kind of recursion in the structural design of sentences was necessary to have a class of infinite (tree-structured) languages that is learnable from finite data.  Now on reflection in the context of all this recursion talk, I believe that this may actually capture something that was sort of meant by Chomsky and those following him over the years.  Recursion in syntax is not needed to generate the infinite capacity of language; rather, the recursion is needed to provide learnability of the infinite from only finite data. This is, at last, a property of the recursive structures that cannot be replicated using iterative or other methods.