Bob McMurray: Defusing the Childhood Vocabulary Explosion
Bob McMurray will be visiting next Tuesday, and I look forward to meeting with him. I am reading his latest paper in Science: Defusing the Childhood Vocabulary Explosion (McMurray, Science 317 (5838): 631). Bob also has a webpage dedicated to the paper.
The paper itself is short, 1 page long. The model is very simple (unless I missed something). He started by assuming that words vary in difficulty of acquisition, which is a function of frequency of occurrence. Although other factors such as phonology are mentioned, I think frequency was what he had in mind, because he also assumed that a word is acquired after n exposures, where n is a (linear?) function of its "difficulty". Given this setup, children’s word learning curve is determined by the shape of the distribution of "word difficulty", which Bob conveniently assumed to be Gaussian. He also suggests that any similar-looking distribution function will do.
It seems to me the model can be simplified further. The core assumption here is that the number of exposures required to learn a word follows a distribution that increases with difficulty, at least over the early range: few words need only a handful of exposures, and more and more words need many. Then, assuming exposures to these words accumulate linearly, the number of words "acquired" during each time interval will be an increasing function of time. The "# words known" plot is the integral of the former, which will show an "explosion"-looking, exponential-style function.
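Here is a minimal sketch of that simplified reading, not Bob's actual implementation. The lexicon size, the Gaussian mean and standard deviation, and the rule that every word gets exactly one exposure per time step are all my own assumptions, picked only to make the shape visible.

```python
import random


def simulate_vocabulary(n_words=10000, mean=4000.0, sd=1400.0,
                        n_steps=4000, seed=0):
    """Return a list whose entry t-1 is the number of words known at step t."""
    rng = random.Random(seed)
    # Assumed "difficulty": exposures needed per word, Gaussian, floored at 1.
    thresholds = sorted(max(1.0, rng.gauss(mean, sd)) for _ in range(n_words))
    known = []
    idx = 0
    for t in range(1, n_steps + 1):
        # Linear accumulation: by step t every word has had t exposures,
        # so a word is acquired as soon as its threshold is <= t.
        while idx < n_words and thresholds[idx] <= t:
            idx += 1
        known.append(idx)
    return known


curve = simulate_vocabulary()
for t in (1000, 2000, 3000, 4000):
    print(t, curve[t - 1])
```

With these made-up parameters the run stops at the Gaussian mean, so each successive block of 1000 steps adds more words than the one before it. Past the mean the per-interval gain would shrink again, so the "explosion" here is only the left half of the bell curve being integrated.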
So the key is the "difficulty" distribution. But I do not understand why it should be Gaussian, where hard words are few. I would choose an ever-increasing function, because there are potentially millions of words (see the Google Web 1T database), most of which are hard.
Questions:
- The model seems to assume a uniform distribution of frequency of occurrence, i.e., a linear accumulation of exposure. That doesn’t sound right. We know from Zipf’s law that language is dominated by a small set of very frequent words, and some words you will never see or hear in your lifetime (unless you sift through Google Web 1T as I did). If this is added to the mix, I bet it will slow down the curve. Maybe the slowdown will not be severe, because we assume the first words are more or less frequent. But still it seems to be a factor to consider.
- I also entertained the possibility of turning the model around: dropping the "difficulty of acquisition" concept and replacing it with simply frequency of occurrence. That won’t work, because it predicts that the order of acquisition is strictly based on frequency, which we know is not the case. So the "difficulty" factor is a smart move.
But I am surprised to see that one of the "real-world" analyses indeed equated frequency with difficulty:

- One premise of the model, a big implicit assumption, is that the learner has unlimited capacity for keeping distinct representations of the vocabulary. In other words, adding a new word will not degrade existing entries in the mental lexicon. That is perhaps the biggest puzzle for any connectionist modeler of vocabulary learning. Symbolic models that assume unlimited memory capacity (list-like memory) are frowned upon nowadays because they are not neurologically constrained, I guess.
Bob touched on this by experimenting with various scoring schemes that simulate positive or negative neighborhood-type effects. The simulations, however, are not motivated by the memory literature. I am not sure I am convinced by these additive effects.
- One thing I am still trying to figure out is what he meant by "training the model". It seems to me everything is symbolic and one can get a non-numeric solution by hand. Isn’t everything determined by the Gaussian-like distribution?
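To make the last question concrete: under my simplified reading (a Gaussian number of required exposures, one exposure per word per time step), no training is needed at all, because the expected vocabulary size is just the lexicon size times the Gaussian cumulative distribution. The sketch below is my own reading, not the paper's procedure; the function name `words_known` and all parameter values are made up for illustration.

```python
from statistics import NormalDist


def words_known(t, n_words=10000, mean=4000.0, sd=1400.0):
    """Expected number of words known after t exposures per word."""
    # Assumption: a word is known once t reaches its Gaussian-distributed
    # threshold, so the expected count is n_words times the Gaussian CDF at t.
    return n_words * NormalDist(mean, sd).cdf(t)


for t in (1000, 2000, 3000, 4000):
    print(t, round(words_known(t)))
```

If the paper's model really reduces to this, the whole learning curve is fixed by the mean and spread of the difficulty distribution, which is why I wonder what is left to train.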
September 24th, 2007 at 4:56 am
I would like to follow your discussion. Please let me know. Thanks.
Hans Drumbl