Keywords In and Out of Context

some more thoughts and theories about keywords


The Lure of Associates

While considerable research attention is being devoted to how large language models are pretrained on vast bodies of text, learning the patterns that let them predict word sequences (see this bibliometric overview in the current issue of Information), comparatively little attention is being paid today to how children learn language (which is, of course, at the heart of Stevan Harnad’s emphasis on grounding human cognition in sensorimotor experience, as I noted in a previous post).

Language is clearly the dominant marker of human cognition, though recent findings of possible symbolic gesturing by chimpanzees, a hypothetical alternative route to the emergence of semanticity (notably advocated by Michael Arbib some time ago), may be of interest, since such gestures too represent sensorimotor experience unavailable to LLMs. Money, however, also talks, and it obviously values research on ChatGPT far more than research on linguistic issues in children.

Nevertheless, one of the leading theoreticians of human linguistic production is Thomas Hills of the University of Warwick, who (with his many associates, as reflected in his lengthy list of co-authors) studies the development of lexicons across the lifespan. Since children, unlike ChatGPT and its rivals, have neither immediate access to vast corpora of text nor the processing capabilities of artificial intelligence’s neural networks (beyond their own growing brains), the process of learning language is relatively slow and specific to the child’s individual environment.

Studies have shown that children normally recognize about 50 words at age one, approximately 1,000 words by age three, and at least 10,000 words by age five. Importantly, although the first words in a child’s lexicon are almost invariably concrete ones relating to objects and locations in the immediate environment, abstract words are soon acquired, especially affective ones expressive of a child’s emotions. Words are not acquired in isolation; the influences of child-directed speech and caregiver vocabularies are both important. Linguistic complexity as shown by the development of grammatical correctness also accompanies this process. Accordingly, observational studies of children learning and using words have been a staple of research for decades.

Hills’ approach has been different, using network analytic techniques to study the growth of the mental lexicon in human beings. His early research involved using the actual lexicons of children at various ages to analyze the success of three different generative network growth algorithms in explaining why and when specific words in these lexicons were acquired.

These three algorithms include (1) the lure of the associates, which predicts that new words that sound similar to several known words are learned better than new words that sound similar to few or no known words; (2) the preferential attachment network growth algorithm, which predicts that a novel word that is similar to a known word that itself sounds similar to many other known words will be better learned than a novel word that is similar to a known word that itself sounds similar to few other known words; and (3) the preferential acquisition network growth algorithm, which predicts that new words that sound similar to several new words in the learning environment are better learned than new words that sound similar to few new words in the learning environment.
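For readers who think better in code, here is a minimal sketch of how the three predictions could be scored on a toy similarity network. Everything in it is an assumption made for illustration: the word list, the edges, the function names, the choice to average neighbor degrees for preferential attachment, and the reading of "learning environment" as the full network of known and not-yet-known words. It is not Hills' actual procedure or data.

```python
def build_graph(edges):
    """Build an undirected graph as a dict mapping each word to its neighbors."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def lure_of_associates(graph, known, word):
    """Count how many already-known words the candidate is linked to."""
    return len(graph.get(word, set()) & known)


def preferential_attachment(graph, known, word):
    """Mean degree, within the known-word network, of the known words
    the candidate links to (averaging is an assumption of this sketch)."""
    known_neighbors = graph.get(word, set()) & known
    if not known_neighbors:
        return 0.0
    degrees = [len(graph[n] & known) for n in known_neighbors]
    return sum(degrees) / len(degrees)


def preferential_acquisition(graph, word):
    """Count the candidate's links in the full learning environment,
    whether or not the linked words are known yet."""
    return len(graph.get(word, set()))


# Hypothetical toy environment: an edge means "sounds similar".
edges = [
    ("cat", "hat"), ("cat", "bat"), ("cat", "mat"),
    ("hat", "bat"), ("hat", "mat"), ("bat", "mat"),
    ("dog", "log"), ("dog", "fog"), ("log", "fog"),
]
graph = build_graph(edges)
known = {"cat", "hat", "dog"}  # words the child already knows

for candidate in ("bat", "log"):
    print(
        candidate,
        lure_of_associates(graph, known, candidate),
        preferential_attachment(graph, known, candidate),
        preferential_acquisition(graph, candidate),
    )
```

In this toy setup all three scores favor "bat" over "log", but on real lexicons the three algorithms can rank candidate words differently, which is what makes it possible to compare them against observed acquisition.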

Hills’ initial phonological research indicated that the preferential attachment and the lure of the associates algorithms were more successful in explaining lexicon growth, though his corpus sample was very limited. His subsequent research explored the contextual diversity of relations among words in the language-learning environment of young children, both in relation to adult association norms and in relation to the normative age of acquisition of those words. Results indicated that words reflecting topics of interest to the child were more easily acquired, likely making the development of an adult lexicon a lengthy process as the child matures.

Most recently, Hills has used network techniques to study the lexicon across the lifespan, analyzing data from Dutch, English, and Spanish speakers in various age cohorts. The lexical networks, as represented by free association data, of the older cohorts in all three languages became sparser, showing less clustering and connectivity. However, the individual-level analyses show that some of these findings reflect a property of age cohorts rather than of individuals.

He observes, “The results of this study offer a narrative for understanding the process of language acquisition across the lifespan. In earlier life, a person first learns hub words that bind a network together across diverse contexts. However, a word acquired in earlier life may have a relatively lower clustering coefficient as it is not clustered with words with specialized meaning, but is a general word that applies across a broad range of topics. In later life, a person may then gradually acquire more peripheral language that accumulates around more specialized topics, in regions with nodes of higher clustering coefficients. However, because this knowledge is more specialized, it is likely to be of lower degree. This phenomenon may indicate not cognitive deterioration but rather the consequence of lifelong learning, with individuals becoming increasingly differentiated from one another via their growing specialized knowledge. Our results suggest that this process occurs throughout the adult lifespan.”
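The two network measures in that passage, degree and clustering coefficient, are easy to compute by hand on a small example. The sketch below uses an invented toy lexicon (the words and links are assumptions for illustration, not free association data) to show how a general "hub" word can have high degree but low clustering, while a specialized word has low degree but high clustering.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected graph as a dict mapping each word to its neighbors."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree(graph, word):
    """Number of words directly linked to this word."""
    return len(graph[word])


def clustering_coefficient(graph, word):
    """Fraction of pairs of the word's neighbors that are linked to each other."""
    neighbors = sorted(graph[word])
    if len(neighbors) < 2:
        return 0.0
    pairs = list(combinations(neighbors, 2))
    linked = sum(1 for a, b in pairs if b in graph[a])
    return linked / len(pairs)


# Hypothetical toy lexicon: a general hub word and a specialized cluster.
lexicon = build_graph([
    ("mother", "apple"), ("mother", "home"),
    ("mother", "milk"), ("mother", "dog"),
    ("apple", "milk"),
    ("seigniorage", "currency"), ("seigniorage", "inflation"),
    ("currency", "inflation"),
])

for word in ("mother", "seigniorage"):
    print(word, degree(lexicon, word), round(clustering_coefficient(lexicon, word), 2))
```

Here "mother" touches four words, only one pair of which is linked, while "seigniorage" touches just two words that are also linked to each other.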

What is particularly interesting to me in all this is how different words may become “key” throughout an individual’s lifespan, ranging from common “hub” words like “apple” and “mother” in childhood to specialized terms like “externality” and “seigniorage” in adulthood, for instance. (I wonder also whether some of these generative network algorithms might help to explain the more or less successful learning of various specialized terms in higher and professional education, though I have to confess that I’m not familiar with the literature on that. My uneducated guess would be that Medical English would most likely be the field in which this type of research would be useful, given both its role as a lingua franca for the global medical community and the complexity of the terminologies involved.)

Another take-away is that simply looking at network diagrams of sample lexicons of older adults elicited through free association and, unlike Hills, calling these “deteriorated” in comparison to those of younger people does a disservice to the complexity of human cognition. It is not the general vocabulary to which one has immediate access (and ChatGPT already outperforms humans in that) but what one can do with the relevant (key) words in one’s lexicon that matters.

As an example, here’s Robert Jay Lifton’s recent piece in Scientific American about the critical importance of truth and its present endangerment in the United States. I should point out that Dr. Lifton happens to be 98 years old, and his command of language is superior to that of both ChatGPT and people many decades younger. Hopefully, the lure of “truth” and its associates (“trust,” etc.) will prevail over its antonyms in America today.