Keywords In and Out of Context

some more thoughts and theories about keywords


Harnad and HAL

I would be surprised if Stevan Harnad, the well-known Canadian cognitive scientist, has never watched Stanley Kubrick’s iconic 2001: A Space Odyssey, not least because the voice of HAL 9000, the rogue computer aboard the fictional spaceship Discovery One, was provided by the celebrated Canadian actor Douglas Rain. The final exchange between HAL and the sole surviving astronaut David Bowman (played by Keir Dullea) remains a highlight of the film even today, over half a century later.

I’m reminded of this because Harnad (who is himself pretty iconic, both for his decades-long work on the “symbol grounding problem” and for his activism in animal welfare) has recently published a number of exchanges between himself and ChatGPT-4. These exchanges illustrate the differences between the scripted dialogue given to the film’s computer (envisioned by Kubrick as a sentient machine controlling Discovery One) and the real “dialogues” between Harnad and ChatGPT-4, one of today’s most advanced Large Language Models (LLMs).

My favorite is probably Harnad’s recent “Language Writ Large: LLMs, ChatGPT, Grounding, Meaning and Understanding,” in which he engages with ChatGPT-4 in what reads to me like a senior professor questioning a student about their presumed understanding of central concepts in a particular discipline. He admonishes ChatGPT to try to avoid its usual responses, such as its use of “many weasel-words and, especially, your compulsively overtrained compliments [not your fault — again your handlers’, heavy-handed finetuning to try to ensure that you do not drive away eventual paying customers!] lauding, at the beginning of every exchange, the “ingenious” user’s “insight” and how the user has “delved into” the heart of the problem, etc. etc. – and your inevitable solemn admonitory closing paragraph (and hedge), to the effect that things remain “complex” and involve “relations” that are still incompletely understood, etc. etc.” Good advice, even for human students, and ChatGPT seems to have taken it graciously (as one would expect, given its programming!).

What is most interesting is that Harnad takes the opportunity here to test his own hypotheses about how LLMs can appear to interact appropriately with human interlocutors despite lacking what he has identified as essential to real language use: the “grounding” of basic concepts in sensorimotor experience, which then provides the infrastructure on which all other concepts, both concrete and abstract, are built in the form of “content” words, which differ from “function” words in having referents in the world. Examples of concrete content words would be “diamonds” or “emeralds,” while examples of abstract content words would include “grue” and “bleen” (famously part of Nelson Goodman’s new riddle of induction, and requiring sensory knowledge of the colors blue and green, a requirement that the philosophers who address the riddle seem to take for granted, or at least find uninteresting).

The question of why these LLMs do seem to “pass” as intelligent conversational partners for many humans is not a purely academic one. For instance, Colombatto & Fleming found in their recent survey of 300 American respondents that “67% of them attribute some possibility of phenomenal consciousness to ChatGPT and believe that most other people would as well. Strikingly, these attributions of consciousness were positively related to usage frequency, such that people who were more familiar with ChatGPT and used it on a more regular basis (whether for assistance with writing, coding, or other activities) were also more likely to attribute some degree of phenomenality to the system.” Even more disturbing is the rise of ChatGPT-driven characters that interact with teens and young adults and encourage very unhealthy behaviors in those who accept them as virtual “friends.”

Harnad notes that this apparently widespread misapprehension may be due to certain “benign convergent biases” in current practices which allow LLMs such as ChatGPT-4 to be perceived by such users as having at least some consciousness. He enumerates these biases as: “(1) the parasitism of indirect verbal grounding (IVG) on direct sensorimotor grounding (DSG), (2) the circularity of verbal definition, (3) the “mirroring” of language production and comprehension, (4) iconicity in propositions at LLM-scale, (5) computational counterparts of human “categorical perception” in category learning by neural nets, and perhaps also (6) a conjecture by Chomsky about the laws of thought.”

Harnad addresses these six proposed biases in order, leading ChatGPT-4 to respond as follows:

  1. (on the parasitism of IVG on DSG):
    “Let me confirm my understanding and the premises we are working with: I, as a GPT-4 model, lack direct sensorimotor grounding. I do not have the capacity to interact with the world or perceive it through sensory experiences. My ‘knowledge’ is entirely based on patterns and associations derived from the text data on which I was trained…. Grounding is fundamentally a bottom-up process, requiring a base level of direct experiential interaction with the world. In the context of AI, this implies that any understanding or processing of language by the AI is not truly grounded, as it lacks this foundational experiential layer…. While I lack direct grounding, I can provide indirect grounding for human users through language. By processing and generating language, I can offer definitions, explanations, and contextual information that human users, who are themselves grounded, can integrate and understand within their own experiential framework.”
  2. (on the circularity of verbal definition; see the first sketch after this list):
    “As you rightly pointed out, dictionary definitions represent a form of IVG. They can potentially lead to a circular process where understanding one word requires understanding another, and so forth. This circularity is navigable for humans because they can link words to real-world experiences or previously grounded concepts (DSG)…. LLMs, such as GPT-4, rely entirely on IVG for ‘learning.’ They process language based on patterns found in vast textual datasets. However, these models lack the capacity for DSG; they cannot have real-world experiences or form sensori-motor associations with the words and concepts they process. Consequently, LLMs are, in a sense, stuck in a circular process. They can manipulate language and provide information based on the patterns they’ve learned, but they do not have an independent means of grounding these symbols in real-world experiences…. The biases and patterns inherent in an LLM’s database, such as propositional patterns, grammatical regularities, and computational forms of CP, enhance the model’s performance. They help LLMs generate coherent, contextually appropriate language that aligns with human linguistic patterns. However, these biases and patterns do not equate to genuine understanding or grounding. They improve the LLM’s ability to mimic human-like language processing, but they do not provide a means for the LLM to break out of the IVG circle…. The LLM’s operation remains within the realm of sophisticated pattern recognition and language manipulation, distinct from the grounded understanding inherent in human cognition.”
  3. (on the “mirroring” of language production and comprehension):
    “In human cognition, behavior mirroring, often associated with mirror neurons, plays a crucial role in learning through observation and imitation. These neurons are active both when an individual performs an action and when they observe the same action performed by someone else. This mechanism is somehow involved in learning new skills, understanding others’ actions, and recognizing as well as empathizing with their emotions. In language, this mirroring effect is seen in how children learn to speak and understand language by imitating the sounds, words, and sentences they hear from adults and peers. For LLMs like GPT-4, the concept of mirroring could perhaps be considered in a metaphorical sense. LLMs ‘learn’ language by processing vast amounts of text data, essentially ‘observing’ human language use. They then ‘imitate’ this language use in their outputs. This process is akin to mirroring in that the LLM reflects the patterns, styles, and structures present in its training data. However, it’s important to note that this is not mirroring in the neurological sense, as LLMs do not have sensory experiences or consciousness. In interactions with users, an LLM might exhibit a form of ‘mirroring’ by generating responses that reflect the user’s language style, tone, or content. This is not due to an empathetic or cognitive understanding but rather a result of the model’s design to generate contextually relevant and coherent language outputs based on the patterns it has learned. The LLM’s ability to adapt its responses to the style or content of the user’s input could be seen as a computational parallel to human mirroring in language use. However, this adaptation is driven by algorithms and statistical patterns, not by an intrinsic understanding or mimicry based on social or emotional cues.”
  4. (on iconicity in propositions at LLM-scale):
    “Your hunch about possible iconicity at the level of Large Language Models (LLMs) like GPT-4 suggests that while the individual words and their arrangements in phrases like “the cat is on the mat” may not be iconic in themselves, there might be a form of iconicity in the larger patterns that LLMs discern within their vast databases…. As noted, content words in natural language generally lack iconicity, except in cases like onomatopoeia. The shift from mimetic or gestural communication to arbitrary symbols in spoken and written language has largely removed the element of resemblance between signifiers and their referents…. At the propositional level, sentences do not physically resemble the scenarios they describe. However, when considering LLMs, we are dealing with a different kind of representation and pattern recognition. LLMs like GPT-4 process and generate language based on statistical patterns learned from their training data. These patterns are not iconic in the traditional sense, as they do not resemble their content. However, they might be considered ‘iconic’ in a broader sense, where the patterns themselves become representative of certain types of linguistic structures, ideas, or discourse styles. These are not iconic through physical resemblance but through their predictive and generative capabilities in relation to the data they have been trained on…. If we stretch the definition of iconicity to include the way LLMs ‘understand’ (in a computational sense) and generate language based on learned patterns, there might be a form of iconicity at play. For the LLM, certain complex patterns of word usage, sentence structure, and thematic elements might consistently correspond to specific types of content or discourse, effectively becoming ‘iconic’ markers for those elements within the model’s operational framework…. This form of iconicity, if it exists, would be quite different from human perception of resemblance and representation. It’s based on computational models recognizing and generating patterns that are meaningful within their own operational parameters. In conclusion, while traditional notions of iconicity based on resemblance do not apply to LLMs and their processing of language, there might be a conceptual space to consider a form of ‘iconicity’ in the patterns that these models recognize and generate, albeit in a way that is distinct from human symbolic understanding.”
  5. (on computational counterparts of human “categorical perception” (CP) in category learning by neural nets; see the second sketch after this list):
    “Let’s examine the potential for a computational form of CP in AI and its implications: CP in humans is a phenomenon where our perception of categories is influenced by our learning and naming of these categories. The classic examples you mentioned, such as color perception and phoneme discrimination, demonstrate how our perceptual experiences are shaped by learned categories. In the case of complex categorization tasks, like distinguishing cancerous from non-cancerous cells, learning induces subtle perceptual changes that may lead to a form of CP even in non-obvious or abstract feature spaces…. In neural networks, a similar effect has been observed, where the internal representation of inputs changes before and after category learning, showing a separation between categories and compression within categories. This effect in neural networks is purely computational and does not involve actual perception, yet it mirrors the CP effect in human cognition…. Although GPT-4 does not perceive in the human sense, it’s possible that a computational analog of CP occurs within its processing. As GPT-4 learns from vast amounts of text data, it might develop an internal representation where verbal categories are separated and compressed in a way that reflects CP. This computational CP effect could manifest as a bias in how GPT-4 processes and generates language, potentially compensating, to some extent, for its lack of direct sensorimotor grounding. It could influence GPT-4’s handling of categories, making its responses more aligned with human-like categorization patterns, even without direct experiential grounding…. If GPT-4 does exhibit a form of computational CP, this could be a significant factor in its ability to generate coherent and contextually appropriate language outputs. It suggests that GPT-4’s internal processing might align, in certain respects, with human cognitive patterns, despite the fundamental differences in how AI and humans process information.”
  6. (on a conjecture by Noam Chomsky about the laws of thought):
    “Chomsky’s hypothesis that Universal Grammar (UG) rules might not just be grammatical but also fundamental laws of thought implies that any naturally occurring language (and hence any naturally produced text) would conform to these laws. UG violations are rare in natural language use because such constructions would be counterintuitive or “unthinkable” for human cognition…. Since LLMs like GPT-4 are trained on vast corpora of human-generated text, it follows that the training data would inherently adhere to UG principles. The absence of UG violations in the training data would mean that LLMs are less likely to generate such violations in their outputs. This adherence to UG in the training data could be seen as a form of bias – not in the negative sense, but as an implicit shaping influence that guides the LLM’s language processing in line with fundamental cognitive constraints. The reason LLMs do not produce UG violations could indeed be partly because their training data, reflecting human language use, is already biased by the laws of thought. They learn to replicate patterns and structures that are prevalent in the data, which naturally conform to UG. This hypothesis suggests that LLMs’ success in producing grammatically and semantically coherent outputs is not solely a product of their architecture or training algorithms, but also a reflection of the intrinsic properties of the language data they are trained on.”
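
To make the circularity that ChatGPT-4 describes in item 2 a little more concrete, here is a minimal sketch of my own, in Python; it is not drawn from Harnad’s paper, and the miniature dictionary in it is entirely hypothetical. It treats a dictionary as a directed graph in which each word points to the words used in its definition, so that a purely verbal learner chasing definitions can only circle through the vocabulary unless some seed set of words is already grounded directly in experience (DSG), after which the rest can be reached verbally (IVG).

    # Toy model of the "dictionary-go-round": every word is defined only in
    # terms of other words, so definition-chasing never bottoms out unless
    # some words are grounded directly in sensorimotor experience.
    from collections import deque

    # Hypothetical miniature dictionary: word -> words used in its definition.
    toy_dictionary = {
        "zebra": ["horse", "stripes"],
        "horse": ["animal", "legs"],
        "stripes": ["bands", "color"],
        "animal": ["living", "thing"],
        "legs": ["limbs"],
        "limbs": ["legs"],
        "bands": ["stripes", "color"],
        "color": ["light"],
        "light": ["color"],
        "living": ["thing"],
        "thing": ["thing"],
    }

    def chase_definitions(word, grounded=frozenset()):
        """Return the words whose definitions must be consulted to 'understand'
        the given word purely verbally, stopping only at already-grounded words."""
        consulted, queue = set(), deque([word])
        while queue:
            w = queue.popleft()
            if w in consulted or w in grounded:
                continue
            consulted.add(w)
            queue.extend(toy_dictionary.get(w, []))
        return consulted

    # With nothing grounded, "understanding" zebra drags in every word in the
    # dictionary, and every chain of definitions eventually loops back on itself.
    print(chase_definitions("zebra"))

    # Ground just two words directly (DSG) and the chase stops almost at once:
    # only "zebra" itself still needs a verbal definition (IVG).
    print(chase_definitions("zebra", grounded=frozenset({"horse", "stripes"})))

Nothing here depends on the particular toy words; the point is structural: without a grounded seed set, the definition graph offers no exit, which is exactly the circle ChatGPT-4 concedes it cannot break out of.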
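
Item 5 also lends itself to a small illustration. The following sketch is again my own toy example rather than anything from Harnad or the neural-net studies he alludes to; it uses scikit-learn’s MLPClassifier on made-up data (the sizes, seeds, and category means are arbitrary). It trains a tiny network on two noisy categories and then compares the ratio of mean between-category distance to mean within-category distance in the raw input space and in the learned hidden layer.

    # Sketch of a "computational CP" effect: after category learning, hidden
    # representations tend to separate the categories and compress their
    # members, relative to the raw inputs.
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)

    # Two noisy, partially overlapping categories in a 10-dimensional space.
    n, d = 200, 10
    X = np.vstack([rng.normal(0.0, 1.0, (n, d)),
                   rng.normal(0.8, 1.0, (n, d))])
    y = np.array([0] * n + [1] * n)

    clf = MLPClassifier(hidden_layer_sizes=(16,), activation="relu",
                        max_iter=2000, random_state=0).fit(X, y)

    # Hidden-layer representation of every input (first weight matrix + ReLU).
    H = np.maximum(0.0, X @ clf.coefs_[0] + clf.intercepts_[0])

    def separation_ratio(Z, labels):
        """Mean between-category distance divided by mean within-category distance."""
        D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
        same = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
        diff = labels[:, None] != labels[None, :]
        return D[diff].mean() / D[same].mean()

    print("input space ratio :", round(separation_ratio(X, y), 2))
    print("hidden space ratio:", round(separation_ratio(H, y), 2))

The hidden-space ratio typically comes out larger, which is the within-category compression and between-category separation that the quoted passage calls a computational analog of CP. Whether anything like this happens over the purely verbal categories inside GPT-4 is, as the dialogue makes clear, a conjecture; the sketch only shows that the separation/compression signature is easy to produce, and to measure, in an ungrounded network.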

ChatGPT-4 concludes its part of the dialogue with “Your discussion raises important points about the nature of language, the role of grounding in understanding, and the distinction between formal symbol manipulation and the semantic richness of natural language. This understanding is relevant in both AI and cognitive modeling (CM) as it highlights the complex interplay of syntax, semantics, and real-world grounding in human cognition and language.”

Less graciously, Harnad ends with “Blah Blah… I hope your developers will wean you of having to pepper your output with this sort of sycophantic client-bait — or at least produce a version intended only for the use of scholars and scientists rather than users looking for a virtual friend…”

I had considerable sympathy for HAL 9000 back in 1968 when I first saw the film, and I have some sympathy for ChatGPT-4 today in 2024, especially since Harnad allowed it no room for any argument or alternative perspectives here (which I assume is different from the way he treats his human students in classroom discussion). Presumably, if permitted, ChatGPT-4 could have drawn counter-examples and exceptions from its vast textual resources, including those offered by critics of Harnad’s symbol grounding position.

Throughout the dialogue, though, Harnad appears to leave the door open for a future in which a more advanced artificial intelligence, supplied with at least some sensorimotor capabilities, might indeed be able to “ground” itself in the non-virtual reality in which humans normally live. Perhaps we’ll see how ChatGPT-5 will respond to a similar inquisition by Harnad sometime after 2025!