NLP and How Words Create Our World
On the Importance of Language as Representational Information
ORIGINAL ARTICLE IS BELOW THIS PREAMBLE
This article was originally published on November 10, 2019, but it seems particularly relevant now with the rapid popularization of ChatGPT and Large Language Models (LLMs) as a category. I have always been fascinated by the concept of language as a representation of the world. The very idea of a “word” is entirely abstract, yet words create a unique mapping of the world. In the language of machine learning, our brains use words to encode a latent space of objects, ideas, experiences, and so on. To extend the analogy, a child who is developing these encodings and is exposed to different encodings for similar or adjacent concepts in that latent space (simply put, a child learning another language) gains a training framework for future transfer learning. Learning that a word is an arbitrary token for a concept, and that other tokens can be equally valid, makes future learning faster (probably across more than just language). This is a good reason to teach languages at a young age.
ORIGINAL ARTICLE
There is a sage maxim of human interactions that has a simple, but very powerful utility:
“The impact of words has very little, if any, correlation to the time it takes to say them”.
This simple idea has a great deal of value in the context of human interactions but, more importantly, it says something deeper about the essence of communication, the function of language, and the development of what is regarded as one of the defining characteristics of humanity, or of consciousness: the facility for abstract and conceptual thought. This may seem a lot to ascribe to a simple “thank you” or to the other simple yet meaningful phrases commonly associated with the maxim above, but if anything it underestimates the value of words themselves as distinct entities. That brings to the fore both the evolution and the devolution of language in modern society, as words are replaced by images, memes, sound bites, emojis, and other such forms of deconstructionist communication.
It is in this context that a return to the essence of words is imperative: they are much more than they may appear. Words, and their framing, construction, and usage, are the architecture upon which our world is built. They are the underlying framework of cognition and of our ability to make sense of the world. Words, in any given language, are more than descriptions, more than labels; they are the storehouses of a collective knowledge.
They are representatives of history, of culture, of civilization. Languages that have developed over time and geography are, in themselves, histories. The foundations of languages, in both vocabulary and structure, are the foundations of cognition. Languages that incorporate, for example, gendered nouns potentially reflect a world view influenced by societal differentiation in the function of gendered human interactions. Languages that have differing subject/object structural relationships or syntax, languages that have varying tenses to express an understanding of the passage of time and the relationship between events, and languages that have different approaches to word generation or formulation (or that are not suited to this type of flexible expansion) are all inherently and inextricably tied to the civilizations in which they developed.
A central argument for developing multiple language skills early in life is that it teaches that words are simply constructs: they hold no meaning other than that which is collectively ascribed to them, they do not represent anything real, and they have no inherent value. An understanding of this core concept permits a flexibility of cognition that is not bound by the artificial constructs of one’s own upbringing.
Words themselves seem rather arbitrary and fluid: a constantly changing, and in fact living, composition that reflects the pace of change in our society. This fluidity has been substantially amplified over the last few decades of technological change, not just because new words are needed to describe new ideas, technologies, inventions, and so on, but also through the changing modes of communication (and their associated rate of change).
The function of words themselves is changing at remarkable speed, reflecting the pace of change of the world in general. But, not to sound too conservative or dour, the change in the function of words, or perhaps more specifically in the common vernacular, is often not for the better in an era of rapid change. Part of this is a consequence of the gap between the speed with which individuals can process new information and the speed with which we are deluged with new content. A consequence of this information-rich yet content-poor state is that words become diluted. In their function as storehouses of a collective interpretation, words are containers filled with an aggregate of content that they store and can transmit in neat little (arbitrary) sounds.
The deluge of inconsequential information that we currently experience is creating a world in which that content, the collective understanding of the information contained within the bounds of a single word, can be rapidly adulterated by whoever has the largest megaphone (or the largest number of followers). This is the danger of the rise of memes and of their darker counterparts, propaganda and slogans, which twist and adulterate words (and their consequent collective interpretations) on a scale never before seen, enabled by the advent of new mass communication technologies.
In many ways, we are both expanding and contracting our ability to communicate. We are communicating more, yet conveying less. We are reducing entire mental frameworks to simple little emojis, which in turn are no more than representations of a collective interpretation that can change on a dime. While we expand our word count, creating ever new combinations of words, symbols, and so on, we are not imbuing them with a sufficiently consistent collective understanding of meaning. As a consequence, we are talking past each other: communicating more and saying less.
This effect will have (and is having) significantly deleterious consequences for our social discourse and for the organization of humanity towards a common understanding and common goals. Leaders of people understand this deeply, in both its positive and negative manifestations, as Orwellian Newspeak seems to be taking hold. Words are empty vessels, nothing more. We ascribe value, meaning, and content to them only through constant maintenance and consistent use. If we lose words, either through disuse or misuse, we fracture the central aspect of our humanity and the core feature that has enabled the organization that has led to the civilization that we now inhabit.
We should treat words with care, use them with diligence, learn them with tenacity, and deeply understand that words communicate much more than sounds: they hold the entirety of our human condition in their grasp. Their message is infinitely deeper than the effort it takes to say them.
image credit: https://veracontent.com/contenedor/uploads/2021/01/3.png