Introduction to The Sapir-Whorf Hypothesis

A brief introduction to the Sapir-Whorf hypothesis, linguistic relativism, and polysemy

Click Here for the Related IntroNotes Article: Toki Pona: The lili-est Language You Haven't Heard of Yet

Today let’s pit the Linguistic Relativists against the Linguistic Universalists via Toki Pona’s use of colour.

The Sapir-Whorf Hypothesis

“one's language determines one's conception of the world”

This linguistic hypothesis claims that not only do we externally vocalise our perceptions and thoughts in a manner dictated by the characteristics and breadth of our language, but that we also internally frame our thoughts in a manner dictated by our language.

The Sapir-Whorf Hypothesis (SWH), the implications of which are often referred to as Linguistic Relativity, arose from the 19th and 20th century writings of Edward Sapir and Benjamin Lee Whorf (amongst others). The history of this school of thought is, as one can expect from the anthropology of the era, dripping in racism and notions of European political and intellectual supremacy. Nevertheless, the SWH discussed in isolation does lend itself to interesting conversations. As such, I’m not going to talk much about the analysis of linguistic relativity with regard to the Inuit or Hopi peoples, because I feel that I lack the skills to extract the useful knowledge from what is otherwise a dumpster fire of flaming racist garbage.

So I shall return to the SWH. For this, I will use the terminology I am more familiar with, that of neurology and psychology:

  • Perception = The process of observing/being aware of a stimulus from an external object and internalising it, through the senses and pathways of thought, to the conscious and unconscious mind. “see an apple, think an apple”
  • External Object = The external ‘thing’, the noumenon, the “apple”
  • Stimulus = The external physical characteristic that can be sensed: e.g. the red light reflecting off the apple, the smell of the apple, the sound wave of a voice.
  • Percept = The internal phenomenon, constructed in the mind. The internal image. The idea of that specific apple in front of you.
  • Cognition = The mental process of learning and understanding, as discussed in thought, memory, perception etc.
  • Thought = The flow of ideas, associations, and internal objects leading to a conclusion about reality.

When discussing the SWH we might discuss two extents of its application, the strong and weak interpretations.

Linguistic Determinism vs Influence

The strong version of the Sapir-Whorf Hypothesis, deemed Linguistic Determinism, holds that language determines the structure and limits of thought and cognition. A rather simplified example is that our percept of an ‘apple’ is different from our percept of a ‘pear’ because we have different words for them. When we see an apple or pear, the higher order, more abstract processes of perception encode the stimulus into a category that the mind already has available. Where there is a word for ‘apple’ and a word for ‘pear’, the mind has an associated category for ‘apple’ and for ‘pear’. As such, the metaphorical admin workers can file the encoded stimulus of the red light under the ‘apple’ category. Thus the external object and its stimulus can be represented internally via the percept of an ‘apple’. Boom! That thing is an apple!

Without this linguistic category, or even a way to subdivide larger categories (e.g. sweet, red, fruit), we would not have the cognitive categories to interpret these different kinds of fruit as being different. We would lose the nuance of interpretation of different fruits, and thus have a word and category that referred to all fruit. (Such as the Toki Pona kili = ‘fruit’, ‘vegetable’, ‘mushroom’ ≈ ‘edible plant’)

The weak version of the Sapir-Whorf Hypothesis, deemed Linguistic Influence, simply holds that these linguistic categories influence but do not fully define how we think. This is admittedly easier to test experimentally in a lab with test subjects of different linguistic backgrounds. For example, the standard anglophone, in both reading and spatial metaphor, treats the flow from left to right as analogous to the flow from past to future. When asked what word comes ‘before’ another in a sentence, anglophones look to the left. Thus, when primed with horizontal cues, the anglophone will more rapidly process notions of time and order. The standard sinophone, by contrast, associates the past with up and the future with down. For the sinophone, the associations of 上 (shàng), conceptually linked to being spatially up and temporally previous (and conversely 下, xià), result in faster processing of temporal questions when primed with vertical cues.

An interesting side thought: do these cognitive benefits from correct priming result from the association of spatial and temporal metaphor, or from the ingrained habit of reading direction and the processes associated with that movement? The consensus is that, in the examples of English and Mandarin, the two mechanisms are not unrelated. The theory holds that the direction of writing arose from the pre-existing linguistic metaphor, rather than the other way around.

A similar discussion, highlighting the division between relativist and universalist thinking, comes from experiments on colour perception.

Colour Categories

The universalists Berlin and Kay examined 20 languages and the way they divided colour categories. They defined a colour category as being:

  1. Monolexemic (e.g. ‘orange’ as opposed to ‘yellow-red’)
  2. Monomorphemic (e.g. ‘blue’ as opposed to ‘blue-ish’)
  3. Not a subset of a greater category (e.g. ‘green’, not ‘emerald’, which is a subset of green)
  4. Not applicable only to certain objects (e.g. ‘yellow’ as opposed to ‘blonde’)
  5. Salient for all informants (e.g. ‘blue’ as opposed to ‘tiffany™-colour’)

Interestingly, they found that for languages with fewer than 11 categories, the categories always appeared in a specific pattern as the total number of categories increased. For languages with N categories, if:

  • N = 2, ‘black’ AND ‘white’
  • N = 3, add ‘red’
  • N = 4, add ‘green’ OR ‘yellow’ but not both
  • N = 5, add both ‘green’ AND ‘yellow’
  • N = 6, add ‘blue’
  • N = 7, add ‘brown’
  • N ≥ 8, add ‘grey’ OR ‘orange’ OR ‘pink’ OR ‘purple’
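
The pattern listed above can be sketched in a few lines of Python. This is only an illustration of the list as written here, not Berlin and Kay's own formalism: the function name `possible_inventories` and the English glosses used as stand-ins for each language's own terms are assumptions made for the example.

```python
from itertools import combinations

# Terms in the order the list above introduces them (English glosses are
# stand-ins for a language's own basic colour terms).
ORDERED_TERMS = ("black", "white", "red", "green", "yellow", "blue", "brown")
# Terms that may appear in any order once a language has 8 or more categories.
LATE_TERMS = ("grey", "orange", "pink", "purple")


def possible_inventories(n):
    """Return every colour inventory the listed pattern allows for n categories."""
    if not 2 <= n <= 11:
        raise ValueError("the pattern covers 2 to 11 categories")
    if n == 4:
        # Either 'green' or 'yellow' joins black, white, and red, but not both.
        return [frozenset(ORDERED_TERMS[:3]) | {c} for c in ("green", "yellow")]
    fixed = frozenset(ORDERED_TERMS[:min(n, 7)])
    extra = max(0, n - 7)  # how many late terms still need to be chosen
    return [fixed | frozenset(combo) for combo in combinations(LATE_TERMS, extra)]


for n in (3, 4, 5, 8):
    print(n, [sorted(inv) for inv in possible_inventories(n)])
```

For N = 4 and for N between 8 and 10 the sketch returns more than one inventory, which mirrors the ‘OR’ in the list: the pattern fixes which terms are candidates at each stage, not which one a given language picks.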

Interestingly, colour terms were found to refer to roughly the same shades across languages, as mapped in the Munsell Colour System (hue, value, chroma, aka colour, brightness, saturation). This seems to suggest an evolutionary agreement on which colours are important to humans, or perhaps on which colour stimuli we can perceive (recalling retinal and sensory neurology).

We find that Toki Pona has been constructed as an N = 5 language and thus has the words: (pimeja = black) (walo = white) (loje = red) (laso = green, also covering blue) (jelo = yellow). As the language was intentionally constructed by a linguist, it’s to be expected that this selection aligns with Berlin and Kay’s model for colour evolution.

But the question then arises: how do we refer to other colours in N < 6 languages like Toki Pona? Two handy tricks: polysemy and the use of polylexemes.

Polysemy is the coexistence of multiple meanings (sememes) for a word, word-group, or word-bit (morpheme). This can be seen in modern natural languages that have since developed N ≥ 11 categories but still have polysemous words from earlier in their evolution. Some such examples are included below:


  • Mandarin: 绿 (lǜ) = green monosemy
  • Arabic: الخضراء (al-khaḍrā') = blue/green polysemy; أزرق (azraq) = blue monosemy; أخضر (akhḍar) = green monosemy
  • Toki Pona: laso = blue/green polysemy

A polylexeme is a unit made of two lexemes rammed together, as in the example “yellow-red” to mean “orange”. Vietnamese employs a similar strategy in its blue/green distinctions, adding other kinds of descriptors to the root xanh:

  • xanh da trời = ‘sky blue / azure’
  • xanh dương = ‘blue’
  • xanh nước biển = ‘ocean blue’
  • xanh lá cây = ‘leaf green’

Toki Pona can employ polylexemes in two ways, either mixing colours or describing them:

  • loje laso (blue-ish red) or laso loje (red-ish blue) ≈ ‘purple’
  • jelo loje (red-ish yellow) ≈ ‘orange’

 OR perhaps

  • loje pi telo loje (red of ‘red water’) ≈ ‘blood-red’
  • laso kasi VS laso pi telo suli (blue/green of plant/leaf VS blue/green of ‘big water’) ≈ ‘leaf-green’ VS ‘ocean-blue’
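
The mixing system above, where the first colour word is the head and the second modifies it, can be sketched as a small lookup. This is a toy illustration only: the name `gloss_mix` is made up for the example, the glosses come from the word list in this article, and ‘pi’ phrases such as loje pi telo loje are not handled.

```python
# Glosses follow the article's word list; laso is glossed "blue/green"
# to reflect its polysemy.
BASE_COLOURS = {
    "pimeja": "black",
    "walo": "white",
    "loje": "red",
    "laso": "blue/green",
    "jelo": "yellow",
}


def gloss_mix(phrase):
    """Gloss a two-word colour mix: the first word is the head, the second modifies it."""
    head, modifier = phrase.split()
    return f"{BASE_COLOURS[modifier]}-ish {BASE_COLOURS[head]}"


for phrase in ("loje laso", "laso loje", "jelo loje"):
    print(phrase, "=", gloss_mix(phrase))
```

Note that the gloss for laso stays ambiguous between blue and green; picking ‘blue’ for a purple-like mix is something the listener does from context, not something the words themselves encode.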

So whilst the linguistic relativists might say that speakers of N < 6 languages view colour through the lenses and combinations of their N categories of colour, and thus categories of thought, the linguistic universalists would instead insist that every culture that can gaze upon the blue ocean will have a way to conceptualise that colour as distinct from the colour of a leaf, even if it has no monolexemic, monomorphemic term to describe it.

I can’t imagine there will ever be a consensus amongst the linguists, but as someone who sits on the sidelines watching (sports metaphor, ha!), it’s fun to see what happens.

Banner image edited from image of sitelen pona from Toki Pona: The Language of Good

COPYRIGHT INFO : TEXT = CC0, INFOGRAPHICS = Creative Commons License