How stable are different words?

In my research I do a lot of work with so-called Swadesh lists – lists of the same basic words translated into hundreds of languages. You can read more about them here.

Most people use the Dyen et al. 1992 [1949] stability lists, but many also quote the work of S. A. Starostin. Calude & Pagel (2011:1102), for instance, say that “Starostin’s list is a subjective rank-ordering based on his work with [14 language families] from the most stable (rank = 1) or slowly evolving to less stable (rank = 110).” Unfortunately, unlike the Dyen et al. material, the Starostin list is only available in print, in Russian. With the help of my Russian-speaking friend Tanja I’ve translated the Starostin list into its Swadesh list (English) equivalents.


Calude, Andreea S., and Mark Pagel. 2011. “How Do We Use Language? Shared Patterns in the Frequency of Word Use across 17 World Languages”. Philosophical Transactions of the Royal Society B: Biological Sciences 366 (1567): 1101–7. doi:10.1098/rstb.2010.0315.
Dyen, Isidore, Joseph B. Kruskal, and Paul Black. 1992. An Indoeuropean classification: a lexicostatistical experiment. Philadelphia: American Philosophical Society.
Starostin, S. A. 2007. Languages of the Slavic culture. In Works on linguistics (ed. S. A. Starostin), pp. 827–839. Moscow: Nauka. [Note: this work is in Russian]


Rank Word
1 we
2 two
3 I
4 eye
5 you (thou)
6 who
7 fire
8 tongue
9 stone
10 name
11 hand
12 what
13 die
14 heart
15 to drink
16 dog
17 louse
18 moon
19 nail
20 blood
21 one
22 tooth
23 new
24 dry
25 liver
26 eat
27 tail
28 this
29 hair (strand)
30 water
31 nose
32 not (negation)
33 mouth
34 full
35 ear
36 that
37 bird
38 bone (skeleton)
39 sun
40 smoke
41 to stand
42 tree
43 ashes
44 to give
45 rain
46 star
47 fish
48 neck
49 breast/chest
50 leaf
51 to come
52 to kill
53 foot/leg
54 to sit
55 root
56 thin
57 horn
58 to fly
59 to hear
60 skin
61 long
62 worm 
63 meat
64 road
65 to know
66 salt
67 to say
68 egg
69 seed
70 knee
71 black
72 head
73 to sleep
74 to burn (something)
75 earth/soil
76 year
77 feather
78 to swim
79 white
80 to bite
81 fat (noun)
82 man (male)
83 human
84 whole
85 snake
86 night
87 to see
88 heavy
89 to go/walk
90 warm
91 red
92 cold
93 woman
94 round
95 close
96 yellow
97 to lie (be in a lying down position)
98 green
99 cloud
100 far (adjective)
101 big
102 bark (of tree)
103 sand
104 short
105 good
106 many
107 mountain
108 wind
109 stomach
110 little

PC rage!

“PC-driven language is politically motivated, and so attracts more attention and resistance than most euphemism. PC euphemism is perceived to arise directly out of linguistic intervention. Except for slang (or at least their own slang), people generally dislike linguistic change, especially change that smacks of deliberate manipulation.

Peggy Noonan, a journalist and former speech writer for Ronald Reagan, spent much of a piece on political correctness lamenting the loss of words she claims to have been hijacked and reinterpreted: ‘I wish we could rescue them’, she writes, ‘and return them to their true meanings.’ She describes, for instance, gay as once being ‘a good word because it sounded like what it meant – “merry and bright”’. As an aside, the etymology of gay can be traced back to one of two Old High German words – one with the meaning ‘good, beautiful’, the other ‘impetuous, swift’. Presumably, neither of these is the ‘true meaning’ Peggy Noonan had in mind: ‘merry, bright’ is what she grew up with and feels comfortable with, and it is this meaning she believes is the true meaning that she wants to revive.

Like Noonan, many people complain about losing what they see as their freedom to call things by their ‘right names’ – as if there were something natural and correct about their own linguistic preferences; but perhaps this is what most of us feel. Linguistic changes, whether or not PC-motivated, are seen by many as the thin end of a wedge that will fragment society into factional interest groups. Such hostility is fuelled by media hyperbole and misrepresentation.”

(Allan, Keith, and Kate Burridge. 2006. Forbidden words: taboo and the censoring of language. Cambridge, UK: Cambridge University Press.)


Sign Language Politeness


Ran across this quote about polite behavior when passing between two signers. I’ve been wondering what the appropriate behavior is in Sweden; here’s how one signer sees it for ASL (American Sign Language):

“Another norm governs what is appropriate behavior if you have to walk between two people who are signing to each other. In spoken language conversations, it is polite to say ‘excuse me’ as you pass. That is, it is appropriate to use language to recognize the fact that you are temporarily in the way. However, in the deaf community, it is perfectly acceptable and polite to walk between two people having an ASL conversation without signing ‘EXCUSE-ME’. Not only is it polite, but to stop and sign ‘EXCUSE-ME’ or to duck one’s head or bend over as one walks by may even be unacceptable because it will almost always bring conversation to a halt and cause an interruption. This is a norm that differs from the norms for spoken languages conversations.”

p176, Valli, Clayton, and Ceil Lucas. 2001. Linguistics of American Sign Language: an introduction. Washington, D.C.: Gallaudet University Press.


How many words are there in a language?

…This is always a vexing question to get asked at parties. It’s like the question “how many languages are there”. There’s no short satisfactory answer, and the truest one remains “honestly, we have no idea”, which is, of course, disappointing.

Now, here are some interesting statistics about how many different words an average speaker of English uses. Turns out that with about 2000 different words, you cover something close to 95% (and maybe more) of the spoken vocabulary. All the rest are just trimmings.

From Thornbury, Scott, and Diana Slade. 2006. Conversation: from description to pedagogy. Cambridge: Cambridge University Press, pp. 42–43.

“2.1 Lexical size
In terms of the number of words they need to control, the demands placed on speakers and listeners are considerably fewer than they are for writers and readers: ‘From the small amount of evidence available, it seems that about half the words needed to understand written English are needed to understand spoken English’, notes Nation (1990: 85). Schmitt cites an analysis of a corpus of Australian English (Schonell et al., 1956) which suggests that ‘a person can largely function in everyday conversation with a vocabulary of 2000 words’ (Schmitt, 2000: 74). On the basis of computer counts of word frequency, McCarthy notes that ‘there is usually a point where frequency drops off rather sharply, from hard-working words which are of extremely high frequency to words that occur relatively infrequently’ (1999: 7). The drop-off point, according to McCarthy, is situated around about 2000 words down in the frequency ratings, leading him to conclude ‘that a round-figure pedagogical target of the first 2000 words in order of frequency will safely cover the everyday core’ (ibid.). However, Adolphs and Schmitt (2003), in comparing the Schonell et al. word list with one derived from the CANCODE corpus, found that the top 2000 word families in fact provide only around 95 per cent coverage, rather than the near 100 per cent claimed by Schonell et al. (Note that the researchers refer to word families, not individual words. A word family is a base word plus its inflexions and its most common derivations, a concept that correlates more or less with the headwords of a typical dictionary.) Adolphs and Schmitt admit, though, that ‘there is almost no research which explores the percentage of words which need to be known in order to operate successfully in a spoken environment’ (2003: 432). They therefore tentatively suggest that 2000 word families may be a useful starting point but that ‘3000 word families (providing coverage of nearly 96 per cent) is a better goal if learners wish to minimize their lexical gaps’ (2003: 433).
This figure, however, is based on a broad cross section of native speaker contexts, including professional and academic registers, in the UK and Ireland, and does not necessarily represent the lexical needs of most learners, especially those learning English as an International Language (McKay, 2002). For such learners, the 95 per cent coverage of a native speaker’s lexical coverage represented by 2000 words would seem to be more than sufficient, and the effort involved in learning a further 1000 words in order to gain one percentage point extra coverage seems out of all proportion to the gains in communicative efficiency that might be made. In fact, the target need not even be as high as 2000 – West (1960) developed a minimum adequate speech vocabulary for learners of English of just 1200 words (compared to the minimum 2000 words for dealing with written language). This, he argued, would be sufficient for learners to say most of the things they would need to say. Moreover, even fewer words would still give the learner (theoretically at least) an advantage, since, in spoken language, a little goes a long way: McCarthy and Carter (1997) point out that whereas the 50 most frequent words in written text cover 38.8 per cent of all written text, the top 50 spoken words cover 48.3 per cent – that is to say almost half – of spoken text.”
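
If you want to see what a “coverage” figure like this means in practice, here is a minimal sketch of how it could be computed. It is not the method used in any of the studies quoted above: the function name `coverage` and the toy sentence are my own assumptions, and it counts plain word forms rather than word families, so the numbers it gives are only illustrative.

```python
from collections import Counter


def coverage(tokens, top_n):
    """Share of all running words accounted for by the top_n most frequent word forms."""
    if not tokens:
        return 0.0  # an empty text has nothing to cover
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / len(tokens)


# Toy sample, made up for illustration; a real study would use a large spoken corpus.
sample = "the cat sat on the mat and the dog sat on the cat".split()

for n in (1, 3, 7):
    print(f"top {n} word forms cover {coverage(sample, n):.0%} of the sample")
```

Run on a real transcript, the same function with top_n set to 2000 and then 3000 would show how little each extra thousand words adds, which is the point the quote makes about diminishing returns.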


The etymology of Swedish tallrik ‘plate’ and Norwegian/Danish tallerken ‘plate’


Latin taliare ‘cut’
   -> Old French tailleor ‘cutting board, plate’
    -> Middle High German tallor ‘plate’
      -> MHG tallor-ken ‘plate-DIMINUTIVE’
        -> East Scandinavian tallorken ‘plate’  (16th century)
            -> Danish tallerken ‘plate’
           -> Norwegian Bokmål tallerken ‘plate’
             -> Norwegian Nynorsk tallerken ‘plate’ -> tallerk-en ‘plate-DETERMINATE’ -> tallerk ‘plate’
           -> Swedish tallriken ‘plate’ -> tallrik-en ‘plate-DETERMINATE’ -> tallrik ‘plate’


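(This is a good example of “folk etymology” and “back-formation”: the final -en was reanalysed as the definite suffix and then stripped off, giving a new base form.)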


Fårö = Sheep Island?


Linguists are wary of false friends – false cognates – which is when two words in different languages seem very similar but have no common history. One example is Arabic “sharif”, a tribal title for someone who protects their tribe, and English “sheriff”, a lawman who protects a district. Related? Nope, just a coincidence.

In historical linguistics there is a similar problem when two words from different periods in the history of a language seem to be the same, but the later one didn’t actually evolve from the earlier one.

My friend Andreas told me about this cool “false friend” in Swedish. We have this island off Gotland called “Fårö”. Får = sheep, ö = island. So sheep island, right? Noo, Får was originally far – as in farväg “road for travel” or farvatten “water for travel”. So “travel island”.

A clue to this is that, unlike on the mainland, the Gotlanders say “lamm” for sheep, and not “får…



Celebrate language change!

Language change is awesome. Got this quote…

“The language most likely to continue long without alteration, would be that of a nation raised a little, and but a little, above barbarity, secluded from strangers, and totally employed in procuring the conveniencies of life; either without books, or, like some of the Mahometan countries, with very few: men thus busied and unlearned, having only such words as common use requires, would perhaps long continue to express the same notions by the same signs. But no such constancy can be expected in a people polished by arts, and classed by subordination, where one part of the community is sustained and accommodated by the labour of the other. Those who have much leisure to think, will always be enlarging the stock of ideas, and every increase of knowledge, whether real or fancied, will produce new words, or combinations of words. When the mind is unchained from necessity, it will range after convenience; when it is left at large in the fields of speculation, it will shift opinions; as any custom is disused, the words that expressed it must perish with it; as any opinion grows popular, it will innovate speech in the same proportion as it alters practice.”

… from this article: http://www.economist.com/blogs/johnson/2012/07/plurals-0


Guv’na! The lord’s prayer in Cockney

Me ol’ china plate [=friend] Sigi showed me “The Bible in Cockney” a few days ago. Cockney is a dialect from the East End of London, and Cockney rhyming slang is a fascinating local dialect and language game, where some words are exchanged for others that rhyme. Read more about it here: http://www.cockneyrhymingslang.co.uk/


Here’s the Lord’s Prayer (the main Christian prayer) in Cockney:


‘ello, Dad, up there in good old ‘eaven,
Your name is, well, great and ‘oly,
and we respect you, Guv.
We ‘ope we can all ‘ave a butcher’s at ‘eaven
and be there as soon as possible.
And we want to make you ‘appy, Guv,
and do what you want ‘ere on earth,
just like what you do in ‘eaven.
Guv, please give us some Uncle Fred,
and enough grub and stuff to keep us going today.
And we ‘ope you’ll forgive us when we cock things up,
just like we’re supposed to forgive them who annoy us
and do dodgy stuff to us.
There’s a lot of dodgy people around, Guv;
please don’t let us get tempted to do bad things.
‘elp keep us away from all the nasty, evil stuff
and keep that dodgy Satan away from us,
‘cos you’re much stronger than ‘im.
You’re the Boss, God,
and will be forever, innit?

And for those of us who’ve forgotten most of what we learnt in high school’s Comparative Religion Study classes, here’s an English version for comparison:

Our Father in heaven,
hallowed be your name,
your kingdom come,
your will be done,
on earth as in heaven.
Give us today our daily bread.
Forgive us our sins
as we forgive those who sin against us.
Save us from the time of trial
and deliver us from evil.



The awesomeness that is a continually shifting and evolving sociophysical matrix

Just ran across a great quote about meaning that I wanted to share. Might be a bit dense for non-linguists, but, trust me, it’s beautiful.

“(…) the range of symbolic units available to the language user massively underdetermine the range of situations, events, states, relationships, and other interpersonal functions that the language user may potentially seek to use language to express and fulfil. One reason for this is that language users live in a sociophysical matrix that is continually shifting and evolving. No two situations, feelings or relationships, at any given point in time, are exactly alike.” (Evans 2009, 71)



Evans, Vyvyan. 2009. How words mean: lexical concepts, cognitive models, and meaning construction. Oxford: Oxford University Press. http://site.ebrary.com/id/10409021.


The history of COWABUNGA!

If you do a Google image search for COWABUNGA, you get things like this:

[Images: Cowabunga Bart Simpson; Cowabunga turtles]

So, surfers and turtles! Explosive, exciting things happening.

I’m not a native speaker of English, but learnt the language mainly through watching tv* – personally I believe much of my vocabulary was formed under the influence of Little House on the Prairie and Star Trek… (-:

I did watch Teenage Mutant Ninja Turtles, as well, and learnt the word COWABUNGA! A word of action, when a turtle would do something (typically a really cool drop kick).

But where does the word come from? John Algeo is an avid word fan and linguist, the author of a lot of books on neologisms (=new words). He’s tracked the word down. Turns out it has nothing to do with surfers or turtle action heroes, but was an expression created for Chief Thunderthud, a Native American character on an American children’s show from the mid 20th century (Algeo 1980). He apparently used the word to express disappointment. When he was happy, he said “Kawagoopa” instead.

So it’s an imagined Native American greeting! Stereotypical and a bit racist? Sure. Well, at least they knew Native Americans had their own languages, I guess that’s something. Pre-Columbus there were around 300 languages in North America. Now there are only about 25 left that aren’t moribund.

*About 1/3 of the programs on the Swedish state tv-channels are in English nowadays, and about 4/5 of the programs on smaller channels are in English.