How stable are different words?

In my research I do a lot of work with so-called Swadesh lists: lists of the same basic words translated into hundreds of languages.

Most people use the Dyen et al 1992 [1949] stability lists, but many also quote the work of S A Starostin. Calude & Pagel (2011:1102), for instance, say that “Starostin’s list is a subjective rank-ordering based on his work with [14 language families] from the most stable (rank = 1) or slowly evolving to less stable (rank = 110).” Unfortunately, unlike the Dyen et al material, the Starostin list is only available in print, in Russian. With the help of my Russian-speaking friend Tanja I’ve translated the Starostin list into its Swadesh list (English) equivalents.


Calude, Andreea S., and Mark Pagel. 2011. “How Do We Use Language? Shared Patterns in the Frequency of Word Use across 17 World Languages”. Philosophical Transactions of the Royal Society B: Biological Sciences 366 (1567): 1101–7. doi:10.1098/rstb.2010.0315.
Dyen, Isidore, Joseph B. Kruskal, and Paul Black. 1992. An Indoeuropean classification: a lexicostatistical experiment. Philadelphia: American Philosophical Society.
Starostin, S. A. 2007. Languages of the Slavic culture. In Works on linguistics (ed. S. A. Starostin), pp. 827–839. Moscow: Nauka. [Note: this work is in Russian]


Rank Word
1 we
2 two
3 I
4 eye
5 you (thou)
6 who
7 fire
8 tongue
9 stone
10 name
11 hand
12 what
13 die
14 heart
15 to drink
16 dog
17 louse
18 moon
19 nail
20 blood
21 one
22 tooth
23 new
24 dry
25 liver
26 eat
27 tail
28 this
29 hair (strand)
30 water
31 nose
32 not (negation)
33 mouth
34 full
35 ear
36 that
37 bird
38 bone (skeleton)
39 sun
40 smoke
41 to stand
42 tree
43 ashes
44 to give
45 rain
46 star
47 fish
48 neck
49 breast/chest
50 leaf
51 to come
52 to kill
53 foot/leg
54 to sit
55 root
56 thin
57 horn
58 to fly
59 to hear
60 skin
61 long
62 worm 
63 meat
64 road
65 to know
66 salt
67 to say
68 egg
69 seed
70 knee
71 black
72 head
73 to sleep
74 to burn (something)
75 earth/soil
76 year
77 feather
78 to swim
79 white
80 to bite
81 fat (noun)
82 man (male)
83 human
84 whole
85 snake
86 night
87 to see
88 heavy
89 to go/walk
90 warm
91 red
92 cold
93 woman
94 round
95 close
96 yellow
97 to lie (be in a lying down position)
98 green
99 cloud
100 far (adjective)
101 big
102 bark (of tree)
103 sand
104 short
105 good
106 many
107 mountain
108 wind
109 stomach
110 little

PC rage!

“PC-driven language is politically motivated, and so attracts more attention and resistance than most euphemism. PC euphemism is perceived to arise directly out of linguistic intervention. Except for slang (or at least their own slang), people generally dislike linguistic change, especially change that smacks of deliberate manipulation.

Peggy Noonan, a journalist and former speech writer for Ronald Reagan, spent much of a piece on political correctness lamenting the loss of words she claims to have been hijacked and reinterpreted: ‘I wish we could rescue them’, she writes, ‘and return them to their true meanings.’ She describes, for instance, gay as once being ‘a good word because it sounded like what it meant – “merry and bright”’. As an aside, the etymology of gay can be traced back to one of two Old High German words – one with the meaning ‘good, beautiful’, the other ‘impetuous, swift’. Presumably, neither of these is the ‘true meaning’ Peggy Noonan had in mind: ‘merry, bright’ is what she grew up with and feels comfortable with, and it is this meaning she believes is the true meaning that she wants to revive.

Like Noonan, many people complain about losing what they see as their freedom to call things by their ‘right names’ – as if there were something natural and correct about their own linguistic preferences; but perhaps this is what most of us feel. Linguistic changes, whether or not PC-motivated, are seen by many as the thin end of a wedge that will fragment society into factional interest groups. Such hostility is fuelled by media hyperbole and misrepresentation.”

(Allan, Keith, and Kate Burridge. 2006. Forbidden words: taboo and the censoring of language. Cambridge, UK: Cambridge University Press.)

Road and Routes from Latin to English


Did you know that the English term route and Swedish rutt ‘route’, as well as French route ‘road’, come from the Latin term via rupta ‘broken road’? At first I thought of the slowly decaying Roman roads spread all over Europe, but in fact, broken here means ‘broken through nature/the underbrush’.

The English term road is not from the same stem at all! It comes from Middle English rode ‘act of riding, journey’, and ultimately from Old English ridan ‘to ride’.


Buck, Carl Darling. 1949. A dictionary of selected synonyms in the principal Indo-European languages; a contribution to the history of ideas. Chicago: University of Chicago Press.


och (‘and’) & luras (‘to fool’)

(Did you know that the various & signs are all attempts to write “et” (‘and’ in Latin)?)

Today I came across two different notices that made me stop in my tracks. That happens quite often with language: I get fascinated by something in the usage and miss what is actually being said or written.

First there was a car sticker: “Finns livet efter döden? Rör hojen och du vet” (‘Is there life after death? Touch the bike and you know’).

I would probably have said “Rör hojen så vet du” (‘Touch the bike and you’ll know’) or, even better, “Rör hojen så får du veta” (‘Touch the bike and you’ll find out’). But given that space is scarce on stickers, it would probably have been the first alternative.
“Rör hojen och du vet” bothers me a lot. I have heard this type of sentence construction before and have no problem understanding it. But it feels wrong. The reason is probably that “och” is (usually) used to coordinate two equivalent elements. In both “Rör hojen och lyft upp den” (‘Touch the bike and lift it up’) and “Rör hojen och bilen” (‘Touch the bike and the car’), two main clauses and two main constituents, respectively, are coordinated. “Så”, along with several other words, is used to coordinate a main clause (“Rör hojen”) and a subordinate clause (“Du vet”). Using “och” to coordinate a main clause and a subordinate clause is unusual and, for me, a little unexpected, hence the crawling feeling in my skin… (I bet Svenska Akademiens Grammatik also covers subordinate clause plus main clause coordination as a commonly occurring clause pattern, but I’m a little too after-dinner tired to check.)


The second thing I came across was a newspaper headline on http://dn.se: “Lurades att de var barnbarn – straffas” (roughly ‘Fooled that they were grandchildren – punished’). There are two readings of this that are completely opposed to each other. One is “someone fooled them into believing they were grandchildren – (but) now they are being punished”, the other is “they fooled someone into believing they were grandchildren – now they are being punished”. It is of course the latter that is the case, but it took some mental effort (easily two or three seconds of uncertainty) before I settled on it. You could imagine that someone fooled these people into believing they were the grandchildren of X, whereupon they claimed an inheritance from X. But now it has become clear that they are not the grandchildren of X, and so they are being punished by a court, even though they were tricked into the situation… Maybe… Either way, sentences that can be interpreted in diametrically opposite directions are fun!


Co-hort or cohort?

 A cohortus in a hortus…

Just saw a CNN (entertainment) journalist write “co-hort” for “cohort”. A misunderstanding, a reanalysis following co-op or co-ed? Or just a historical play on form, since cohort comes from com-hortus, where “hortus” means garden and “com” means with. The people at the garden/plot of land -> guards -> army division in the Roman army -> modern-day group of people who work together. (Also the origin of “court”.)

Word of the day: apokoinou

Awesome new linguistic term: apokoinou (from Greek: “in common”). It refers to the situation where a speaker changes her mind mid-sentence, and makes the last part of a sentence the first part of another. Like this:

“there were three crows sat on a tree”

“I can’t find my wallet is here”

Sounds weird? We do it all the time. Try saying the sentences above with some pauses in between segments, like:

“I can’t… find my… wallet is here!!”



What were the first printed books in the Nordic Countries?

I ran across some cool stats on printing in Sweden and Denmark (at this time, Norwegians wrote in Danish). I’m especially charmed by the spelling of ‘devil’ in Middle Swedish: dyäfwlsen…

First printed books in Denmark (1482):
Breviarium Ottoniense (Odense Breviary) and Guillaume Caoursin’s De obsidione et bello Rhodiano (‘On the siege and war of Rhodes’), both printed by Johann Snell in Odense in 1482

First printed book in Danish (1495):
Den danske Rimkrønike

First book printed in Finland (1488):
I can find no data on this…

First printed book in Finnish (1638):
A Bible translation: Biblia, Se on: coci Pyhä Ramattu suomexi. An edition of 1,200 copies. It was printed in Stockholm at the press of Heinrich Keyser, since there was no printing press in Finland at that point.

First printed book in Sweden (1483):
Dialogus creaturarum moralizatus (an allegorical religious tract in Latin).

First printed book in Swedish (1495):
Aff dyäfwlsens frästilse

How German is Scandinavian?

Between one third and one half of the everyday vocabulary of the Scandinavian languages is borrowed from Low German. The borrowing mainly took place from the 13th to the 16th century, when the Hanseatic League’s influence on northern Europe was at its greatest. Not only words were borrowed, but also many frequent derivational affixes, such as be- (be-rika, be-ivra, be-tvinga in Swedish) and -het (svensk-het, tursam-het). We don’t know to what extent the Low German influence also contributed to the massive simplification of the morphology, especially of nouns, so that all that remains of e.g. the dative form are fossilized expressions like gå man ur huse, where huse is inflected in its old dative form.

Here’s a Norwegian example (from Torp 2002) of a sentence where all content words are borrowings from Low German:

Skredderen tenkte at trøya passet fortreffelig, men kunden klaget og mente at plagget var kort og tøyet simpelt og grovt.

[The tailor thought that the jacket fitted perfectly, but the customer complained and felt that the garment was short and the material was unsophisticated and coarse.]



Torp, Arne. 2002. Chapter 2: The Nordic languages in a Germanic perspective. In Bandle, Oskar, Lennart Elmevik, and Gun Widmark (eds.), The Nordic languages: an international handbook of the history of the North Germanic languages. Volume I. Berlin: W. de Gruyter.

The etymology of the word “bad”


I’ve just discovered the etymology of English bad ‘not good’ in Buck 1949, and was going to write a post about it, but the writers over at the Online Etymology Dictionary have read the same chapter as I have, and sum it up nicely:

bad (adj.) c.1200, “inferior in quality;” early 13c., “wicked, evil, vicious,” a mystery word with no apparent relatives in other languages.* Possibly from Old English derogatory term bæddel and its dim. bædling “effeminate man, hermaphrodite, pederast,” probably related to bædan “to defile.” A rare word before 1400, and evil was more common in this sense until c.1700. Meaning “uncomfortable, sorry” is 1839, American English colloquial.

Comparable words in the other Indo-European languages tend to have grown from descriptions of specific qualities, such as “ugly,” “defective,” “weak,” “faithless,” “impudent,” “crooked,” “filthy” (e.g. Gk. kakos, probably from the word for “excrement;” Rus. plochoj, related to O.C.S. plachu “wavering, timid;” Pers. gast, O.Pers. gasta-, related to gand “stench;” Ger. schlecht, originally “level, straight, smooth,” whence “simple, ordinary,” then “bad”). ”


When did hermaphrodite become something with such a negative sentiment attached to it in the West? Clearly after the Roman Empire, since the Romans had hermaphrodite gods…


Loops and self-reference in dictionaries

One of the things I wanted to use this blog for was to publish (and get incentive to keep writing) the summaries that I really should write as soon as I read an article.

Today I read this article:

Levary, David, Jean-Pierre Eckmann, Elisha Moses, and Tsvi Tlusty. 2012. Loops and Self-Reference in the Construction of Dictionaries. Physical Review X, 2, 031018.

It reminded me of my Magister thesis, where I (among other things) looked at the dictionary network around the two synonyms skarp and vass in Swedish. The result was not a surprise – we know that dictionary entries refer to each other in loops, making it very difficult to understand a new concept, without understanding related concepts.

The Levary et al (2012) article did some cool statistics on these self-referential loops in the dictionaries. One such loop is shown in their Figure 5.


Figure 5 has examples of strongly connected components (SCCs), where every node (word) can be reached from every other word by following definitions, such as precipitation, sleet and snow.

Previously, other researchers had found the following about a typical dictionary:
“It was found that dictionaries consist of a set of words, roughly 10% the size of the original dictionary, from which all other words can be defined. This subgraph was observed to be highly interconnected, with a central, strongly connected component, dubbed the core.”

This should mean that if you know this core, you can use it to define all other words. Your sentences would be awkward and long, of course, with red or green fruit on bough of tree that you can eat and that you can make sauce and pie out of instead of apple. And the definition you end up with from the dictionary might not uniquely identify an APPLE at all; in this case, maybe grape would fit too. It would also, on the surface, mean that to understand the core, you have to understand basically all of it. Hmm…

Levary et al (2012) discovered that if you concentrate only on SCCs with loops of at most 5 words, this core decomposes into several hundred SCCs that are independent of each other. Here are some examples of SCCs:

{emotion, spirit, dejection, melancholy, feeling}
{height, end, dimension, length}
{bark, trunk, tree, lumber}
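
For the curious, here is a minimal sketch of how groups like these can be pulled out of a dictionary graph. This is not the authors' code. The toy dictionary and its "definitions" are made up for illustration, the function name strongly_connected_components is my own, and I assume that each headword points to the words used in its definition. The method is Kosaraju's standard two-pass depth-first search, and the sketch ignores the restriction to loops of at most 5 words.

```python
# Hypothetical toy dictionary: headword -> words used in its definition.
# These definitions are invented for illustration only.
toy_dictionary = {
    "bark": ["tree", "trunk"],
    "trunk": ["tree"],
    "tree": ["trunk", "bark", "lumber"],
    "lumber": ["tree"],
    "apple": ["fruit", "tree"],
    "fruit": ["seed"],
    "seed": [],
}


def strongly_connected_components(graph):
    """Return the strongly connected components as a list of sets (Kosaraju)."""
    # Collect every word that appears as a headword or inside a definition.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    # Pass 1: depth-first search, recording nodes in order of completion.
    visited, order = set(), []

    def visit(node):
        visited.add(node)
        for nxt in graph.get(node, ()):
            if nxt not in visited:
                visit(nxt)
        order.append(node)

    for node in sorted(nodes):
        if node not in visited:
            visit(node)

    # Build the reversed graph: edges run from a defining word to the headword.
    reverse = {node: [] for node in nodes}
    for node, targets in graph.items():
        for nxt in targets:
            reverse[nxt].append(node)

    # Pass 2: search the reversed graph in decreasing order of completion.
    assigned, components = set(), []

    def collect(node, component):
        assigned.add(node)
        component.add(node)
        for prev in reverse[node]:
            if prev not in assigned:
                collect(prev, component)

    for node in reversed(order):
        if node not in assigned:
            component = set()
            collect(node, component)
            components.append(component)
    return components


components = strongly_connected_components(toy_dictionary)
# Keep only the components that contain an actual loop between several words.
loops = [c for c in components if len(c) > 1]
for c in loops:
    print(sorted(c))
```

In this toy graph, bark, trunk, tree and lumber end up in one component because each can be reached from the others, while apple, fruit and seed stay on their own: nothing in the toy data leads back to them.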

And then the authors looked at _when_ words had first been recorded. They found that the words in SCCs (such as the ones above) tended to appear closer together in time than average words do. So once you start talking about a semantic field (maybe you move from a desert to a place with trees), lots of interdependent words appear in a relatively short time span (150 years or so). The words co-evolve. This is cool. You don’t need to use trunk to define what tree means, but conceptually bark and tree seem to be close.