How many words are there in a language?

So Many Words
…This is always a vexing question to get asked at parties. It’s like the question “how many languages are there”. There’s no short satisfactory answer, and the truest one remains “honestly, we have no idea”, which is, of course, disappointing.

Now, here are some interesting statistics about how much vocabulary a speaker of English actually needs. Turns out that with about 2000 different word families, you cover something close to 95% (and by some older estimates more) of the words used in everyday spoken English. All the rest are just trimmings.

From Thornbury, Scott, and Diana Slade. 2006. Conversation: from description to pedagogy. Cambridge: Cambridge University Press, pp. 42-43:

“2.1 Lexical size
In terms of the number of words they need to control, the demands placed on speakers and listeners are considerably fewer than they are for writers and readers: ‘From the small amount of evidence available, it seems that about half the words needed to understand written English are needed to understand spoken English’, notes Nation (1990: 85). Schmitt cites an analysis of a corpus of Australian English (Schonell et al., 1956) which suggests that ‘a person can largely function in everyday conversation with a vocabulary of 2000 words’ (Schmitt, 2000: 74). On the basis of computer counts of word frequency, McCarthy notes that ‘there is usually a point where frequency drops off rather sharply, from hard-working words which are of extremely high frequency to words that occur relatively infrequently’ (1999: 7). The drop-off point, according to McCarthy, is situated around about 2000 words down in the frequency ratings, leading him to conclude ‘that a round-figure pedagogical target of the first 2000 words in order of frequency will safely cover the everyday core’ (ibid.). However, Adolphs and Schmitt (2003), in comparing the Schonell et al. word list with one derived from the CANCODE corpus, found that the top 2000 word families in fact provide only around 95 per cent coverage, rather than the near 100 per cent claimed by Schonell et al. (Note that the researchers refer to word families, not individual words. A word family is a base word plus its inflexions and its most common derivations, a concept that correlates more or less with the headwords of a typical dictionary.) Adolphs and Schmitt admit, though, that ‘there is almost no research which explores the percentage of words which need to be known in order to operate successfully in a spoken environment’ (2003: 432). They therefore tentatively suggest that 2000 word families may be a useful starting point but that ‘3000 word families (providing coverage of nearly 96 per cent) is a better goal if learners wish to minimize their lexical gaps’ (2003: 433).
This figure, however, is based on a broad cross section of native speaker contexts, including professional and academic registers, in the UK and Ireland, and does not necessarily represent the lexical needs of most learners, especially those learning English as an International Language (McKay, 2002). For such learners, the 95 per cent coverage of a native speaker’s lexical coverage represented by 2000 words would seem to be more than sufficient, and the effort involved in learning a further 1000 words in order to gain one percentage point extra coverage seems out of all proportion to the gains in communicative efficiency that might be made. In fact, the target need not even be as high as 2000 – West (1960) developed a minimum adequate speech vocabulary for learners of English of just 1200 words (compared to the minimum 2000 words for dealing with written language). This, he argued, would be sufficient for learners to say most of the things they would need to say. Moreover, even fewer words would still give the learner (theoretically at least) an advantage, since, in spoken language, a little goes a long way: McCarthy and Carter (1997) point out that whereas the 50 most frequent words in written text cover 38.8 per cent of all written text, the top 50 spoken words cover 48.3 per cent – that is to say almost half – of spoken text.”
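The percentages in that passage (95 per cent for 2000 word families, 48.3 per cent for the top 50 spoken words) are coverage figures. Each one is the share of all running words in a corpus that is accounted for by the N most frequent items. Here is a minimal Python sketch of that calculation. It assumes a crude tokenizer and counts individual word forms rather than word families as Adolphs and Schmitt did, since grouping inflections and derivations would need a lemmatizer. The names `tokenize` and `top_n_coverage` are hypothetical and for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately naive tokenizer: lowercase the text and
    # keep runs of letters and apostrophes. Real corpus studies use far
    # more careful tokenization (and word families, not word forms).
    return re.findall(r"[a-z']+", text.lower())


def top_n_coverage(tokens, n):
    """Share of running words (tokens) covered by the n most frequent word forms."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum the frequencies of the n most frequent types and divide by all tokens.
    covered = sum(count for _, count in counts.most_common(n))
    return covered / total


if __name__ == "__main__":
    tokens = tokenize("The cat sat on the mat. The end.")
    for n in (1, 2, 50):
        print(n, top_n_coverage(tokens, n))
```

In this toy sample "the" makes up 3 of the 8 tokens, so the single most frequent word already covers 37.5% of the text. The same effect, on a much larger scale, is behind the observation that a small core vocabulary covers most of spoken English.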


The etymology of Swedish tallrik ‘plate’ and Norwegian/Danish tallerken ‘plate’


Latin taliare ‘cut’
   -> Old French tailleor ‘cutting board, plate’
    -> Middle Low German (MLG) tallor ‘plate’
      -> MLG tallor-ken ‘plate-DIMINUTIVE’
        -> East Scandinavian tallorken ‘plate’  (16th century)
           -> Danish tallerken ‘plate’
           -> Norwegian Bokmål tallerken ‘plate’
             -> Norwegian Nynorsk tallerken ‘plate’ -> tallerk-en ‘plate-DEFINITE’ -> tallerk ‘plate’
           -> Swedish tallriken ‘plate’ -> tallrik-en ‘plate-DEFINITE’ -> tallrik ‘plate’


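(this is a good example of “folk etymology” and “back formation”: speakers reinterpreted the final -en of the borrowed word as the definite-article suffix and stripped it off, creating a new base form for ‘plate’)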