Loops and self-reference in dictionaries

One of the things I wanted to use this blog for was to publish (and get an incentive to keep writing) the summaries that I really should write as soon as I have read an article.

Today I read this article:

Levary, David, Jean-Pierre Eckmann, Elisha Moses, and Tsvi Tlusty. 2012. Loops and Self-Reference in the Construction of Dictionaries. Physical Review X, 2, 031018.

It reminded me of my Magister thesis, where I (among other things) looked at the dictionary network around the two synonyms skarp and vass in Swedish. The result was not a surprise – we know that dictionary entries refer to each other in loops, which makes it very difficult to understand a new concept without already understanding related concepts.

The Levary et al. (2012) article did some cool statistics on these self-referential loops in dictionaries. One such loop is shown in their Figure 5.


Figure 5 has examples of strongly connected components (SCCs), groups of nodes (words) where every word can be reached from every other word by following definition links, such as precipitation, sleet and snow.
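
To make the SCC idea concrete, here is a minimal sketch in Python. It assumes a made-up toy dictionary in which each word points to the words used in its definition; the edges are invented for illustration and are not taken from the paper's data. The algorithm used is Kosaraju's, a standard way to find SCCs, and not necessarily what the authors used.

```python
from collections import defaultdict

# Hypothetical toy "dictionary": each word maps to the words used in its
# definition. These edges are invented, not taken from Levary et al.
definitions = {
    "precipitation": ["sleet", "snow", "water"],
    "sleet": ["precipitation", "snow"],
    "snow": ["precipitation"],
    "water": ["liquid"],
    "liquid": [],
}

def strongly_connected_components(graph):
    """Kosaraju's algorithm: return a list of SCCs, each a set of words."""
    # First pass: record words in order of depth-first completion.
    visited, order = set(), []

    def visit(node):
        visited.add(node)
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visit(nxt)
        order.append(node)

    for node in graph:
        if node not in visited:
            visit(node)

    # Reverse every edge.
    reversed_graph = defaultdict(list)
    for node, targets in graph.items():
        for target in targets:
            reversed_graph[target].append(node)

    # Second pass: depth-first search on the reversed graph,
    # starting from the words that finished last in the first pass.
    assigned, components = set(), []

    def collect(node, component):
        assigned.add(node)
        component.add(node)
        for prev in reversed_graph[node]:
            if prev not in assigned:
                collect(prev, component)

    for node in reversed(order):
        if node not in assigned:
            component = set()
            collect(node, component)
            components.append(component)
    return components

components = strongly_connected_components(definitions)
# Only components with more than one word are definitional loops.
loops = [c for c in components if len(c) > 1]
print(loops)
```

Note that in this toy graph snow does not point directly to sleet; it is still in the same component because sleet can be reached from snow via precipitation.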

Previously, other researchers had found the following about a typical dictionary:
“It was found that dictionaries consist of a set of words, roughly 10% the size of the original dictionary, from which all other words can be defined. This subgraph was observed to be highly interconnected, with a central, strongly connected component, dubbed the core.”

This should mean that if you know this core, you can use it to define all other words. Your sentences would be awkward and long, of course: instead of apple you would say something like "red or green fruit on the bough of a tree that you can eat and that you can make sauce and pie out of". And the definition you end up with from the dictionary might not uniquely identify an APPLE at all; in this case, maybe grape would also fit. It would also, on the surface, mean that to understand the core, you have to understand basically all of it. Hmm…

Levary et al. (2012) discovered that if you concentrate only on short loops (five words or fewer), this core decomposes into several hundred SCCs that are independent of each other. Here are some examples of SCCs:

{emotion, spirit, dejection, melancholy, feeling}
{height, end, dimension, length}
{bark, trunk, tree, lumber}
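
Here is a rough sketch of how restricting attention to short loops can split one big component into small ones. This is my reading of the idea, not the authors' actual procedure: I keep only the links that lie on a loop of at most five words and then group words that can still reach each other. The toy graph, the filler words (measure, thing, plant) and the function names are all invented for illustration.

```python
from collections import deque

# Hypothetical toy graph: two 3-word loops joined by one long 7-word loop.
toy = {
    "tree": ["trunk", "height"],
    "trunk": ["bark"],
    "bark": ["tree"],
    "height": ["length"],
    "length": ["dimension"],
    "dimension": ["height", "measure"],
    "measure": ["thing"],
    "thing": ["plant"],
    "plant": ["tree"],
}

def within_steps(graph, start, goal, max_steps):
    """True if goal can be reached from start in at most max_steps links."""
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        node, dist = frontier.popleft()
        if node == goal:
            return True
        if dist == max_steps:
            continue
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return False

def short_loop_edges(graph, max_loop=5):
    """Keep only links u -> v that lie on a loop of at most max_loop words."""
    return {
        u: [v for v in targets if within_steps(graph, v, u, max_loop - 1)]
        for u, targets in graph.items()
    }

def reachable(graph, start):
    """All words reachable from start, including start itself."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def loop_components(graph):
    """Groups of two or more words that can all reach each other."""
    reach = {word: reachable(graph, word) for word in graph}
    groups, assigned = [], set()
    for word in graph:
        if word in assigned:
            continue
        group = {x for x in reach[word] if x in reach and word in reach[x]}
        assigned |= group
        if len(group) > 1:
            groups.append(group)
    return groups

print(loop_components(toy))                       # one component with all 9 words
print(loop_components(short_loop_edges(toy, 5)))  # two separate 3-word components
```

With all links included, the long loop glues everything into a single component. Once the links that only sit on the 7-word loop are dropped, the tree group and the height group come apart.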

And then the authors looked at _when_ words had first been recorded. They found that the words in an SCC (such as the ones above) tended to appear closer together in time than average words do. So once you start talking about a semantic field (maybe you move from a desert to a place with trees), lots of interdependent words appear in a relatively short time span (150 years or so). The words co-evolve. This is cool. You don’t need to use trunk to define what the word tree means, but conceptually bark and tree seem to be close.