Though I feel sure that someone somewhere is up to something like this, I'd like to record the idea before I forget it:

So, you want to get a sense of the extent to which a certain word or notion or feeling or whatever is present or pervasive in a given work. You want to know, perhaps, how pervasive stones are in the Comedy.

1. Look up "stone" in a thesaurus, preferably one that will also give you adjectives like "stony" and verbs like "petrify."
2. Take all of those words and use Wiktionary or another reference work to derive all of their inflected forms (conjugations, declensions, and other changes).
3. Search your text for all of these terms to determine what proportion of the words* (or even letters) in the work are related to your original term.

* Note: You might want to exclude function words when you count the total length of the text.
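Step 3 is the easy part to sketch in code. Here's a rough Python version, assuming steps 1 and 2 have already been done by hand: the `STONE_FORMS` and `FUNCTION_WORDS` sets below are made-up stand-ins for what a thesaurus, Wiktionary, and a proper function-word list would actually give you, and the function names are my own.

```python
import re

# Hypothetical output of steps 1 and 2: a hand-picked set of related forms.
STONE_FORMS = {"stone", "stones", "stony", "petrify", "petrifies",
               "petrified", "rock", "rocks"}

# A tiny illustrative function-word list; a real one would be much longer.
FUNCTION_WORDS = {"the", "a", "an", "of", "and", "to", "in", "on",
                  "is", "was", "by"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def pervasiveness(text, forms, function_words=frozenset()):
    """Return the share of counted words that belong to `forms`.

    Words in `function_words` are dropped from the total before counting.
    """
    words = [w for w in tokenize(text) if w not in function_words]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in forms)
    return hits / len(words)
```

Calling `pervasiveness(text, STONE_FORMS)` gives the raw proportion; passing `FUNCTION_WORDS` as the third argument applies the adjustment from the note. The simple regular expression only handles unaccented Latin letters, so a text like the Comedy in Italian would need a better tokenizer.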

I was also interested in doing this or something like it with Indo-European roots: take a root, find all the words in your modern language that descend from it, conjugate/decline them all, and then search the work.

You could also do permutations where you feed certain thesaurus results back into the thesaurus to branch out more, or even try some complex rules to say which of the words the thesaurus gives will be accepted. For instance, among the results for "stone" might be "pebble" and "masonry." If you run them back into the thesaurus, you'd probably find that "pebble" shares more synonyms with "stone" than does "masonry."
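One simple version of such a rule is to count how many synonyms a candidate shares with the original term. A minimal sketch, assuming the thesaurus is just a dictionary from word to set of synonyms; the `TOY_THESAURUS` entries are invented for illustration and not taken from any real reference work:

```python
# Hypothetical thesaurus entries, made up for this example.
TOY_THESAURUS = {
    "stone": {"rock", "pebble", "boulder", "gravel", "masonry"},
    "pebble": {"stone", "rock", "gravel", "cobble"},
    "masonry": {"brickwork", "stonework", "stone"},
}


def shared_synonyms(seed, candidate, thesaurus):
    """Return the synonyms that the seed and the candidate have in common."""
    return thesaurus.get(seed, set()) & thesaurus.get(candidate, set())


def rank_by_overlap(seed, candidates, thesaurus):
    """Score each candidate by its shared-synonym count, highest first."""
    scored = [(c, len(shared_synonyms(seed, c, thesaurus))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With these toy entries, "pebble" shares two synonyms with "stone" ("rock" and "gravel") while "masonry" shares none, so a threshold on the count would keep the first and drop the second. Where to set the threshold is exactly the inexact part.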

It's obviously a very inexact science, but I think it's a neat idea and I've never heard of anyone doing it.

With such an index in place, you could also track the presence of a word/concept/figure over the course of the text. For instance, in Macbeth, the presence of "blood" probably grows over the course of the play. The potential to visualize this sort of thing in literature strikes me as super-sweet.
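Tracking presence over the course of a text could be as simple as cutting the text into consecutive chunks and computing the proportion in each one. A rough sketch, assuming you already have the set of related forms; the function name and the `segments` parameter are my own invention:

```python
import re


def presence_by_segment(text, forms, segments=5):
    """Split the text into at most `segments` consecutive chunks of words
    and return the share of each chunk that belongs to `forms`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words or segments < 1:
        return []
    size = -(-len(words) // segments)  # ceiling division: words per chunk
    shares = []
    for start in range(0, len(words), size):
        chunk = words[start:start + size]
        shares.append(sum(1 for w in chunk if w in forms) / len(chunk))
    return shares
```

The resulting list of numbers is what you would hand to a plotting tool. For a play, splitting by act or scene rather than by equal word counts would probably make more sense, but that depends on how the text is marked up.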
