
Alec Killoran - Text Analysis 1

Page history last edited by Alec Killoran 9 years ago

I decided to run Thomas Hardy's "The Mayor of Casterbridge" through Lexos. I tried AntConc briefly, but found it rather clunky. I considered this text a good candidate for data analysis because of its rather Victorian nature. Very little of the novel's text pertains to plot, so I thought it would be interesting to see whether a few words rise to the surface of the text.

I used the above settings and went with the Fox 1992 stop word list instead of the SMART 1971 list. The results it yielded were acceptable. I added a couple of additional stop words, including the main characters' names, which helped clear up the resulting word cloud.

I found this word cloud to be extremely productive. I think the words coalesce to highlight a central focus in the novel: town, door, eyes, woman, lady, sir, life, time, days. Those words jumped out at me as particularly related. They evoke an image of a Victorian town where time passes slowly, as the townsfolk carefully watch each other in a place of frozen tradition. Just from this set of words and a vague knowledge of the text's historical context, I can assume that the novel takes place almost entirely within the bounds of a town, and often within a single home in that town. I can also sense that the novel is fairly decrepit in tone. Smaller words in the cloud, like "ago," "lost," and "little," color my reading of the more prominent ones. Indeed, the cloud would read differently if those words were replaced by "now," "thriving," and "robust."

I tried a number of other formats with the same scrubbing rules and different sets, but none offered such a unified picture as the one above. I think that, fittingly, this type of analysis must be done in a brute-force, catch-all way. Just as these tools can look at many different texts at once, so too must many different filters be tried in order to find a result of any substance. I do not believe anything productive would have come of the "BubbleViz" built from the same scrubbing options. The important lesson to take from this is that even these visual representations of data are prone to human error in interpretation. Although the BubbleViz highlighted the same words as the word cloud, it did so in such a way that my brain was unable to group them into a cohesive concept that I could grasp. Perhaps the reverse would be true with a different text. It seems important that in future text analyses, our class not grow weary of a visual representation tool just because it is often unproductive. Each tool ought to be checked with each different set of filters, in the hope that one of them will excite our brains to a particular conclusion.
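For anyone who wants to try the "many filters" idea outside of Lexos, here is a rough Python sketch of the scrub-and-count step that sits behind a word cloud. It is not how Lexos works internally. The sample sentence is made up (not a quotation from Hardy), the stop word set is a tiny hypothetical stand-in for the Fox 1992 list, and `top_words` is a helper name I invented for illustration.

```python
import re
from collections import Counter


def top_words(text, stop_words, n=5):
    """Return the n most frequent words in text, skipping stop words."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [t for t in tokens if t not in stop_words]
    return Counter(kept).most_common(n)


# Made-up sample text, standing in for the full novel.
sample = (
    "The town was quiet. Henchard walked through the town, and the door "
    "of the house was shut. The woman at the door watched the town."
)

# Hypothetical mini stop list; a real run would load a full list instead.
base_stops = {"the", "was", "and", "of", "at", "through", "a"}

# Each entry is one "filter" to try against the same text.
filter_sets = {
    "base": base_stops,
    "base + names": base_stops | {"henchard"},
}

for label, stops in filter_sets.items():
    print(label, top_words(sample, stops, n=3))
```

Adding another entry to `filter_sets` is all it takes to try one more filter, which makes the brute-force comparison described above cheap to repeat.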