

Alec Killoran - Topic Modeling

Page history last edited by Alec Killoran 5 years, 1 month ago

I used Nathaniel Hawthorne's The House of the Seven Gables for my topic modeling.  After downloading the topic modeling tool, I loaded the text file and noticed that the tool does not allow selecting more than one file at a time.  Instead, it lets you select a prearranged directory that (presumably) contains a number of related text files.  I think this points to a good initial step in our class project: filing relevant works or whole corpora into a shared directory.  I made a small edit to the number of iterations run by the topic modeling tool, figuring that more iterations would return a more refined result.




The result was a fairly convoluted list of topics, so I decided to raise the number of iterations while reducing the number of topic words printed.


The result was much more satisfying.  The topics found by the tool are unified and a few stand out as obvious successes ("parlor heard turned secret corner," "made small life longer quiet," "clifford face found purpose knew" all spring to mind).


I think topic modeling would be a great place to start our class project.  Though my own idea for the project is as yet untested and extremely broad, I would like to use topic modeling on two corpora of texts separated by some agreed-upon number of years.  After establishing the topics of the texts in the two time periods, our class could analyze the topics to see which words and topics persist across both periods and which are unique to their respective corpora.
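
As a rough illustration of that comparison, here is a minimal Python sketch, assuming each topic arrives as a space-separated string of words like the ones quoted above. The function name split_topic_words is hypothetical, and the "later" topics are made-up placeholders, not output from any real run.

```python
def split_topic_words(topics_a, topics_b):
    """Return (shared, only_a, only_b) word sets for two lists of topic strings."""
    # Flatten each list of topics into one set of words per corpus.
    words_a = {word for topic in topics_a for word in topic.split()}
    words_b = {word for topic in topics_b for word in topic.split()}
    return words_a & words_b, words_a - words_b, words_b - words_a


# Topics quoted earlier from the House of the Seven Gables run.
early = [
    "parlor heard turned secret corner",
    "made small life longer quiet",
    "clifford face found purpose knew",
]

# Placeholder topics standing in for a later corpus (invented for illustration).
later = [
    "city street heard machine quiet",
    "life war found letter home",
]

shared, only_early, only_later = split_topic_words(early, later)
print("shared:", sorted(shared))
print("only early:", sorted(only_early))
print("only later:", sorted(only_later))
```

This only compares individual words. Comparing whole topics would need some similarity measure between word lists, which is a choice the class would have to make.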
