
Our Corpora

Saved by Alan Liu
on June 22, 2015 at 8:12:34 pm
 

 

English 197 Corpora Folder on Google Drive

 


Works in our children's literature corpus came from Project Gutenberg's "Children's Bookshelf" category.  We drew works of fiction (mostly novels, with some short fiction) from all the subcategories on that bookshelf, limiting our selection to works published in the 1880s.  Works in our adult fiction corpus came from a corpus of 2,731 nineteenth-century British novels shared with us by the Stanford Literary Lab, which originally gathered them from the Internet Archive and Project Gutenberg. (Thanks to Ryan Heuser of the Stanford Literary Lab.) From that corpus we selected novels of the 1880s by both male and female authors.

 

Below are links to zip files on our course Google Drive that contain our corpora and sub-corpora.  These include the plain-text files for full works.  Other zip files in our Google Drive folder contain works that have been "cleaned" (we used the Lexos "scrubber" and Matthew Jockers's stoplist) and also cleaned-and-"chunked" (we used the Lexos chunker to break files for topic modeling into 1,000-word segments).
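For readers who want to reproduce the "cleaned" and "chunked" variants without Lexos, the following is a minimal Python sketch of the general idea: lowercase the text, drop punctuation and digits, remove stopwords, and split what remains into 1,000-word segments. It is not the Lexos implementation, and the exact scrubbing options used for the class corpora are not recorded here. The function names `scrub` and `chunk` are hypothetical, and the tiny `STOPWORDS` set is only a placeholder, not Matthew Jockers's stoplist.

```python
import re

# Placeholder stoplist for illustration only; NOT Matthew Jockers's stoplist.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in"}


def scrub(text, stopwords=STOPWORDS):
    """Lowercase the text, keep only alphabetic words (allowing one internal
    apostrophe), and drop any word found in the stoplist."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [t for t in tokens if t not in stopwords]


def chunk(tokens, size=1000):
    """Split a token list into consecutive segments of `size` words.
    The final segment is simply left shorter if the text does not divide
    evenly (an assumption of this sketch)."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


if __name__ == "__main__":
    sample = "The cat's hat, and 3 dogs."
    print(scrub(sample))
    print([len(c) for c in chunk(list(range(2500)))])
```

Each segment can then be joined with spaces and written to its own plain-text file for topic modeling. How to treat a short final segment (keep it, or merge it into the previous one) is a design choice left open here.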

 

Adult British Fiction - 1880s (451 works) (metadata spreadsheet)


Children's Fiction - 1880s (135 works) (metadata spreadsheet)


 

 


Special thanks to class members Lindsay Blackie, Alec Killoran, and Aaron Woldhagen for assisting with the assembly of the corpora.  Thanks to the Stanford Literary Lab for sharing the larger corpus of British nineteenth-century fiction from which the course drew its "Adult British Fiction - 1880s" corpus.

 
