For my project, I would like to study a large dataset of publicly available fanfiction and analyze its stylistic features across a number of categories, including genre. I would also like to see whether I can build a classifier that takes unseen fanfiction and identifies its genre.
I plan to use data publicly available on the internet, focusing on popular fanfiction-sharing sites like AO3 (Archive of Our Own). I will write code to scrape these websites, with the aim of categorizing stories cleanly and collecting approximately equal amounts of data in each category. I expect the web scraping and the cleaning of the resulting data to make up a large portion of my project.
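As a rough illustration of the "approximately equal amounts of data in each category" goal, here is a minimal Python sketch of one way to do it: downsampling every genre to the size of the smallest one. It assumes the scraping step has already produced one dictionary per story with a `genre` key; that record shape, the function name `balance_by_genre`, and the sample stories are all hypothetical and not tied to any particular site.

```python
import random
from collections import Counter, defaultdict


def balance_by_genre(stories, per_genre=None, seed=0):
    """Downsample so that each genre contributes the same number of stories.

    `stories` is assumed to be a list of dicts with a "genre" key
    (a hypothetical record shape for already-scraped data).
    """
    by_genre = defaultdict(list)
    for story in stories:
        by_genre[story["genre"]].append(story)
    if not by_genre:
        return []

    # By default, cap every genre at the size of the smallest genre.
    smallest = min(len(items) for items in by_genre.values())
    target = smallest if per_genre is None else min(per_genre, smallest)

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    balanced = []
    for genre in sorted(by_genre):
        balanced.extend(rng.sample(by_genre[genre], target))
    return balanced


# Made-up example data: 5 angst, 2 fluff, 3 humor stories.
sample_stories = (
    [{"genre": "angst", "text": f"angst story {i}"} for i in range(5)]
    + [{"genre": "fluff", "text": f"fluff story {i}"} for i in range(2)]
    + [{"genre": "humor", "text": f"humor story {i}"} for i in range(3)]
)

balanced = balance_by_genre(sample_stories)
print(Counter(story["genre"] for story in balanced))
```

Downsampling throws data away, so if one genre turns out to be very small it may be better to drop that genre or scrape more of it instead.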
I would like to come up with several stylistic features to study and compare them across genres. I would also like to build a classifier that sorts unseen fanfiction into genres. I hope to measure how much fanfiction varies by genre and to use statistical tests to check for variation across several categories. Once I've worked out my scraping, I will have a better idea of how much I can rely on the built-in tag systems of the websites I'll be using, but ideally I would be able to use those tags in my analysis.
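To make the idea of stylistic features and statistical tests more concrete, here is a small Python sketch under my own assumptions. The two features (average sentence length and type-token ratio) are only examples of what might be measured, and the permutation test is one possible choice of test, not a settled decision. The function names and the numbers in the example are hypothetical.

```python
import random
import re
from statistics import mean


def style_features(text):
    """Compute two illustrative stylistic features for one story."""
    # Naive sentence split on ., ! and ?; real text would need more care.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not sentences or not words:
        return {"avg_sentence_len": 0.0, "type_token_ratio": 0.0}
    return {
        # Mean number of words per sentence.
        "avg_sentence_len": len(words) / len(sentences),
        # Distinct words divided by total words (vocabulary variety).
        "type_token_ratio": len(set(words)) / len(words),
    }


def permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Returns an estimated p-value: the share of random relabelings whose
    mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


features = style_features("She ran. He ran fast!")
print(features)

# Made-up per-story sentence lengths for two genres.
genre_a = [10, 11, 12, 13, 14, 15]
genre_b = [1, 2, 3, 4, 5, 6]
p_value = permutation_test(genre_a, genre_b)
print(p_value)
```

A permutation test is used here only because it makes no assumptions about how a feature is distributed; with more than two genres, or many features, a different test and a correction for multiple comparisons would probably be needed.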
If my classifier is a success, I would like to demonstrate it as part of my presentation. I will probably also use graphs and charts to show stylistic variation throughout the dataset. I will also likely present the levels of variation across different categories, and which categories were most and least helpful to me when building my classifier.