Online Readership and Perceptions of Genres Over Time

Long paper
  1. Maria Antoniak

    Cornell University

  2. Melanie Walsh

    University of Washington

  3. David Mimno

    Cornell University


When today’s readers think about “science fiction,” what kinds of books do they think of? Do they think of Ursula K. Le Guin’s 1969 novel The Left Hand of Darkness, or Neal Stephenson’s 1992 Snow Crash, or Hugh Howey’s 2011 Silo series? Another way of phrasing this question might be: When today’s readers think about “science fiction,” what era of books do they think of? Do they think of the 1960s, the 1990s, the 2010s, or all of the above? What about “romance,” “fantasy,” “classics,” and “vampires”? What historical eras do readers imagine when they think about these categories, and how do they compare to science fiction?

Data from online reading communities like Goodreads and LibraryThing, where readers rate, review, and categorize books, enables us to answer these questions and to explore the relationship between genre and historical period in the minds of readers. We focus on reception data from LibraryThing, where users can add any number of free-text tags to any book and where all of this data is publicly available (unlike Goodreads, where review data is mostly hidden; Walsh and Antoniak 2021). We combine these tags with book publication information to better understand which eras of books readers are categorizing with which tags. For example, the median publication date for books tagged “science fiction” is 1989, nearly two decades earlier than the median publication date for the tag “vampires,” 2007. This information can help us identify emergent genres as well as understand how readers perceive genres historically.
This work builds on recent studies of online reading communities that take advantage of the scale and diversity of online data to examine readers’ preferences (Manshel et al., 2019; Bourrier and Thelwall, 2020; English et al., 2018). In particular, our work adds to explorations of online readers’ perceptions of genres (Hegel, 2018; Antoniak et al., 2021; Walsh and Antoniak, 2021), which have applied natural language processing methods to reviews to measure affinities and differences between user-applied tags. By focusing on publication dates, we add a new dimension to our understanding of how today’s readers perceive literary genres.

Comparing the Top 20 Most Popular Tags By Publication Date
We begin by examining the distribution of publication years for the 20 most popular tags. From the set of most frequently used tags on LibraryThing, we select the 20 most popular tags that are neither meta (e.g., “to-read”) nor repetitive (e.g., “classic” when “classics” is already included), so that the selected tags more closely resemble genre or subject labels. For each of these 20 tags, we scrape the metadata of the 1,000 books to which that tag is most often assigned. This metadata includes each book’s title, author, original publication year, user ratings, user reviews, and the full set of tags that users have applied to it. For each book, we retain only the tags assigned by at least 10 users.
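A minimal Python sketch of the two filtering steps described above, assuming each scraped book's tags are stored as a mapping from tag to the number of users who applied it. The function names, the data layout, and the hand-curated `excluded` set (standing in for the manual judgment about meta and repetitive tags) are hypothetical illustrations, not the authors' code, and the example values are invented.

```python
from collections import Counter

MIN_USERS = 10  # minimum number of users who must apply a tag, per the text


def filter_tags(tag_counts: dict[str, int], min_users: int = MIN_USERS) -> dict[str, int]:
    """Keep only the tags that at least `min_users` users applied to one book."""
    return {tag: n for tag, n in tag_counts.items() if n >= min_users}


def select_target_tags(tag_totals: Counter, excluded: set[str], n: int = 20) -> list[str]:
    """Return the n most-used tags, skipping tags judged meta or repetitive.

    `excluded` is assumed to be curated by hand (e.g., "to-read", or "classic"
    when "classics" is already kept); the ranking itself is by total usage.
    """
    selected: list[str] = []
    for tag, _count in tag_totals.most_common():
        if tag in excluded:
            continue
        selected.append(tag)
        if len(selected) == n:
            break
    return selected


# Invented toy values, for illustration only.
book_tags = {"science fiction": 412, "to-read": 150, "aliens": 37, "owned": 4}
print(filter_tags(book_tags))  # {'science fiction': 412, 'to-read': 150, 'aliens': 37}

totals = Counter({"to-read": 9000, "fiction": 8000, "classics": 6000,
                  "classic": 5000, "fantasy": 4000})
print(select_target_tags(totals, excluded={"to-read", "classic"}, n=3))
# ['fiction', 'classics', 'fantasy']
```

Note that the per-book threshold keeps meta tags such as “to-read”; in this sketch, exclusion applies only when choosing the target tags.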

The resulting distributions suggest that users perceive certain tags as belonging to earlier historical periods than others. For example, the median publication dates for “classics” (1900), “children” (1971), and “picture book” (1989) are all more than a decade earlier than those for “graphic novel” (2006), “vampires” (2007), and “young adult” (2005). This contrast likely reflects the fact that “graphic novel,” “vampires,” and “young adult” are more recent, emergent genres. The publication distribution for “horror,” a genre that sometimes includes vampire characters, is also much earlier and wider than that for “vampires,” again suggesting that “vampires” is its own distinct, historically specific genre. Unlike “vampires,” the tags “mystery” and “science fiction” both have wide publication distributions that begin in the early 20th century.
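The per-tag comparison above reduces to a median over each tag's sample of publication years. A minimal sketch, assuming the scrape is stored as a dictionary from target tag to a list of book records with a `year` field (a hypothetical layout); the function name is illustrative, and the toy years are invented and do not reproduce the study's figures.

```python
from statistics import median


def median_year_by_tag(samples: dict[str, list[dict]]) -> dict[str, float]:
    """Median original publication year of the books scraped for each target tag.

    `samples` maps each target tag to the book records scraped for it
    (in the study, the 1,000 books most often assigned that tag).
    """
    return {
        tag: median(book["year"] for book in books)
        for tag, books in samples.items()
        if books  # median() raises StatisticsError on empty input
    }


# Invented toy records, for illustration only.
samples = {
    "classics": [{"year": 1813}, {"year": 1847}, {"year": 1925}],
    "vampires": [{"year": 1897}, {"year": 2005}, {"year": 2008}, {"year": 2011}],
}
print(median_year_by_tag(samples))  # {'classics': 1847, 'vampires': 2006.5}
```

The median suits this comparison because publication-year distributions can have long tails (a few much older books, for example) that would pull a mean away from the typical book in the sample.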

Exploring Each Tag By Co-Occurring Tags and Publication Date 
We then consider each of the 20 most popular tags in turn, examining its 40 most commonly co-occurring tags and their corresponding publication date distributions. This view allows us to see, at an even more granular level, which aspects of a tag might contribute to its perceived relationship with different historical eras. For example, “mystery” books that receive the tag “USA” are, on average, newer than “mystery” books that receive the tags “England” or “British.” Similarly, “mystery” books tagged “hardcover” are typically newer than those tagged “paperback,” likely because new books are published in hardcover before they are published in paperback. These co-occurring tags and publication date distributions give us a more multidimensional understanding of how readers use the tag “mystery.”
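A sketch of the co-occurrence step for a single target tag, assuming each book record has a `year` and a set of `tags` already filtered by the user threshold; `cooccurrence_medians` and the record layout are hypothetical. Ranking co-occurring tags by the number of books they appear on is our assumption about what “most commonly co-occurring” means, and the toy records are invented.

```python
from collections import Counter
from statistics import median


def cooccurrence_medians(books: list[dict], target: str, k: int = 40) -> dict[str, float]:
    """For one target tag's books, find the k tags that co-occur on the most books
    and return the median publication year of the books carrying each of them."""
    # Each book's tags are a set, so a tag is counted at most once per book.
    counts = Counter(tag for book in books for tag in book["tags"] if tag != target)
    return {
        tag: median(book["year"] for book in books if tag in book["tags"])
        for tag, _count in counts.most_common(k)
    }


# Invented toy records for a "mystery" sample, for illustration only.
mystery = [
    {"year": 1930, "tags": {"mystery", "England", "paperback"}},
    {"year": 1952, "tags": {"mystery", "England", "British", "paperback"}},
    {"year": 2012, "tags": {"mystery", "USA", "hardcover"}},
    {"year": 2016, "tags": {"mystery", "USA", "hardcover"}},
]
result = cooccurrence_medians(mystery, target="mystery")
print(result["USA"], result["England"])  # 2014.0 1941.0
```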

We also examine, for each co-occurring tag, the difference between its median publication year over all the books in our data and its median publication year among only the books most often assigned one of our 20 target tags. For example, among the books most often receiving the tag “horror,” the co-occurring tag “20th century” is assigned to books with more recent publication years than in the full set of books receiving the “20th century” tag. But when “horror” books receive the “vampires” or “werewolves” tags, they tend to have older median publication dates than books with those tags overall. According to LibraryThing users, then, the “horror” genre consists of newer books tagged “20th century,” “American literature,” and “short stories,” as well as older books tagged “vampires” and “supernatural,” relative to how these tags are applied across the other genres.
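The comparison above can be expressed as a difference of medians. A minimal sketch, assuming that “all the books” means the pooled, deduplicated set of books scraped for the 20 target tags (the text does not spell this out) and using a hypothetical record layout with `year` and `tags` fields; the function name and toy data are invented for illustration.

```python
from statistics import median
from typing import Optional


def median_shift(target_books: list[dict], all_books: list[dict], tag: str) -> Optional[float]:
    """Median year of the target tag's books carrying `tag`, minus the median
    year of every book in the pooled data carrying `tag`.

    Positive: `tag` skews newer inside the target genre; negative: older.
    Returns None when `tag` is absent from either set.
    """
    inside = [b["year"] for b in target_books if tag in b["tags"]]
    overall = [b["year"] for b in all_books if tag in b["tags"]]
    if not inside or not overall:
        return None
    return median(inside) - median(overall)


# Invented toy records, for illustration only.
horror = [
    {"year": 1897, "tags": {"horror", "vampires"}},
    {"year": 1975, "tags": {"horror", "vampires", "20th century"}},
    {"year": 1986, "tags": {"horror", "20th century"}},
]
others = [
    {"year": 1925, "tags": {"classics", "20th century"}},
    {"year": 2005, "tags": {"young adult", "vampires"}},
    {"year": 2008, "tags": {"vampires"}},
]
pooled = horror + others  # assumed deduplicated: no book appears twice
print(median_shift(horror, pooled, "20th century"))  # 5.5
print(median_shift(horror, pooled, "vampires"))      # -54.0
```

If a book was scraped under several target tags, it should be counted once in the pooled set; otherwise it would weigh more heavily in the overall median.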

We have shown that tags and book publication dates can be combined to give multidimensional views of how readers use tags in online reading communities. Of course, readers’ use of tags and their perceptions of genre are shaped by a multitude of economic and sociological factors (McGrath, 2020) that we do not address here, and we believe that considering these factors and the attendant critical scholarship would draw out the significance of our findings even further. While we leave this synthesis to future work, we conclude that reception data like tags, and data-driven research like this study, can contribute to ongoing conversations about literary genre and help illuminate how contemporary readers think about and interact with books.

Antoniak, M., Walsh, M., and Mimno, D. (2021). "Tags, Borders, and Catalogs: Social Re-Working of Genre on LibraryThing." Proceedings of the ACM on Human-Computer Interaction 5 (CSCW): 1-29.

Bourrier, K., and Thelwall, M. (2020). "The Social Lives of Books: Reading Victorian Literature on Goodreads." Journal of Cultural Analytics 1.1: 12049.

English, J. F., Enderle, S., and Dhakecha, R. (2018). "Mining Goodreads: Literary Reception Studies at Scale." Accessed December 10, 2021.

Hegel, A. (2018). Social Reading in the Digital Age. University of California, Los Angeles.

Manshel, A., McGrath, L. B., and Porter, J. D. (2019). "Who Cares about Literary Prizes?" Public Books 3.

McGrath, L. (2020). "America’s Next Top Novel." Post45.

Walsh, M., and Antoniak, M. (2021). "The Goodreads ‘Classics’: A Computational Study of Readers, Amazon, and Crowdsourced Amateur Criticism." Journal of Cultural Analytics 4: 243-260.


Conference Info

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

Held in Tokyo and remote (hybrid) on account of COVID-19

Series: ADHO (16)

Organizers: ADHO