University of Illinois, Urbana-Champaign
1. Introduction:
There exists a large body of critical writing on humanities objects
in forms such as reviews, forum posts, mailing lists and blogs. In
many cases, readers cannot readily judge the authenticity and
credibility of such writings. It is desirable
to have a tool that is able to distinguish professional criticisms
from lay comments, and furthermore, to measure the
authenticity of criticisms on humanities objects. Such tools can
have many applications ranging from mass mail filtering to
customized criticism recommendation and summarization. In
a preceding study we demonstrated that a simple machine
learning model can be used to automatically differentiate
editorial critiques (i.e., those written by professional critics)
from customer critiques (i.e., those written by interested
members of the general public) (Hu et al. 2006a). In this poster,
we extend and build upon our earlier work to include a new
class of cultural objects (i.e., United States Literature) and to
uncover the set of influential features that contribute to making
“editorial” and “customer” reviews distinct.
For the sake of comparison, we use the same dataset as in (Hu
et al. 2006a), namely reviews from amazon.com, the largest
online retailer of humanities materials such as books and music.
On amazon.com, many book and music objects have both editorial
reviews and customer reviews. The former are written by editors
at amazon.com, who can be regarded as experts, while the latter
are written by arbitrary users from the general public. Besides
the two product categories analyzed in (Hu et al. 2006a), British
Classic Literature books and Classical music CDs, we add a third
category, United States Classic Literature.
The three categories are among the most relevant to the
humanities, yet cover different media and cultures. To eliminate
possible product bias, we downloaded both editorial and
customer reviews of the same objects that were randomly
selected through the Amazon Web Services open APIs1. The
descriptive statistics are shown in Table 1. It is noteworthy that
the length of the customer reviews is highly variable. Also,
customer reviews on classical music have a smaller vocabulary
than editorial reviews.
Table 1
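For context, the harvesting step can be sketched as follows. The endpoint and parameter names below reflect the Amazon E-Commerce Service as it was commonly documented around 2006 and should be read as assumptions rather than the authors' actual code; the service has changed substantially since then, and the access key is a placeholder.

```python
# Rough sketch of fetching editorial and customer reviews for one item
# through the Amazon E-Commerce Service of that era. Endpoint and
# parameter names are assumptions recalled from old documentation, not
# a working recipe for the current API.
import urllib.parse
import urllib.request

ENDPOINT = "http://webservices.amazon.com/onca/xml"  # assumed historical endpoint

def fetch_reviews_xml(asin, access_key="YOUR-ACCESS-KEY"):
    params = {
        "Service": "AWSECommerceService",            # assumed
        "AWSAccessKeyId": access_key,                # placeholder credential
        "Operation": "ItemLookup",                   # assumed
        "ItemId": asin,
        "ResponseGroup": "EditorialReview,Reviews",  # assumed response groups
    }
    url = ENDPOINT + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")       # raw XML, parsed downstream

# Hypothetical usage with a randomly selected ASIN:
# xml_doc = fetch_reviews_xml("B000002ZWL")
```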
2. Experimental Setup:
Hu et al. (2006a) demonstrated that a binary text
classifier based on a Naive Bayesian model was able
to classify editorial and customer reviews at accuracies of about
86%. Two sets of features were used in building the models.
The first one is unigrams of all original tokens, including content
words, function words and punctuation, with the purpose of
preserving all stylistic clues carried by the original writings.
The second one is unigrams of all function words2, which can
carry important stylistic fingerprints (Stamatatos et al., 2000;
Argamon et al., 2003). We discovered interesting features
unique to each type of criticism. For example, professional
critiques tend to use numbers while customers tend to use terms
referring to personal experiences. However, unigram features
bear virtually no context information. Therefore in this poster,
we deepen our understanding by analyzing features with broader
context information: bigrams and trigrams of the
aforementioned two sets of tokens (Banerjee & Pedersen 2003).
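As a concrete illustration of this setup, the sketch below trains a multinomial Naive Bayes classifier over unigram or bigram token counts. It is a minimal reconstruction using scikit-learn, not the authors' original code; the toy reviews, tokenization and tooling are assumptions standing in for the downloaded Amazon data.

```python
# Minimal sketch of the classification setup: Naive Bayes over unigram
# or bigram token counts. Illustrative reconstruction only; the toy
# reviews below stand in for the real Amazon review datasets.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

editorial = [
    "In the twentieth century , the novel remained in print .",
    "The Folger Shakespeare Library edition includes extensive notes .",
]
customer = [
    "I found this book amazing !",
    "I read it in one sitting and have to say it is great !",
]
texts = editorial + customer
labels = [0] * len(editorial) + [1] * len(customer)  # 0 = editorial, 1 = customer

# ngram_range=(1, 1) gives unigrams, (2, 2) bigrams, (3, 3) trigrams.
# The permissive token pattern keeps words and punctuation marks as
# separate tokens, roughly preserving stylistic clues.
vectorizer = CountVectorizer(ngram_range=(2, 2),
                             token_pattern=r"\w+|[^\w\s]")
X = vectorizer.fit_transform(texts)

clf = MultinomialNB()
scores = cross_val_score(clf, X, labels, cv=2)  # 10-fold on the real data
print("Mean accuracy: %.2f%%" % (100 * scores.mean()))
```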
For easy comparison, the results of the classification experiments
are presented in Table 2. It shows that function words, though a
small token set, can capture most of the differences between the
two kinds of criticism. Comparing features with varying depths of
context, we found that bigram features consistently improve
classification accuracies (to a level of 87.22% -- 89.25%), while
trigram features do not achieve consistent improvement. It is
noteworthy that trigrams of function words are not as good as the
other feature sets. In fact, genuine function word trigrams are too
sparse in the datasets, so here we define a function word trigram
as a sequence of three function words that occur within a window of
5 tokens in the text (a sketch of this windowed extraction appears
at the end of this section).
Upon closer examination of the classification results on the British
Literature review set using trigrams of all tokens, where the mean
accuracy (72.80%) was significantly worse than its bigram and
unigram counterparts, we found that only 1.1% of editorial reviews
were wrongly classified as customer reviews, but 53.1% of customer
reviews were misclassified as editorial reviews. This means the
model can identify features unique to customer reviews but cannot
reliably identify those unique to editorial reviews. A possible
reason for this phenomenon is that the British Literature dataset
is dominated by reviews of similar books. This is supported by the
bigram and trigram feature analyses described in the next section:
a large portion of the top features in this dataset is related to
Shakespeare.
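The windowed function word trigram definition above can be made concrete with a short sketch. This is an illustrative reconstruction under stated assumptions (the tiny function word list and the whitespace tokenization are placeholders, not the Illinois Institute of Technology list or the original tokenizer):

```python
# Sketch of "function word trigrams": sequences of 3 function words
# that co-occur within a window of 5 tokens. Illustrative only.
FUNCTION_WORDS = {"the", "of", "and", "a", "to", "in", "that", "it",
                  "is", "was", "i", "for", "on", "you"}  # placeholder list

def function_word_trigrams(tokens, window=5):
    """Yield (w1, w2, w3) where all three are function words occurring
    within a span of at most `window` consecutive tokens."""
    for start in range(len(tokens)):
        if tokens[start].lower() not in FUNCTION_WORDS:
            continue
        span = tokens[start:start + window]
        fw_positions = [i for i, t in enumerate(span)
                        if t.lower() in FUNCTION_WORDS]
        if len(fw_positions) >= 3:
            i, j, k = fw_positions[:3]
            yield (span[i].lower(), span[j].lower(), span[k].lower())

tokens = "I have to say that this is a great read !".split()
print(list(function_word_trigrams(tokens)))
```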
3. Feature Analyses:
Knowing that editorial and customer reviews are separable
is not sufficient in and of itself; rather, we must strive
toward understanding what features make them distinct. The
binary Naive Bayesian text classifiers applied here can rank
the terms according to their relative importance in the
construction of the categorization model (Hu & Downie 2006).
Tables 3 – 8 list the top 10 features from each of the six feature
sets in each review category.
4. Discussion:
As we can see from the tables, there are features consistently
present in editorial reviews across the three types of humanities
objects that distinguish editorial critiques from customer
critiques:
1. Numbers, both ordinal and cardinal, e.g., “in the twentieth”,
“than forty”, “hundredth”
2. Technical terms e.g., “songwriter”, “bassist”, “in D Major”
3. Author or artist names using diacritical characters: e.g.,
“Brontë”, “Bartók”, “Takács”
4. Authoritative resources, e.g., “Folger Shakespeare Library”,
“CliffsNotes study guides”
5. Emphasis on the third person voice
Similarly, there are important features found in the customer
reviews that contribute to their identity:
1. Terms referring to personal experience and the first person
voice: e.g., “I found”, “I’m”, “I read”
2. Exclamation marks (“!”): from unigram to trigram, “!”
consistently appears among the top features.
3. Adverbs: e.g., “definitely”, “actually”, “possibly”
4. Contractions: e.g., “I’d”, “won’t”, “you’re”, “that’s”,
“wouldn’t”
5. Variations of artist names without diacritical characters:
e.g., “Bartok”, “Takacs”
6. Quotation marks (rendered as “&quot;” in the XML documents):
e.g., “. &quot; The”, “, &quot; is”
7. Colloquial phrases, e.g., “is like”, “is actually”, “have to
say”
8. Nonstandard words and marks: e.g., “_”, “cd”, “cds”
Most of the observed differences seem reasonable. Experts tend to
give more precise descriptions by using numbers, citing
authoritative resources, and so on. While experts write in a more
objective manner by using a third person voice, ordinary readers
tend to connect humanities objects with their own personal
experiences and prefer to express their emotions. Experts also
use many technical terms while ordinary readers tend to use
informal writing styles such as spoken language, contractions
and nonstandard marks. Both experts and common readers refer
to authors or artists, but very few readers bother using proper
diacritical characters, instead opting to use basic Latin letters.
It is interesting to see that common readers use more quotations,
adverbs and punctuation than experts.
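The per-class rankings behind Tables 3 – 8 come from the Naive Bayes feature ranking method of (Hu & Downie 2006), which is not reproduced here. As a rough stand-in, one can compare class-conditional log probabilities in a trained multinomial Naive Bayes model; the sketch below is an assumed approximation of that idea, not the published method, and the toy data is hypothetical.

```python
# Hedged approximation of per-class feature ranking: rank n-grams by
# how much more likely they are under one class than the other in a
# trained multinomial Naive Bayes model. Not necessarily the method
# of (Hu & Downie 2006); illustrative stand-in only.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

def top_features_per_class(texts, labels, ngram_range=(2, 2), k=10):
    vectorizer = CountVectorizer(ngram_range=ngram_range,
                                 token_pattern=r"\w+|[^\w\s]")
    X = vectorizer.fit_transform(texts)
    clf = MultinomialNB().fit(X, labels)
    terms = np.array(vectorizer.get_feature_names_out())
    # feature_log_prob_[c, t] = log P(term t | class c)
    diff = clf.feature_log_prob_[0] - clf.feature_log_prob_[1]
    return {
        clf.classes_[0]: terms[np.argsort(diff)[::-1][:k]],  # leans to class 0
        clf.classes_[1]: terms[np.argsort(diff)[:k]],        # leans to class 1
    }

# Hypothetical usage with tiny toy data:
texts = ["in the twentieth century the novel", "i found it great !"]
labels = ["editorial", "customer"]
print(top_features_per_class(texts, labels, ngram_range=(1, 1), k=3))
```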
5. Conclusions and Future Work:
We extended our previous work on classifying
editorial/professional reviews and customer reviews
on humanities objects, and then examined the influential
features in each of the review categories. In particular, we
examined two feature sets, “all tokens” and “function words
only”, with context depths ranging from unigrams to trigrams.
Results show that the two kinds of reviews are distinct. By
using the NB feature ranking method, we found interesting and
unique features associated with each type of review. Such
features are important in enriching digital humanities
repositories and facilitating criticism filtering and
recommendation. The feature analyses also raise new questions,
such as how criticism of literature from different countries
(e.g., British vs. U.S.) differs. This
will be part of our future work. We will also examine other
critical writing resources such as mailing lists and forums for
similar patterns between “expert” and “lay” contributors.
1. aws.amazon.com
2. The list of function words was edited by the Laboratory of
Linguistic Cognition at Illinois Institute of Technology. It is
available at
<http://shekel.jct.ac.il/~argamon/gender-style/function-words.txt>
Bibliography
Argamon, Shlomo, Moshe Koppel, Jonathan Fine, and Anat
Rachel Shimoni. "Gender, Genre and Writing Style in Formal
Written Texts." Text 23.3 (2003): 321–346.
Banerjee, Satanjeev, and Ted Pedersen. "The Design,
Implementation, and Use of the Ngram Statistic Package."
Proceedings of the Fourth International Conference on
Intelligent Text Processing and Computational Linguistics,
Feb. 2003. Mexico City, Mexico, 2003.
Hu, Xiao, and J. Stephen Downie. "Stylistics in Customer
Reviews of Cultural Objects." Proceedings of the 2nd SIGIR 2006 Stylistics for Text Retrieval Workshop, Aug. 2006, Seattle,
Washington. 2006.
Hu, Xiao, J. Stephen Downie, and Jin Ha Lee. "Stylistic
Analysis on Reviews of Humanities Objects." Poster presented
at the Chicago Colloquium on Digital Humanities and Computer
Science, Nov. 2006, Chicago, Illinois. 2006a.
<http://dhcs.uchicago.edu/abstracts/hu.pdf>
Hu, Xiao, J. Stephen Downie, and M. Cameron Jones.
"Criticism Mining: Text Mining Experiments on Book, Movie
and Music Reviews." Digital Humanities 2006 Conference
Abstracts. Paris: CATI, Université Paris-Sorbonne, 2006b.
88-93.
Stamatatos, E., N. Fakotakis, and G. Kokkinakis. "Text Genre
Detection Using Common Word Frequencies." Proceedings of
18th International Conference on Computational Linguistics,
July 2000, Saarbrücken, Germany. 2000.