The State of Authorship Attribution Studies: (1) The History and the Scope; (2) The Problems -- Towards Credibility and Validity.
Joseph Rudman
Carnegie Mellon University
David I. Holmes
University of the West of England
Fiona J. Tweedie
University of Glasgow, United Kingdom
R. Harald Baayen
Max Planck Institute for Psycholinguistics
Keywords: authorship attribution, stylistics, statistics
There are many serious problems with the science of authorship attribution studies. This session will review the history of the field, identify the major problems, and offer some solutions that will go a long way towards giving the field credibility and validity.
Willard McCarty's recent posting on "Humanist" (Vol. 10, No. 137), "Communication and Memory," points out one of these problems: "...scholarship in the field is significantly inhibited, I would argue, by the low degree to which previous work in humanities computing and current work in related fields is known and recognized."
A major indication of problems in a field is the absence of consensus on correct methodology or technique. Every area of authorship attribution studies has this problem: research, experimental set-up, linguistic methods, statistical methods....
It seems that for every paper announcing an authorship attribution method that "works," or a variation of such a method, there is a counter-paper pointing out crucial flaws:
Donald McNeil points out that scientists disagree as to Zipf's law;
Christian Delcourt raises objections against current practice in co-occurrence analysis;
Portnoy and Petersen show errors in Radday and Wickmann's use of the correlation coefficient, chi-squared test, and t-test;
Hilton and Holmes showed problems in Morton's QSUM techniques;
Smith raised many objections against Morton's early methods;
There is Merriam vs Smith;
There is Foster vs Elliott and Valenza.
This widespread disagreement has not only kept authorship attribution studies out of most United States court proceedings, but it also threatens to undermine even the legitimate studies in the court of public and professional opinion.
The time has come to sit back, review, digest, and then present a theoretical framework to guide future authorship attribution studies.
The first paper, by David Holmes, will give the necessary history, scope, and present direction of authorship attribution studies with particular emphasis on recent trends.
The second paper, by Harald Baayen and Fiona Tweedie, will focus on one problem: the use of so-called constants in authorship attribution questions.
The third paper, by Joseph Rudman, will point out some of the problems that are keeping authorship attribution studies from being universally accepted and will offer suggestions on how these problems can be overcome.
Hosted at Queen's University
Kingston, Ontario, Canada
June 3, 1997 - June 7, 1997
Conference website: https://web.archive.org/web/20010105065100/http://www.cs.queensu.ca/achallc97/