I would like to propose a session devoted to what I call the Ad-Hoc Authorship Attribution Competition, to be held as part of the 2004 Joint International Conference of the Association for Literary and Linguistic Computing and the Association for Computers and the Humanities (ALLC/ACH 2004). My hope is to collect, from participants around the world, the best techniques and methods for inferring document authorship.
Recent years have seen a tremendous increase in interest in determining the author of a disputed or unknown document through computer-aided or even completely computer-driven analysis. This interest takes a variety of forms and comes from a wide range of disciplines. Examples include the recent special issue of Chance, a statistics journal, as well as the attention the problem received from the humanities computing community at ACH/ALLC 2001.
This interest has not, however, produced an accepted standard of practice or an accepted set of techniques that interested non-specialists can use. Various methods have been tested and validated independently, but their (positive) results were not obtained within a common framework that would let researchers compare accuracy and decide which method suits their problem.
To address this, I propose a competitive evaluation framework, the "Ad-Hoc Authorship Attribution Competition," in which researchers will jointly and separately analyze a shared, proposed benchmark corpus using their individually developed techniques. We hope to bring together the users and the developers of this technology to share and compare their methods and results. The competition will help to establish a set of "best practices" in authorship attribution that can standardize analyses and spur the development of new and improved methods. Furthermore, the software and methods will be made available to a wider group of researchers who may lack the skills to develop an analysis on their own but who can use special-purpose software developed to this end.
The competition will be run using a set of specially developed corpora (of various kinds) that will be distributed "anonymously" to participating researchers. Some documents will be taken from public sources, while others will be collected specifically for this purpose and will not have been analyzed before. Researchers will be asked to determine who wrote each document and to submit either their programs or their analyses of the documents. Where a researcher's method does not map easily onto a simple, standalone program, technical support for developing, testing, and standardizing software will be available from the Digital Humanities Developer's Consortium (DHDC). The DHDC will help produce high-quality, user-friendly software to encourage use and reuse of the methods presented.
The results will be tabulated and presented during the proposed session. In addition to talks on the contest structure and format (by myself) and on the supporting role of the DHDC (by Stephen Ramsay of the University of Georgia), selected contest entrants will be invited to give brief presentations on their methods and approaches. I also hope to publish, independently, an edited volume of papers and software describing the various methods, and participants in the competition will be invited to submit to it.
Response to the proposed contest has been substantial: at the latest count, more than fifteen individual researchers and research groups around the world have indicated interest in participating or an intent to do so. The schools and organizations represented by this group include Trinity College (Dublin), Sheffield University, the University of Rome, the University of Birmingham, the University of Eastern Piedmont, the University of Massachusetts (Amherst), the Johns Hopkins University, the University of Patras, the University of Singapore, and the Massachusetts Institute of Technology. Registration remains open, and letters of interest and intent continue to arrive.
Hosted at Göteborg University (Gothenburg)
June 11, 2004 - June 16, 2004
Conference website: http://web.archive.org/web/20040815075341/http://www.hum.gu.se/allcach2004/