'DNA' and Non-traditional Authorship Attribution: An Inclusive Model

paper
Authorship
  1. Joseph Rudman

    Department of English - Carnegie Mellon University




Joseph Rudman

Carnegie Mellon University
jr20@andrew.cmu.edu

2002

University of Tübingen

Tübingen

ALLC/ACH 2002

Editor: Harald Fuchs

Encoder: Sara A. Schmidt

Anything a person writes contains the code of his intellectual
DNA, or whatever you want to call it.
Webb 1994

The greater the number of features and the more the features
belong to different categories (e.g., syntactic structures, type of
grammatical subject, inflexions, vocabulary, spelling, and so on)
the stronger the case for shared authorship.
Eagleson 1989

INTRODUCTION:
For many years it has been obvious from the literature that most
non-traditional authorship attribution studies that rely on one, or only a
small number, of style markers do not carry the weight of scientific validity
with the majority of other authorship attribution practitioners, with
specialists in the field of the study, or with the general public. (In addition
to Eagleson, see Banks and Rudman -- also, Rudman 1998)
During a talk on the "Style-Marker Mapping Project" at the ALLC-ACH 2000
conference in Glasgow, I mentioned, in passing, an attribution model based
on a "DNA" concept. (Rudman 2000) The remark was illustrative and not "on
topic." However, the audience picked up on it, and some of the ensuing
questions and discussion kept moving away from the Style-Marker Mapping
Project.
This paper presents a non-traditional authorship attribution model based on a
"DNA" analogy. It is only an analogy, a framework to explain the techniques
of the "Inclusive Model"; there are obvious fundamental differences between
DNA and style.
Because some of the terms in this paper could be unfamiliar to the expected
audience, a clear and concise definition is given the first time each such
term is used.

I. BACKGROUND AND DEFINITIONS
If we look at style as a living organism, style-markers are its genetic
material, which makes the Style-Marker Mapping Project (Rudman, 2000)
analogous to the Human Genome Project. I would like to extend this biological
analogy: the Inclusive Authorship Attribution Model is analogous to DNA
analysis.
The earliest reference to DNA and style that I have seen is Bailey's
comparison of the tools used to decode the underlying makeup of the two --
X-ray diffraction for DNA, the computer for style. Bailey does not move
towards a DNA model for stylistics. (Bailey)
The lead quote by Webb also is quoted in Forsyth's dissertation. Yet Forsyth
does not use the intent of the quote to move into a DNA model. (Forsyth)
Since the mid-1980s I have been leaning towards a more inclusive attribution
model that would use a large number of style-markers. Other researchers also
have recognized the need to expand the number of style markers in attribution
studies. As the structure of DNA was decoded and the methods for comparing DNA
samples were refined, DNA analysis became the analogy of choice. I first
mentioned the model at the ALLC-ACH Oxford conference in 1992. (Banks and
Rudman) The thrust of that presentation was towards a statistical method of
combining the results of different statistical tests on various
style-markers. This section briefly traces the evolution of the DNA model
through various publications and presentations.
Clear and concise definitions of the DNA autoradiogram are given. (Kirby) A
brief explanation of why this model is necessary closes this section.
(Willing)

II. THE MODEL

Outline a method of analysis which will allow organization of these
features [the entire range of linguistic features] so as to facilitate
comparison of any one use of language with any other
(Carter, Crystal and Davy, and Darbyshire). McMenamin 1993

A) How the Inclusive Model differs from other models (e.g.
multivariate models and Burrows' Delta Project). (Holmes, Burrows)
B) The DNA Analogy is Explicated. It is shown how each locus of
the autoradiogram is equivalent to a different style-marker. The
determination of each style-marker locus is discussed. Forsyth's
suggestion at the Glasgow conference that a list of "proven"
style-markers should be provided and used is discussed.
C) Visual Representation. A method of visual representation of the
results of the model is shown.
D) The following two statistical methods of combining each
style-marker locus into a final answer are presented and discussed:
(1) If the style-markers that are used can be shown to be
independent of one another (e.g. word length distribution,
percentage of nouns starting sentences, type/token ratio) a
procedure based on Fisher's method for combining significance
probabilities from independent statistical tests can be used.
(Fisher)
(2) If the style-markers that are used are not independent of
each other (e.g. word length distribution, word length
correlation, percentage of latinate words) the statistical
method employed by DNA researchers can be used.
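
The procedure named in D(1) can be illustrated with a minimal sketch. It assumes
that each style-marker has already produced one significance probability
(p-value) from its own test and that the tests are independent, as D(1)
requires. Fisher's statistic is X = -2 * sum(ln p_i), which follows a
chi-square distribution with 2k degrees of freedom for k tests when every
individual null hypothesis holds. The function name fisher_combined_p and the
three p-values are hypothetical and are not taken from the paper; this is one
possible reading of "a procedure based on Fisher's method," not the paper's own
implementation.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Hypothetical helper for illustration only. Returns the probability
    that a chi-square variable with 2k degrees of freedom exceeds
    X = -2 * sum(ln p_i), where k = len(p_values).
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("each p-value must lie in (0, 1]")

    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2

    # For an even number of degrees of freedom (2k) the chi-square
    # survival function has a closed form:
    #   P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term = 1.0   # (x/2)^j / j! for j = 0
    total = 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total


# Hypothetical p-values, one per style-marker locus (assumed independent).
marker_p = {
    "word length distribution": 0.04,
    "percentage of nouns starting sentences": 0.20,
    "type/token ratio": 0.01,
}
combined = fisher_combined_p(list(marker_p.values()))
print(f"combined p = {combined:.4f}")
```

With a single style-marker the function returns that marker's own p-value,
which is a quick sanity check. The sketch does not cover case D(2): when the
style-markers are correlated, the chi-square reference distribution above no
longer applies.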

CONCLUSION
The methods of determining DNA loci and style-marker loci are different. A
single technique is employed to determine all of the DNA loci. Each
style-marker locus is determined, for the most part, by a different
experimental technique, and some of the style-marker loci are actually the
result of multivariate statistical analysis.
The Inclusive Authorship Attribution Model promises a degree of acceptability
not seen in most non-traditional attribution studies -- especially in types
of studies such as McMenamin's "'Population Model' where there are no
obvious authorship candidates, and texts from an entire population of
possible authors are considered against texts by one suspected author."
(McMenamin)

Preliminary Bibliography

Bailey, Richard W. "The Future of Computational Stylistics." ALLC Bulletin 7
(1979): 4-11. [First presented at the Association for Literary and Linguistic
Computing Fifth International Meeting, Friday, December 15, 1978, King's
College, University of London. Also in LITERARY COMPUTING AND LITERARY
CRITICISM. Ed. Rosanne G. Potter. Philadelphia: University of Pennsylvania
Press, 1989, 3-12.]

Balding, David J., and Peter Donnelly. "Inference in Forensic Identification."
JOURNAL OF THE ROYAL STATISTICAL SOCIETY A 158, Part 1 (1995): 21-53.

Banks, David L., and Joseph Rudman. "Questionable Attribution in the Canon of
Daniel Defoe: A Study of Techniques." ALLC-ACH'92 Conference. Oxford
University, April 7, 1992.

Burrows, John. "Questions of Authorship: Attribution and Beyond. A Lecture
Delivered on the Occasion of the Roberto Busa Award." ACH-ALLC01 Conference.
New York University, New York, June 14, 2001.

Eagleson, Robert D. "Linguist for the Prosecution." In Geraldine Barnes et al.,
WORDS AND WORDSMITHS. Sydney: The University of Sydney Press, 1989, 22-31.

Fisher, R. A. STATISTICAL METHODS FOR RESEARCH WORKERS. London: Hafner, 1969.

Forsyth, Richard S. "Stylistic Structures: A Computational Approach to Text
Classification." Dissertation. University of Nottingham, 1995.

Holmes, David I. "Authorship Attribution and the Book of Mormon: A Case Study
in Stylometric Techniques." Ph.D. Thesis. University of London, King's
College, May 1990.

Holmes, David I. "Vocabulary Richness and the Prophetic Voice." (A supplement
to the main thesis.) Ph.D. Thesis. University of London, King's College,
November 1990.

Kirby, Lorne T. DNA FINGERPRINTING: AN INTRODUCTION. New York: W. H. Freeman,
1992.

McMenamin, Gerald R. FORENSIC STYLISTICS. Amsterdam: Elsevier, 1993.

Rudman, Joseph. "The Style-Marker Mapping Project: A Rationale and Progress
Report." ALLC/ACH 2000 Conference, University of Glasgow, Scotland, July 25,
2000.

Rudman, Joseph. "The State of Authorship Attribution Studies: Some Problems
and Solutions." COMPUTERS AND THE HUMANITIES 31.4 (1997): 351-365.

Webb, Charles. Interview in THE INDEPENDENT MAGAZINE, 5 February 1994, 35.
[Quoted by Forsyth, 8.]

Willing, Richard. "Mismatch Calls DNA Tests Into Question." USA TODAY,
8 February 2000, 3A.


Conference Info

ACH/ALLC / ACH/ICCH / ALLC/EADH - 2002
"New Directions in Humanities Computing"

Hosted at Universität Tübingen (University of Tubingen / Tuebingen)

Tübingen, Germany

July 23, 2002 - July 28, 2002

Conference website: http://web.archive.org/web/20041117094331/http://www.uni-tuebingen.de/allcach2002/

Series: ALLC/EADH (29), ACH/ICCH (22), ACH/ALLC (14)

Organizers: ACH, ALLC
