The Hypothetical and Theoretical Underpinnings of Non-traditional Authorship Attribution Studies: Assumptions, Presumptions, and Verifiable Constructs

Joseph Rudman

Authorship

1. Joseph Rudman

Department of English - Carnegie Mellon University

Original URL: http://www2.iath.virginia.edu/ach-allc.99/proceedings/rudman.html

Rudman@cmphys.phys.cmu.edu

ACH/ALLC 1999, University of Virginia, Charlottesville, VA

Editor/encoder: Sara A. Schmidt

I Introduction

Some words, such as "Phrenology" or "Stylometry", insinuate their own
assumptions. In fact, nobody has ever proved that minds can be measured
by bumps or style by numbers.
Sams [1994] p. 469

In our view the protagonists of stylistic analysis in forensic
applications have not only failed to demonstrate such a link [between
style and authorship] but have not even attempted to do so.
Totty et al. [1987] p. 18

The hypothesis behind non-traditional authorship attribution studies -- those
using the computer, statistics, and stylistics -- is that every author has a
verifiably unique style. This paper points out and discusses the fact that
this hypothesis has never been empirically tested, let alone proven. The
lack of a proven theory after more than thirty years and well over 600
studies is one of the main reasons that non-traditional authorship studies
are not accepted, in the main, by either the literary or the scientific
community.

This paper then goes on to discuss some other assumptions behind the main one
and finishes by outlining an empirical study to help move the hypothesis
toward proof. Moving this hypothesis through theory to proof is needed to give
validity to all authorship attribution studies.

II A Short History of the Hypothesis

...try to balance in your own mind the question whether the latter
[text] does not deal in longer words than the former [text]. It has
always run in my head that a little expenditure of money would settle
questions of authorship this way.... Some of these days spurious
writings will be detected by this test. Mind, I told you so.
de Morgan [1851] pp. 215-216

May there not be "fingerprints" in writing, of which the author, and
most of his critics, are quite unconscious, but which could be
discovered by some new approach, to the benefit of the search for
truth?
Williams [1970] p. 2

This section outlines the history of the hypothesis that every author has a
verifiably unique style. Some of the reasons why the hypothesis was never
tested are listed, each with a short discussion. They include:
1. Computers
2. Machine-readable text
3. Degree of difficulty
4. The panoply of peripheral disciplines

III What is Behind the Hypothesis: Other Sub-assumptions

Wordprinting is still in its infancy and cannot yet boast an
explanatory theory or even an agreed-upon name. Nor do its practitioners
agree on an optimal statistical model. This degree of openness...has not
prevented the convincing success of a number of important studies, which
in turn gives added intuitive plausibility to its basic
assumptions.
Reynolds [1995] p. 157

This section lists and discusses some of the sub-assumptions of the main hypothesis:
a. Style is quantifiable. That style is quantifiable is now a given: a fact
already established. This quantifiability sets the working definition of
style not only for this paper but for most non-traditional authorship
attribution studies. A short explanation is provided, with examples of
empirical studies that prove this point.
b. Style changes over time. The problems with this assumption are listed and
discussed, and key studies on style change over time are explicated.
c. Style differs across genres. The problems with this assumption are listed
and discussed, and key studies on style change across genres are
explicated.
d. Style is as differentiating as (i) fingerprints, (ii) DNA, or (iii) iris
scans. These assumptions differ as to the degree of certainty attainable in
any findings. This section goes on to discuss what has been reported in the
literature about the degree of certainty and what can and should be
expected.
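Sub-assumption (a) treats style as something measurable. As a minimal sketch of what "quantifiable" can mean in practice, the Python below turns a text into a few numeric style markers: mean word length (the measure de Morgan proposed), type-token ratio, and rates per 1,000 running words for a small list of function words. The marker list, the letters-only tokenizer, and the function names are illustrative assumptions, not the markers this paper uses.

```python
import re
from collections import Counter

# Illustrative marker words only; the paper does not fix a marker set.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "that", "upon")


def tokenize(text):
    """Lower-case word tokens: maximal runs of the letters a-z."""
    return re.findall(r"[a-z]+", text.lower())


def style_markers(text):
    """Map a text to a dictionary of numeric style markers."""
    words = tokenize(text)
    if not words:
        raise ValueError("text contains no words")
    n = len(words)
    counts = Counter(words)
    markers = {
        # de Morgan's proposed test: average word length in letters
        "mean_word_length": sum(len(w) for w in words) / n,
        # vocabulary richness: distinct words divided by running words
        "type_token_ratio": len(counts) / n,
    }
    for fw in FUNCTION_WORDS:
        # relative frequency per 1,000 running words
        markers["per1000_" + fw] = 1000 * counts[fw] / n
    return markers
```

For example, `style_markers("The cat sat upon the mat.")["mean_word_length"]` is 19/6 (about 3.17): six words totalling nineteen letters.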

The general problems of non-traditional authorship attribution reported in
Rudman (1998) are discussed only insofar as they bear directly on a
sub-assumption, for example:
a. Which style markers to use. Is the number of style markers infinite? Is
style an open-ended system? (This follows up a discussion at the Kingston
conference.)
b. Which statistical tests to use. Does each of these statistical tests need
its own theoretical underpinnings? Michael Farringdon's discussion of the
criticism that "QSUM has no theoretical basis" is explicated.
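The QSUM criticism is easier to weigh with the computation in view. The sketch below assumes the commonly described form of QSUM (cusum) analysis: for each sentence, take its length in words and a count of "habit" words, then accumulate each series' deviations from its own mean. The habit chosen here (words of two or three letters, or words beginning with a vowel), the naive sentence splitter, and all function names are assumptions for illustration rather than Farringdon's full procedure.

```python
import re


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words(sentence):
    """Word tokens as runs of ASCII letters."""
    return re.findall(r"[A-Za-z]+", sentence)


def is_habit_word(word):
    """Assumed habit: a 2- or 3-letter word, or a word starting with a vowel."""
    return 2 <= len(word) <= 3 or word[0].lower() in "aeiou"


def cusum(values):
    """Running total of each value's deviation from the series mean."""
    mean = sum(values) / len(values)
    total, series = 0.0, []
    for v in values:
        total += v - mean
        series.append(total)
    return series


def qsum_series(text):
    """Return (sentence-length cusum, habit-word cusum) for a text."""
    sentences = split_sentences(text)
    if not sentences:
        raise ValueError("text contains no sentences")
    lengths = [len(words(s)) for s in sentences]
    habits = [sum(is_habit_word(w) for w in words(s)) for s in sentences]
    return cusum(lengths), cusum(habits)
```

In QSUM practice the two cumulative series are scaled and superimposed on one chart, and close tracking is read as evidence of a single author; that plotting step is omitted here.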

IV An Empirical Proof

There are two strategies to making progress toward finding the
correct underlying theory, (1) the so-called "top-down" approach where
one postulates a complete theory of everything... (2) the empirically
based "bottom up" approach where one uses experimental data to make
smaller, incremental steps.
Rothstein [1998] p. 4

This section discusses the "top-down" and "bottom-up" experimental strategies
for moving the hypothesis to a correct theory and thence to studies that can
prove or disprove the theory. I have not found a "top-down" approach in the
literature, which is understandable, if for no other reason than logistics.
One experimental approach to testing the hypothesis, a hybrid of the "top-down"
and "bottom-up" strategies, is given here and discussed:
1. Within a time period (~ +/- 5 years), language (native), and genre,
randomly select (n1)% of all possible writers. These constraints remove the
need to account for changes in a writer's style over time, across genres, or
across languages.
2. Randomly select (n2) passages of (n3) running words from each selected
author. Two questions are discussed: "How can we be sure that (n2) is truly
representative?" and "How do we know (n3) is large enough?"
3. Subject each author's text to stylistic analysis. The statement that "this
should be done using as many style markers as possible" is explicated, and a
short discussion of the statistics behind the adjudication of each style
marker is presented.
4. Controls:
a. (n4) other writers from the same pool as in step 1
b. (n5) other selections from the writers selected in step 1
The determination of each variable "n*" is discussed.
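The sampling steps above can be sketched in Python. This is a hypothetical illustration, not the paper's protocol: the pool is assumed to be a dictionary mapping each writer (already restricted to one period, native language, and genre) to a list of word tokens; passages are assumed to be non-overlapping blocks of n3 consecutive words; and control writers (a) are assumed to contribute n2 passages each, a count the abstract leaves open.

```python
import random


def draw_blocks(tokens, count, n3, rng):
    """Choose `count` non-overlapping blocks of n3 consecutive words."""
    available = len(tokens) // n3
    if count > available:
        raise ValueError("text too short for the requested passages")
    starts = rng.sample(range(available), count)
    return [tokens[i * n3:(i + 1) * n3] for i in starts]


def design_study(pool, n1_percent, n2, n3, n4, n5, seed=0):
    """Hypothetical sketch of the sampling design.

    pool maps writer -> list of word tokens, already restricted to one
    period (~ +/- 5 years), native language, and genre.
    Returns (study, control_a, control_b), each mapping writer -> passages.
    """
    rng = random.Random(seed)
    writers = sorted(pool)
    # Step 1: randomly select n1% of the writers (at least one).
    k = max(1, round(len(writers) * n1_percent / 100))
    selected = rng.sample(writers, k)
    # Control (a): n4 further writers from the same pool.
    remaining = [w for w in writers if w not in selected]
    control_writers = rng.sample(remaining, n4)

    study, control_b = {}, {}
    for w in selected:
        # Step 2 and control (b): draw n2 + n5 disjoint blocks in one call
        # so the extra selections never overlap the study passages.
        blocks = draw_blocks(pool[w], n2 + n5, n3, rng)
        study[w], control_b[w] = blocks[:n2], blocks[n2:]
    # Assumption: each control writer contributes n2 passages.
    control_a = {w: draw_blocks(pool[w], n2, n3, rng) for w in control_writers}
    return study, control_a, control_b
```

Drawing the n2 + n5 blocks in a single call keeps the control selections in (b) disjoint from the study passages, so that control is always run on text the study sample has not already used.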

This type of study should be carried out as part of the control for every
non-traditional authorship attribution study. It is important to realize
that if this type of control is carried out for every authorship study, and
if it consistently shows that every author has a unique style, then, q.e.d.,
the hypothesis is proven!

A survey and critique of some important "bottom-up" studies is presented, and
the importance of pursuing both strategies simultaneously is discussed.
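One way to adjudicate such a control, offered here only as an assumption and not as the paper's method, is a leave-one-out nearest-centroid test: each passage's marker vector is held out in turn and assigned to the writer whose remaining passages have the closest mean vector (Euclidean distance). The input format and function names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of equal-length numeric vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def leave_one_out_accuracy(samples):
    """samples: dict writer -> list of marker vectors (equal-length lists).

    Each passage is held out in turn and assigned to the writer whose
    remaining passages have the nearest centroid (Euclidean distance).
    Returns the fraction of passages assigned to their true writer.
    """
    correct = total = 0
    for writer, vecs in samples.items():
        for i, held_out in enumerate(vecs):
            best, best_dist = None, math.inf
            for other, other_vecs in samples.items():
                # Exclude the held-out passage from its own writer's centroid.
                rest = [v for j, v in enumerate(other_vecs)
                        if not (other == writer and j == i)]
                if not rest:
                    continue
                d = math.dist(held_out, centroid(rest))
                if d < best_dist:
                    best, best_dist = other, d
            correct += int(best == writer)
            total += 1
    if total == 0:
        raise ValueError("no passages to test")
    return correct / total
```

An accuracy at or near 1.0, repeated across many such controls, would be consistent with the hypothesis for the markers tested; lower values show that those markers, at those sample sizes, do not separate the writers. Real markers differ in scale, so they would normally be standardized before distances are taken.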

V Conclusion

One salient point made in the conclusion is that assertion is not
demonstration. Another is that the hypothesis has already taken important
steps toward theory and proof.

Bibliography

De Morgan, Sophia. Memoir of Augustus de Morgan (by his wife Sophia de Morgan, with selections from his letters). London, 1882.

Farringdon, Michael. "The Critics Answered." In Jill M. Farringdon (with contributions by A. Q. Morton, M. G. Farringdon and M. D. Baker), Analysing for Authorship. Cardiff: University of Wales Press, 1996, pp. 239-261.

Reynolds, Noel B. "Statistical Wordprinting." In Thomas Hobbes, Three Discourses, ed. Noel B. Reynolds and Arlene W. Saxonhouse. Chicago: University of Chicago Press, 1995, pp. 157-162.

Rothstein, Ira Z. "The Search for a Theory of Everything." Interactions (Department of Physics, Carnegie Mellon), 4, 1998.

Rudman, Joseph. "The State of Authorship Attribution Studies: Some Problems and Solutions." Computers and the Humanities, 1998, pp. 351-365.

Sams, Eric. "Edmund Ironside and Stylometry." Notes and Queries, Dec. 1994, pp. 469-472.

Totty, R. N., et al. "Forensic Linguistics: The Determination of Authorship from Habits of Style." Journal of the Forensic Science Society, 1987, pp. 13-28.

Williams, C. B. Style and Vocabulary: Numerical Studies. London: Charles Griffin & Co., Ltd., 1970.

Conference Info

ACH/ALLC / ACH/ICCH / ALLC/EADH - 1999

Hosted at University of Virginia

Charlottesville, Virginia, United States

June 9, 1999 - June 13, 1999

Conference website: http://www2.iath.virginia.edu/ach-allc.99/schedule.html

Series: ACH/ICCH (19), ALLC/EADH (26), ACH/ALLC (11)

Organizers: ACH, ALLC
