Standards, the Standards-Making Process, and their Relevance to Stylometry

paper, specified "long paper"
  1. Patrick Juola

    Duquesne University


The stakes have never been higher for stylometry applications and research. In addition to investigating academic literary and historical questions (Burrows, 2003), stylometrists are increasingly called upon to provide forensic evidence. Stylometry is being employed in legal matters involving murder (Chaski, 2007; Grant, 2012), fraud (McMenamin, 2011), and asylum applications (Juola, 2014). Judges and juries need straightforward and understandable answers to questions such as “did the defendant write this email?”, “is this will forged?”, or “is this suicide note genuine?” (Chaski, 2007; Ainsworth & Juola, 2018). With questions of justice, substantial financial outcomes, and individual safety hanging in the balance, accuracy in stylistic analysis is crucial.

Accuracy in forensic science has been recognized as a “crisis” (National Research Council, 2009), in that much of the field relies on “questionable or questioned science” (National Research Council, 2009) with little empirical support. For example, forensic odontology (dentistry) simply doesn't work (PCAST, 2016, p. 3; Pilkington, 2022). The (US) President's Council of Advisors on Science and Technology (PCAST) discussed both “foundational validity” and “validity as applied” as absolute requirements for forensic evidence, and further focused on the need for standards of practice to evaluate whether these requirements have been met. The American Academy of Forensic Sciences (AAFS) Standards Board was established in 2015 to provide “high quality science-based consensus forensic standards” in a variety of disciplines, including forensic document examination. While the (US) government does not typically set standards or mandate their use, adhering to standards can enhance the reliability, credibility, and transparency of forensic evidence.

As peer reviewers, members of the stylometric community are used to evaluating the validity of individual stylometric analyses on a paper-by-paper basis. We are familiar with questionable practices that may produce inaccurate or untrustworthy results, and can recommend changes for better outcomes. For instance, machine learning methods can easily “overfit” the training data at the expense of accuracy on the actual data of interest, hence the need for validation on representative data prior to analysis. At the same time, we recognize that scholarly disciplines are continually changing and that the best practices of twenty years ago, while still good practices, may have been overtaken by new and improved ones. For example, newer classification methods such as deep learning may outperform simple feature comparison methods such as Burrows’s Delta (Burrows, 2003), but may also require an impractical amount of training data for real-world problems and may be too confusing to explain to a judge and a jury.
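To make the two ideas in the preceding paragraph concrete, the following is a minimal sketch, not the protocol of any cited work, of a Burrows-style Delta comparison combined with a leave-one-out check on documents of known authorship. It assumes whitespace tokenization, population standard deviations, and nearest-single-text attribution; the function names (`attribute`, `leave_one_out_accuracy`, and so on) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def most_frequent_words(texts, n_words):
    """Return the n_words most common tokens across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())  # assumption: whitespace tokens
    return [word for word, _ in counts.most_common(n_words)]


def relative_freqs(text, vocab):
    """Relative frequency of each vocabulary word in one text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # guard against empty texts
    return [counts[word] / total for word in vocab]


def delta(z_a, z_b):
    """Mean absolute difference between two z-score profiles."""
    return mean(abs(a - b) for a, b in zip(z_a, z_b))


def attribute(known, questioned, n_words=30):
    """Return the author of the known text closest to the questioned text.

    known is a list of (author, text) pairs.
    """
    vocab = most_frequent_words([text for _, text in known], n_words)
    rows = [relative_freqs(text, vocab) for _, text in known]
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    # A word with zero variance carries no signal; avoid dividing by zero.
    sigmas = [pstdev(col) or 1.0 for col in columns]

    def z(freqs):
        return [(f - m) / s for f, m, s in zip(freqs, mus, sigmas)]

    z_questioned = z(relative_freqs(questioned, vocab))
    distances = [
        (delta(z(row), z_questioned), author)
        for row, (author, _) in zip(rows, known)
    ]
    return min(distances)[1]


def leave_one_out_accuracy(known, n_words=30):
    """Hold out each known text in turn and try to re-attribute it."""
    correct = 0
    for i, (author, text) in enumerate(known):
        rest = known[:i] + known[i + 1:]
        if attribute(rest, text, n_words) == author:
            correct += 1
    return correct / len(known)
```

In a sketch like this, `leave_one_out_accuracy` would be run on representative known documents before `attribute` is applied to the questioned document, so that an error rate can be reported alongside the attribution. Realistic use would involve far longer texts and more careful tokenization than shown here.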

However familiar these points are to DH practitioners, legal experts cannot be expected to know or understand them. Formal standards and best practices can provide guidance to the general public in recognizing and excluding clearly unacceptable work.

This paper discusses the standards-making process, including the language of standards, the creation and publication process, and the role of standards in interpreting forensic evidence, in order to promote discussion of accountability and accuracy in high-stakes application of stylometry. We specifically highlight the work of Rudman (2005; 2012) and Juola (2015) on stylometric accuracy and the handling of documents to maximize accuracy. We address the nature of unreliable analyses and appropriate methods to exclude less dependable techniques in situations with profound legal, financial, and human rights implications, while still allowing for scholarly research and exploration. The consequences of bad stylometry are significant; it would be valuable to draw up a list of guard rails and red lines that can mark an analysis as untrustworthy and therefore not to be accepted or relied upon.


Burrows, John.
"Questions of authorship: attribution and beyond: a lecture delivered on the occasion of the Roberto Busa Award ACH-ALLC 2001, New York." 
Computers and the Humanities
 37, no. 1 (2003): 5-32.

Chaski, Carole.
"The keyboard dilemma and authorship identification." In 
IFIP International Conference on Digital Forensics
, pp. 133-146. Springer, New York, NY, 2007.

Grant, Tim.
"TXT 4N6: method, consistency, and distinctiveness in the analysis of SMS text messages." 
Journal of Law & Policy
 21 (2012): 467-494.

McMenamin, Gerald.
"Declaration of Gerald McMenamin."
Ceglia v. Zuckerberg and Facebook, WD 2012 WL 1392965
(W.D.N.Y). (2012). Available online at

Juola, Patrick.
"Stylometry and Immigration: A Case Study." 
Journal of Law & Policy
 21 (2012):

Ainsworth, Janet, and Patrick Juola.
"Who wrote this: Modern forensic authorship analysis as a model for valid forensic science." 
Washington University Law Review
96 (2018): 1159.

National Research Council.

Strengthening Forensic Science in the United States: A Path Forward
. Washington, DC: National Academies Press, 2009.

President’s Council of Advisors on Science and Technology.
Report to the President: Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods
. Washington, DC, 2016.

Pilkington, Ed.
"A bite mark, a forensic dentist, a murder: How junk science ruins innocent lives." 
The Guardian, 28 April 2022
. Available online at

Rudman, Joseph.
"Unediting, de-editing, and editing in nontraditional authorship attribution studies: With an emphasis on the canon of Daniel Defoe." 
The Papers of the Bibliographical Society of America
 99, no. 1 (2005): 5-36.

Rudman, Joseph.
"The State of Non-Traditional Authorship Attribution Studies—2012: Some Problems and Solutions." 
English Studies
 93, no. 3 (2012): 259-274.

Juola, Patrick.
"The Rowling case: A proposed standard analytic protocol for authorship questions." 
Digital Scholarship in the Humanities
 30, no. suppl_1 (2015): i100-i113.


Conference Info

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

Held in Tokyo and remote (hybrid) on account of COVID-19

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO