Wittgenstein's Nachlass consists of around 20,000 pages of manuscripts and typescripts, less than half of which have been published so far. The existing publications of the Nachlass have been prepared according to different editorial principles, and some, which consist of selections from different manuscripts, lack detailed documentation of their sources.
Since 1990 the Wittgenstein Archives at the University of Bergen (WAB) has been transcribing and encoding the Nachlass with a view to its publication in electronic form. In February 1998 the first volume of an electronic edition will be published by Oxford University Press. The Nachlass will be released in four volumes, with the last appearing by the end of 1999.
Wittgenstein's Nachlass presents numerous problems for publication. Wittgenstein was in the habit of continuously revising and rearranging his manuscripts. By copying and pasting passages from one manuscript to another, and rewriting from earlier drafts, he composed a new but partially identical text, which he then set about revising in the same way. Thus the Nachlass is, in a certain sense, repetitious; it contains several layers of basically identical text. The result is a text in which old remarks sometimes reappear without modifications or are placed several times in different contexts.
Since Wittgenstein himself never prepared more than a negligible fraction of his writings for publication, most of his manuscripts and typescripts are still full of annotations, deletions, insertions, marginal remarks, critical instructions, cross-references, and alternative formulations for particular phrases. Nor is it always clear which of these alternative formulations he finally decided upon.
In preparing this material for electronic publication, the following aspects have been taken into consideration:
- text encoding
- proofing tools
- text presentation
- text retrieval
Text encoding
The WAB transcriptions are encoded in a primary format using a syntax called MECS. One reason for choosing a non-SGML encoding standard was that elaborate SGML applications were simply not available when the project began. Another was that the task in hand required a system that did not impose a hierarchical document structure but could handle overlapping text features. Unlike SGML, MECS does not require a Document Type Definition, although it allows (but does not require) the specification of a simpler Code Declaration Table (CDT), which is basically a declaration of the tags used in a particular document. In no way does the MECS CDT impose a hierarchical structure.
MECS is designed to ensure that MECS-conforming transcriptions can be easily formatted for output to other applications. Several filter sets are in use at the Archives for the presentation of the transcriptions in WordPerfect, HTML, FolioFlatFile or plain ASCII format. The Wittgenstein Archives emphasizes the need to prepare the transcriptions in a format which is neither system nor application dependent.
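The need for overlapping text features mentioned above can be illustrated with a minimal Python sketch. It does not use MECS syntax. It assumes a hypothetical representation in which each feature is a named span of character offsets; the names `Span` and `overlaps_without_nesting`, and the underlining and deletion placed on the sample sentence, are invented for illustration.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A text feature covering characters [start, end) of a transcription."""
    name: str
    start: int
    end: int


def overlaps_without_nesting(a: Span, b: Span) -> bool:
    """True if the spans cross, so that neither can be nested inside the other."""
    return a.start < b.start < a.end < b.end or b.start < a.start < b.end < a.end


text = "Die Welt ist alles, was der Fall ist."

# Hypothetical features: an underlining and a deletion that cross each other.
underline = Span("underline", 4, 18)  # covers "Welt ist alles"
deletion = Span("deletion", 13, 23)   # covers "alles, was"

crossing = overlaps_without_nesting(underline, deletion)
```

Two crossing spans like these cannot both be expressed as properly nested elements of a single tree, which is the situation a strictly hierarchical document model has difficulty with.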
One of the weaknesses of the approach to text encoding used at the Wittgenstein Archives is certainly that the independence of the transcriptions has not been given the same importance as the final representation of the text. For example, in both of the representation formats standardly used at the Archives - the diplomatic and the normalized versions - Wittgenstein's alternatives have been hard-coded in the transcriptions. This means that even a slight difference in the filtering profile - such as the inclusion of deleted text in the normalized version - could give misleading results.
An important guideline of manuscript transcription is to describe the original as exactly as possible. Transcription should document almost every aspect of the original in minute detail but avoid interpretation. Wittgenstein frequently marks parts of his texts with underlining. There are several different types of underlining, such as straight, wavy, dotted and broken lines, and underlinings with one, two or several lines. We know that the different kinds of underlining have different meanings, e.g. that a straight line means emphasis and that wavy lines in general indicate dissatisfaction with content or formulation. Nevertheless, we encode them as straight and wavy underlining; we do not encode them as emphasis or dissatisfaction.
Proofing tools
The Wittgenstein Archives uses several tools for checking the accuracy and correctness of its transcriptions according to various criteria. Transcriptions should conform to the encoding standard as well as to certain accepted standards of German and English orthography.
A MECS parser allows one to check whether a document is well-formed (meaning that it conforms to the MECS-specific syntax) and valid (meaning that it contains only elements declared in the Code Declaration Table).
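The distinction between the two checks can be sketched in Python as follows. This is not the MECS parser and does not reproduce MECS syntax: it assumes a document has already been reduced to a list of hypothetical `(kind, value)` tokens, and it treats "well-formed" in a deliberately simplified way (every close has an earlier open, every open is eventually closed), without requiring nesting.

```python
from collections import Counter


def check(tokens, declared):
    """Return (well_formed, valid) for a list of (kind, value) tokens.

    kind is "open", "close" or "text"; declared is the set of tag names
    that a (hypothetical) code declaration table allows.
    """
    open_counts = Counter()
    well_formed = True
    valid = True
    for kind, value in tokens:
        if kind == "text":
            continue
        if value not in declared:
            valid = False  # tag is not declared
        if kind == "open":
            open_counts[value] += 1
        elif kind == "close":
            if open_counts[value] == 0:
                well_formed = False  # close without a matching open
            else:
                open_counts[value] -= 1
    if any(open_counts.values()):
        well_formed = False  # some element was never closed
    return well_formed, valid


# Overlapping elements: "u" closes before "del" does, although it opened first.
tokens = [
    ("open", "u"), ("text", "nicht "), ("open", "del"), ("text", "ganz"),
    ("close", "u"), ("text", " klar"), ("close", "del"),
]
result = check(tokens, declared={"u", "del"})
```

Because no stack is used, the overlapping elements in the sample pass the well-formedness check, while an undeclared tag name would still be reported as invalid.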
Encoded documents can be spell-checked while remaining in primary format. For this purpose a complete list of graphwords is derived from the encoded transcription, with line and column references to the transcription itself. This word list is checked against a master word list in the appropriate language. The master list for each language is built up by compiling the results of previous spell checks and is augmented with the acceptable graphwords from each new spell check.
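The workflow described above might look roughly like the following Python sketch. It is an illustration only: it assumes a transcription without markup, treats any run of letters as a graphword, and uses invented function names (`graphwords`, `unknown_words`) and an invented sample.

```python
import re


def graphwords(transcription):
    """Yield (word, line, column) for each run of letters, with 1-based positions."""
    for line_no, line in enumerate(transcription.splitlines(), start=1):
        # [^\W\d_]+ matches letters only, including umlauts and sharp s.
        for match in re.finditer(r"[^\W\d_]+", line):
            yield match.group(), line_no, match.start() + 1


def unknown_words(transcription, master):
    """Return graphwords, with their positions, that are not in the master list."""
    return [
        (word, line, col)
        for word, line, col in graphwords(transcription)
        if word.lower() not in master
    ]


master = {"die", "welt", "ist"}
sample = "Die Welt\nist allse"

suspects = unknown_words(sample, master)

# After review, acceptable graphwords are added to the master list,
# so that they are not reported again in later spell checks.
accepted = {"alles"}
master.update(accepted)
```

Keeping the line and column with each graphword is what allows a suspect word to be located and corrected in the primary-format file itself.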
Code extraction provides a quick and effective means of checking the consistency and correctness of the use of specific tags. This becomes particularly important for checking consistency of practice in a large volume of source material transcribed over a long period of time by different transcribers.
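As a rough illustration of the idea, the Python sketch below extracts tag names from several transcriptions and lists tags that occur in only one of them. The angle-bracket notation, the tag names and the sample texts are hypothetical and are not MECS syntax or actual WAB codes.

```python
import re
from collections import Counter

# Hypothetical opening-tag notation; closing tags such as </us> do not match.
TAG = re.compile(r"<(\w+)>")


def tag_usage(transcriptions):
    """Map each transcription name to a Counter of the tag names it uses."""
    return {name: Counter(TAG.findall(text)) for name, text in transcriptions.items()}


samples = {
    "item_a": "<uw>nicht</uw> ganz <us>klar</us>",
    "item_b": "<wavy>nicht</wavy> ganz <us>klar</us>",
}

usage = tag_usage(samples)
all_tags = set().union(*usage.values())

# Tags found in only one transcription are candidates for inconsistent practice.
rare = sorted(tag for tag in all_tags if sum(tag in c for c in usage.values()) == 1)
```

In this invented sample, two transcribers appear to have used different names for the same kind of underlining, which is the sort of discrepancy an extracted tag list makes visible.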
A specific filter profile allows us to derive all possible combinations of the alternatives in a given text segment - normally a sentence. On this basis one can check whether each and every possible reading is a well-formed sentence according to German (or English) grammar.
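Deriving every combination of alternatives amounts to taking a Cartesian product over the points of variation. A minimal Python sketch, assuming a hypothetical representation in which a segment is a list whose items are either fixed strings or lists of alternatives (the sample sentence is invented):

```python
from itertools import product


def readings(segment):
    """Return every reading of a segment.

    Each item of the segment is either a fixed string or a list of
    alternative strings, of which exactly one appears in a given reading.
    """
    slots = [[part] if isinstance(part, str) else part for part in segment]
    return ["".join(choice) for choice in product(*slots)]


segment = ["Das ist ", ["klar", "deutlich"], " und ", ["einfach", "leicht"], "."]
all_readings = readings(segment)
```

The number of readings is the product of the number of alternatives at each point, so a sentence with many undecided substitutions can yield a long list to proofread.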
But there are still areas where we would like even more control, e.g. to prevent transcribers from putting a string or a comment inside a tag where only a numeric value is accepted, or from writing a wrong format in a date field etc.
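A check of this kind could be sketched in Python as below. The field names (`page`, `date`) and the date format `YYYY-MM-DD` are assumptions made for the example, not the conventions actually used in the transcriptions.

```python
from datetime import datetime


def is_numeric(value):
    """True if the value consists only of ASCII digits."""
    return value.isascii() and value.isdigit()


def is_date(value, fmt="%Y-%m-%d"):
    """True if the value can be parsed as a date in the assumed format."""
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


# Hypothetical mapping from field name to the check its value must pass.
VALIDATORS = {"page": is_numeric, "date": is_date}


def invalid_fields(fields):
    """Return the names of the fields whose values fail their check."""
    return [
        name
        for name, value in fields.items()
        if name in VALIDATORS and not VALIDATORS[name](value)
    ]


good = invalid_fields({"page": "12", "date": "1931-07-05"})
bad = invalid_fields({"page": "12?", "date": "05.07.1931"})
```

Here a comment-like value such as `12?` in a numeric field and a date written in a different format are both reported, while fields without a registered check are ignored.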
Text presentation and layout
Although the transcriptions encoded at the Archives can theoretically be presented according to a variety of layouts, two specific formats have been established as part of the work process. These we call the diplomatic and normalized versions of a transcription.
The diplomatic version retains as much detail from the original as possible, including deletions, overwritings, substitutions, spelling mistakes and so on. The normalized version provides a 'reading' version of the transcription and omits details such as deleted and overwritten text. In the latter, only the last alternative of undecided substitutions is rendered, and spelling is corrected and normalised.
Both versions will be included in the Bergen Electronic Edition.
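The difference between the two versions can be sketched in Python as two renderings of one underlying transcription. The token kinds, the bracket and bar conventions in the diplomatic output, and the sample sentence are all invented for this illustration and do not reflect the Archives' actual filters or layout.

```python
def render(tokens, mode):
    """Render tokens as a 'diplomatic' or 'normalized' string."""
    out = []
    for token in tokens:
        kind = token[0]
        if kind == "text":
            out.append(token[1])
        elif kind == "deleted":
            # Deleted text is shown only in the diplomatic version.
            if mode == "diplomatic":
                out.append(f"[{token[1]}]")
        elif kind == "alternatives":
            # Diplomatic: all alternatives; normalized: only the last one.
            if mode == "diplomatic":
                out.append("|".join(token[1]))
            else:
                out.append(token[1][-1])
        elif kind == "spelling":
            # token = ("spelling", original, corrected)
            out.append(token[1] if mode == "diplomatic" else token[2])
    return "".join(out)


tokens = [
    ("text", "Das ist "),
    ("deleted", "sehr "),
    ("alternatives", ["klar", "deutlich"]),
    ("text", " "),
    ("spelling", "gesagd", "gesagt"),
    ("text", "."),
]

diplomatic = render(tokens, "diplomatic")
normalized = render(tokens, "normalized")
```

Deriving both outputs from the same tokens keeps the decision about what to show in the filter rather than in the transcription.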
In addition to the transcriptions, the WAB publication will include digital facsimile images of the entire Nachlass. The transcribed texts, in both the normalized and the diplomatic versions, will be linked with the images. For reasons of cost - and because staff at the holding libraries were more highly skilled in traditional photography than in the use of digital equipment - we decided to use standard colour photographs as a basis for making digital images. The photographs are digitized and stored on Photo-CDs at a scanning bureau. From the Photo-CDs, JPEG-compressed versions of the images are produced.
Text retrieval
Unlike traditional editions, electronic editions have an inherent flexibility which allows hyperlinks for automatic cross-referencing and efficient searching for single words or word combinations in a large number of manuscripts. This edition uses FolioViews for its text presentation and retrieval. FolioViews has been chosen because it offers a combination of excellent tools for text searching and hyperlinking. Search results can be displayed in a keyword-in-context list as well as in full text. Specific search fields have been designed for the Wittgenstein manuscripts. They allow searches in certain volumes or manuscript groups, within Wittgenstein's alternatives, within date ranges, and for specific languages, Geheimschrift, graphic material and mathematical notation.
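For readers unfamiliar with the term, a keyword-in-context list shows each hit together with a fixed amount of text on either side. The following Python sketch illustrates the general idea only; it is not how FolioViews works, and the function name `kwic` and the output layout are assumptions.

```python
import re


def kwic(text, keyword, width=20):
    """Return one line per whole-word hit: left context, [keyword], right context."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", flags=re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


hits = kwic("Die Welt ist alles, was der Fall ist.", "ist", width=10)
```

Printing `hits` one per line gives a compact overview of every occurrence, from which a reader would then jump to the full text.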

