Digitizing Difference: The Challenge of Heterogeneity in the Sources of Early Modern Science

Brian Fuchs; Charles Littleton; Dolores Iorizzo; Jochen Büttner

Authorship

1. Brian Fuchs

Max Planck Institute for the History of Science / Institution Max Planck Institut für Wissenschaftsgeschichte
2. Charles Littleton

University of London
3. Dolores Iorizzo

University of London
4. Jochen Büttner

Max Planck Institute for the History of Science / Institution Max Planck Institut für Wissenschaftsgeschichte

Work text

One of the most pressing tasks that confront historians of science today is the need to incorporate into the standard narratives of science a mass of heterogeneous material that, for reasons of complexity, obscurity, or perceived irrelevance, has been hitherto overlooked or dismissed. The task before editors, interpreters, and teachers is no light one. The material often consists of notes or papers from well-known scientists that were never published during their lifetimes and indeed in some cases never intended for the eyes of others. In many cases the personal or even secretive nature of the undertaking has meant that the material exists in a form that makes traditional transcription difficult or impossible: obscure scribbles, cryptic notes, classificatory schemes meant to be intelligible only to their authors. Even when transcription of such material proves possible, an editor is still likely to find himself constrained by the printed page: a single page may not be able to represent the complex interrelationships which often characterize the material or, worse yet, may require that one plump for a single interpretation of these complex relationships where hundreds are possible. Even when the problems posed by heterogeneity in a single source have been overcome, one must cope with the differences in data brought together from related material in several sources. Here the sources of early modern science, which range over several languages, even within the corpus of a single author or within a single work, can prove particularly daunting, posing often insurmountable obstacles not only for editors, but also for scholars and students seeking some kind of reasonable access to the often very rich funds of information which they conceal.

What brings the three projects that make up this session together is that each has been directly inspired by these challenges and in a sense owes its existence to them. In order to overcome what have hitherto seemed like impassable obstacles to publication and comprehension, each project has turned to the digital medium, and in particular to the rich possibilities for digitization and consultation offered by a consistent digital rendering of diverse and discontinuous source material. In all three cases, the potential payoff of successful digitization is immense: a fairly easy and uncomplicated access to material that might either never have appeared in print or have appeared in a form which sacrificed heterogeneity to the exigencies of presentation.

The latter possibility is in fact the point of departure for the Galileo Project, which has successfully digitized Galileo Galilei's MS 72, an often indecipherable palimpsest of notes made by Galileo over a long period of time, which raises all of the usual questions of contiguity, priority, and consistency in their acutest form. Where printed versions have entailed either leaving disjointed pieces of text that clearly belong together or pasting together texts physically separate and distinct, the electronic edition has made possible a representation that links location in the manuscript to transcribed text while at the same time permitting any desired concatenation of pieces of text.

In the case of the Boyle Project, the heterogeneity of the data is so formidable that, even if successfully digitized, it may never result in a single interpretation. The Project aims to digitize Boyle's copious laboratory and reading notes, most of which are endorsed with titles and numbers written in the margins, in what appears to have been an attempt at cataloguing, as well as a series of indices which appear to refer to these endorsements. The challenge for interpreters lies in determining what cataloguing system or systems, if any, were operative. Here the digital medium offers two clear advantages over printed media: first, the possibility of representing the material in a form which makes no judgment, implicit or explicit, about the internal relations of the data, and secondly, the opportunity to sift the data, once digitized, in ways difficult or impossible with a sequential reading or indeed with any traditional printed edition. And the advantages of such a digital representation increase exponentially if, as is possible, Boyle's cataloguing strategies changed over time or were simultaneously instantiated in multiple, possibly competing versions.

In contrast to the other two projects, which are concerned with heterogeneity within a single source, the Archimedes Project confronts the heterogeneity of data within a corpus of texts which, though written in a variety of languages and separated by great distances in time, all discuss ostensibly the same relatively small set of problems: the central problems of classical mechanics. How can this very diverse collection of data be brought together in a useful and accessible way? To address this problem, the project has developed a series of specialized tools designed to allow the user not only to interpret linguistically disparate data but also to gather together data that would otherwise never cross linguistic and conceptual boundaries, and to feed that data back into the library as a kind of seed for further data-analysis. One result has been the emergence of new and unsuspected patterns in material that, in contrast to the material dealt with by the other two projects, has become if anything overly familiar.

The three Projects will present the results of their constant grappling with the complexity of this source material, each from a different angle: the Boyle Project's digitization from the starting point, the Archimedes Project from the midst of digitization and design, and the Galileo Project as a fully-completed and award-winning site. It is our hope that a presentation of both the level and the range of our experiences in dealing with this material-and of the complicated questions that it poses-will offer useful insights for, and elicit welcome advice from, the rest of the digital community and not merely those whose projects are concerned with early modern scientific manuscripts.

Sifting the Scholarly Waste-Basket: The Electronic Representation of Galileo's Private Notes on Motion (Jochen Büttner)

As a result of a joint project of the Biblioteca Nazionale Centrale, the Istituto e Museo di Storia della Scienza, both in Florence, and the Max Planck Institute for the History of Science in Berlin, an electronic representation of Galileo's notes on motion and mechanics has been made available through the Internet. These private notes, taken over a period of about 30 years, are contained in a manuscript that consists of about 200 loose sheets of paper. This manuscript, preserved as part of Codex 72 in the collection of Galileo's manuscripts at the Biblioteca Nazionale, has, in spite of its importance, never been translated in full, nor even adequately published. The electronic representation of Galileo's notes on motion presents this historical document, together with the results of research on it, in a new way; it offers new tools for further research, and it is expected to become a backbone for a collaborative research effort on this important manuscript in the future.

It proved virtually impossible to publish the chaotic mixture of texts, calculations and drawings on the pages of Codex 72 using traditional editorial techniques. Even the masterful Edizione Nazionale (1890-1909) of Galileo's papers by Antonio Favaro, which is the canonical entry point to Galileo's work, proved to be insufficient in its representation of this codex. This is also true for a supplement to the Annali di Storia della Scienza (1979, Fasc. 2) by Stillman Drake which contains "Galileo's Notes on Motion Arranged in Probable Order of Composition and Presented in Reduced Facsimile" in which cut out fragments are stuck together in a piecemeal way according to their alleged order. In view of the steadily growing scholarly interest in Galileo's manuscripts, the project of a new and comprehensive edition of the Codex 72 hence had become a desideratum.

A solution to the problem of an edition of the Codex 72 was offered by the application of new media. Tools for realizing hypertexts have dissolved many of the obstacles of tackling a chaotic manuscript such as Galileo's notes on motion. It was thus possible to produce an electronic representation of the manuscript in which the folio pages of the original are represented in such a way that complex semantic networks can be traced and bold hypothesized connections can be elaborated by the simple click of a mouse. We consider this electronic representation of a manuscript to be a new form of edition which substantially differs from a traditional paper copy edition. It has its own editorial and technical principles. While there won't be time to go into these principles in detail, I nevertheless wish to illustrate the different character of the electronic representation of Codex 72 by discussing a few outstanding features.

The electronic medium offers far-reaching opportunities for handling data which are extensively exploited in our edition of Galileo's notes on motion. Thus, internal links between related pages of the chaotically arranged folios facilitate the comprehension of the internal consistency and development of Galileo's thought. Links to other sources, publications, and scholarly work, which are continually updated, provide entry points to the current research on individual pages of the manuscript. One of the most attractive modes of access that is entirely unparalleled in traditional editions is provided by automatically generated indices of words, numbers, and the lettering used in diagrams, as well as by a list of propositions. For us, the opportunities offered by electronic media have meant not only an ability to represent what was hitherto unrepresentable with traditional means, but also a complete rethinking of method of production and its implications for future research. First, it has allowed us to include material that is normally considered a by-product of research, rather than its goal. In fact, this is true of the whole project-our electronic representation was never conceived as a project in its own right; rather, it was a spin-off of our own research work on this particular manuscript. The content of the electronic representation consists largely of material produced as part of the research process-material that would normally go unpublished. Furthermore, most of the technical effort invested in the development of the electronic representation went into the development of a database containing this material as well as the results of our own research and analysis. This database was prepared mainly as a tool for our own work; making this tool available to others via the Internet proved to be comparatively easy. 
But not only do we make our research material and our working tools public; we even go so far as to make our own results available at an early stage, sometimes even before having published them in the traditional way. To us, the electronic representation of MS 72 represents an "instrument in progress," and one which would simply not be possible in traditional media.
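The automatically generated indices mentioned above can be illustrated with a minimal sketch. It assumes that each folio's transcription is available as plain text; the folio labels, the sample text, and the function name `build_indices` are invented for illustration and do not describe the project's actual database.

```python
import re
from collections import defaultdict

# Hypothetical transcriptions keyed by folio label; the texts are invented placeholders.
transcriptions = {
    "folio_a": "tempus 12 spatium 48",
    "folio_b": "spatium 27 tempus 9 tempus",
}

def build_indices(pages):
    """Return (word_index, number_index); each maps a token to the sorted folios containing it."""
    words = defaultdict(set)
    numbers = defaultdict(set)
    for folio, text in pages.items():
        for token in re.findall(r"\w+", text.lower()):
            # Purely numeric tokens go to the number index, everything else to the word index.
            target = numbers if token.isdigit() else words
            target[token].add(folio)
    return (
        {token: sorted(folios) for token, folios in words.items()},
        {token: sorted(folios) for token, folios in numbers.items()},
    )

word_index, number_index = build_indices(transcriptions)
print(word_index["tempus"])
print(number_index["48"])
```

Because the indices are derived from the transcriptions rather than compiled by hand, they can be regenerated whenever a transcription is corrected.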

The Galileo Project is an "instrument in progress" in another sense, too. In a traditional edition, a small number of editors work together towards a final goal-the printed edition. Once the edition is printed, the only way to improve the results is to start all over again and produce a new edition. An electronic edition does not suffer from this limitation: it is, by its very nature, open-ended and capable of constant revision and improvement. Thus, the Galileo Project is conceived not only as an edition in the traditional sense, but also as a starting point for a new stage of scholarly work on the manuscript. It is our hope that, in place of the small band of editors who work on traditional editions, the scope of collaborators will now broaden to include virtually all scholars interested in this manuscript. The results of their work can be integrated into updated versions and produced in 15 minutes without any additional costs. A comparison with the time and cost necessary for producing a new paper edition highlights the essential qualitative difference between an edition as an end in itself and an edition which is the potential starting point for further cooperative research. The open-ended instrumental nature of our project is, in fact, its most distinctive feature and the one we are proudest of.

Working with an 'Indigested Heap': The Electronic Edition of the Work-diaries of Robert Boyle (Littleton)

Robert Boyle (1627-91) was one of the leading scientists of the late seventeenth century. A founding member of the Royal Society, he was heavily involved in its programme of experimental discovery and demonstration, and published the results of his own innovative experiments copiously from 1660 to 1691-and thereafter many more of his experiments and observations were published posthumously. He is most celebrated for his experiments involving atmospheric pressure in his 'air pump', through which the principles of the Law bearing his name were derived, and for his espousal of the corpuscularian theory of matter. Boyle was a prolific writer, and his unpublished manuscripts, now kept at the Royal Society, are almost as extensive as his published works-but not nearly as well known. One group of these manuscripts is particularly important for our understanding of Boyle's intellectual and scientific development. From 1647 until his death, Boyle kept unbound sheets of paper on which he recorded experimental results, accounts of natural phenomena vouchsafed to him by travellers and virtuosi, and extracts from natural philosophical works he consulted. These were most often recorded by his many amanuenses, but frequently Boyle, or another amanuensis, would retrospectively write numbers, titles and/or endorsements in the margins of these entries, to be used as a cataloguing system for future data retrieval. Prof. Michael Hunter of Birkbeck College, who is currently editing the complete works of Robert Boyle, has dubbed these collections of notes 'work-diaries', as they appear to reflect Boyle's daily laboratory activity over a certain period of time. The work-diaries, the raw material for Boyle's scientific ideas and our most immediate evidence of his laboratory practice and techniques of data gathering, are the subject matter of a major electronic text edition now in preparation.(1)

My paper discusses the work of the Robert Boyle Project at Birkbeck College, London, in preparing the Boyle work-diaries for publication on the Web in late 2001. This fragmentary manuscript material is being encoded in the Extensible Markup Language using the Document Type Definition and markup guidelines developed by the Text Encoding Initiative. With generous funding from the Wellcome Trust, this project is now entering its final months. Approximately 500,000 words of manuscript text, written in a range of hands and often employing alchemical terminology and symbols, have been transcribed and encoded. Annotations and tags to facilitate searches are now being incorporated into the text, and an XSL stylesheet will be developed in order to deliver these transcriptions and their editorial apparatus over the Web. Readers of the webpages will be able to choose between accessing a 'clean' version of the text, in which interlineal insertions are silently included and deletions and corrections found in the original are hidden, and a version in which these deletions, insertions and corrections are visible and marked as such (using different colours, font, format, etc.). By employing the <index> and <rs> tags from the TEI DTD, we will make the work-diaries fully searchable by subject and word, and will eliminate the problems caused by the several forms and languages a concept such as 'mercury' or 'iron' can take in these early modern scientific writings. The scholarly results that can be attained through the use of digitization are great, although I concentrate here on the results of only one of our projects, in which we have sought to determine Boyle's system of cataloguing and using his notes. 
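The choice between a 'clean' reading and one that shows deletions and insertions can be sketched roughly as follows. The sketch assumes that deletions and insertions are encoded with the TEI elements `<del>` and `<add>`; the sample sentence is invented, the bracket notation stands in for the colours and fonts of the planned stylesheet, and Python is used here only for illustration in place of XSL.

```python
import xml.etree.ElementTree as ET

# Invented sample entry, not a transcription of Boyle's text. The use of <del> and <add>
# is an assumption about the encoding, not a description of the project's markup.
SAMPLE = "<p>The <del>liquor</del><add>spirit</add> grew <add>very</add> warm.</p>"

def render(elem, diplomatic=False):
    """Flatten an element to text.

    Clean view: deletions are dropped and insertions are silently included.
    Diplomatic view: deletions appear as [-...-] and insertions as [+...+].
    """
    if elem.tag == "del" and not diplomatic:
        body = ""
    else:
        body = (elem.text or "") + "".join(render(child, diplomatic) for child in elem)
        if diplomatic and elem.tag == "del":
            body = "[-" + body + "-]"
        elif diplomatic and elem.tag == "add":
            body = "[+" + body + "+]"
    # Text following the element belongs to the surrounding running text in both views.
    return body + (elem.tail or "")

root = ET.fromstring(SAMPLE)
clean = render(root)
full = render(root, diplomatic=True)
print(clean)
print(full)
```

Both views are produced from the same encoded source, so no editorial decision is lost when a reader chooses the clean text.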
Near the end of his life, Boyle tried to organize his remaining notes in one last compendium of natural philosophical observations, which he tentatively entitled 'Paralipomena', or literally 'things left out', that is, observations omitted from his major body of work which he intended to produce as a supplement. The Paralipomena, however, was never published, and we are left with the 'indigested heap', as Boyle himself termed it,(2) of papers and work-diaries left behind and now collected quite randomly among the volumes of the Boyle Papers in the Royal Society. Much of the work-diary project has been involved with collating the various pages of the separate work-diaries from their various locations among the volumes of the Boyle Papers and organizing them in a coherent and chronological order. The next step, upon which we are presently embarked, is to discover the relations between the individual work-diaries themselves and between them and Boyle's published work, and those ambitious works, like the Paralipomena, which were never published. Enticing clues for these questions are provided by what appear to be keys and indices to Boyle's notes still among the Boyle Papers. These are mainly lists of numbers with brief titles of observations and notes written next to them. It is clear that in some way the many marginal additions to the work-diary entries, often made retrospectively, in pencil and by somebody other than the original author, constitute a type of coding system related to these lists. Through the electronic markup of the text, we intend to determine the connections between these otherwise impenetrable guides and the many surviving work-diaries. Every marginal note is tagged separately from the main entry of the text, and important details such as the hand and the medium (pen, ink, pencil) in which it is written are recorded as attribute values. Marginal notes with similar content or attributes can then be grouped together and compared to the indications in the indices. 
By isolating and grouping together other components of the work-diary entries-marginal numbers, dates, endorsements and even stray marks-we are able to develop several further insights into how Boyle conceptualized and organized his working notes.
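The grouping of marginal notes by their recorded attributes can be sketched as follows. This is only an illustration under stated assumptions: the element and attribute names (`note`, `place`, `hand`, `medium`, `n`), the hand labels, and the note texts are all invented and are not taken from the project's actual encoding.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented fragment; every attribute name and value here is a hypothetical stand-in.
SAMPLE = """
<div>
  <note place="margin" hand="A" medium="pencil" n="e1">12 Of Colours</note>
  <note place="margin" hand="B" medium="ink" n="e2">Cold</note>
  <note place="margin" hand="A" medium="pencil" n="e3">40 Of Cold</note>
</div>
"""

def group_notes(xml_text, keys=("hand", "medium")):
    """Group marginal notes by the chosen attributes; values are (entry id, note text) pairs."""
    groups = defaultdict(list)
    for note in ET.fromstring(xml_text.strip()).iter("note"):
        if note.get("place") != "margin":
            continue  # only marginal additions are of interest here
        key = tuple(note.get(k) for k in keys)
        groups[key].append((note.get("n"), (note.text or "").strip()))
    return dict(groups)

groups = group_notes(SAMPLE)
for key, notes in groups.items():
    print(key, notes)
```

Each resulting group, for example all pencil additions in one hand, can then be set beside the numbered lists in the indices to test whether they follow the same scheme.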

This exercise enables us to understand better two important features of early modern science. It reveals significant information about Boyle's own organization of and approach to data. Here we can see Boyle's often abortive attempts to come to grips with his mass of apparently random material and the hesitant ways in which he tried to catalogue it according to his own priorities-showing us a Boyle far removed from the Olympian 'Father of Chemistry' so often presented in older accounts of the Scientific Revolution. The electronic edition can clearly reveal the patterns, continuities and disruptions in his plans to sort through his 'chaos', as he himself often called it.(3) More generally, this project enables us to see an example of the methods of document management in the seventeenth century. Indeed both these seventeenth-century texts and the modern electronic edition are concerned ultimately with the same goal-the efficient classifying, sorting, searching, retrieving and using of data-and this project will help us to compare the different methods and possibilities available when working with such dissimilar resources.

References

(1) Michael Hunter and Charles Littleton, "The Work-diaries of Robert Boyle: a newly discovered source and its Internet publication," Notes and Records of the Royal Society (forthcoming, 2001).

(2) Royal Society, Boyle Papers, vol. 25, 217.

(3) Michael Hunter, "Mapping the Mind of Robert Boyle: the Evidence of the Boyle Papers," in: Michael Hunter (ed.), Archives of the Scientific Revolution: the Formation and Exchange of Ideas in Seventeenth-Century Europe (London, 1998), 121-36, esp. 133-6.

The Archimedes Project: Integrating Heterogeneity in the Sources of Classical Mechanics (Fuchs)

The Archimedes Project is a collaborative project of the Max Planck Institute for the History of Science in Berlin and the Perseus Project at Tufts University, which aims to create a web-based working environment for studying the history of classical mechanics from its origins in the ancient world to the time of Galileo. The core of the project's work consists in the digitization of ancient and early modern texts on mechanical theory and practice, with the eventual aim of creating a digital research library. To date, the Project has digitized some 50 megabytes of source texts in Greek, Arabic, Latin, Italian and English. In addition to the source texts, the site includes Greek, Latin, and Italian lexica, as well as morphological analyzers in these languages which allow the user to move easily between source text and dictionary entry. Over the next year we plan to add another 100 megabytes of text, and several new tools, including a lexicon and morphological analyzer for classical Arabic. The Project is web-based but at the moment it is fully accessible only from within the Institute. We hope to make the library publicly available some time next year.

My paper will discuss the challenges the Project has encountered in digitizing the heterogeneous sources of classical mechanics and in making the information found in them accessible in a useful and meaningful way. In particular, I will concentrate on the problems we have encountered in building a working environment that can cope with such a wide variety of sources and the solutions we have found for them.

Heterogeneity of sources is a problem which continually confronts us at the Archimedes Project. The two other projects participating in this panel face primarily a problem of heterogeneity within a single source text or a corpus belonging to a single author. The kind of heterogeneity that the Archimedes Project must cope with is one that occurs not only across corpora but also across time and across cultures. In the most general terms, the problem is one of connection: how do we build links between pieces of data which, though similar in theme, are couched in widely differing cultural contexts, without at the same time sacrificing the distinctiveness of the data? Cooperation with the Perseus Project has enabled us to cope with linguistic difference. Each of the words in our texts is connected through a morphological analyzer to a dictionary entry, where the user gets not only an English definition, but also, in many cases, links to other instances of the word in texts available on the site. This enables one to check the appropriateness not only of the translation in the particular text being studied, but also of the definition in the dictionary itself. In addition, a synonym tool developed at the Perseus Project yields synonyms for Latin and Greek as well as cross-language equivalents-a particularly useful tool when dealing with technical terms, where synonyms and equivalents in other languages can yield important clues for connecting data both within a single text and across texts separated by linguistic barriers. But the most daunting problem the Archimedes Project faces in terms of heterogeneity is not primarily a linguistic one, though it is in fact nurtured by linguistic differences. 
Our source texts focus on a fairly narrow range of themes and conceptual models and it has always been the aim of the project to make its material accessible in terms of theme, so that changes in theme and thinking may be measured both against and apart from the particular form in which they happen to be expressed. In order to achieve this goal, however, particular care must be taken to keep digitization and commentary separate, more so than is usually the case in digital projects, since a judgment about the structure of a text at an early stage may make the theme to which that structure provides a clue inaccessible at a later stage. This is particularly important with mathematical and geometrical texts, whose often complex structure may hide important clues as to provenance, influence, and innovation. To achieve this separation, we make use of a "production line," which allows for a basic structure to be tagged in a source text at an early stage by people who are normally not experts in the history of science, while postponing to a later stage more complicated judgments about structure. When the later stage is reached, detailed annotation about the structure is added to the website in the form of pieces of commentary linked to locations in source texts: a point, a text region, or an SGML node. The commentary itself ranges over everything from the SGML structure of the text itself, revisions to the div structure, for instance, to the identification of technical terms, argumentative structures, and conceptual models. Nor need it be from a single commentator. It is fully envisaged that the commentary texts will normally be from groups of scholars working on the same text and will reflect a range of opinion. The data from the commentary text is then used as the basis for visualizations which link together similar themes in otherwise linguistically and culturally diverse texts. 
In scholarly terms, the initial results have been promising: a study of the first text of classical mechanics, the Mechanical Problems of ps-Aristotle, conducted using these tools, has revealed an unseen consistency in the argumentation and thinking of the treatise, and has if not strengthened the argument in favor of Aristotelian authorship, at least taken the wind out of most arguments against Aristotelian authorship, for which exhibit A has always been the apparent chaos of the treatise.
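The separation of commentary from source text described above can be pictured as a set of stand-off records, each pointing at a location in a text. The following is a minimal sketch under stated assumptions: the record fields, the locator strings, the text identifiers, and the themes are all hypothetical and do not reproduce the project's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Comment:
    text_id: str  # which source text the comment refers to
    locator: str  # a point, a text region, or a node (notation is hypothetical)
    theme: str    # theme or conceptual model identified by the commentator
    author: str   # commentary may come from several scholars

# Invented commentary records stored apart from the source texts themselves.
comments = [
    Comment("greek_text", "node:div[3]", "lever", "scholar_1"),
    Comment("latin_text", "region:120-180", "lever", "scholar_2"),
    Comment("latin_text", "point:455", "balance", "scholar_1"),
]

def link_by_theme(items):
    """Return themes attested in more than one text, each with its list of (text, locator) pairs."""
    by_theme = defaultdict(list)
    for comment in items:
        by_theme[comment.theme].append((comment.text_id, comment.locator))
    return {
        theme: locations
        for theme, locations in by_theme.items()
        if len({text_id for text_id, _ in locations}) > 1
    }

links = link_by_theme(comments)
print(links)
```

Since the records live outside the source texts, commentary from different scholars can accumulate, or be revised, without touching the digitized text.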

The results are pleasing to us not least because we feel that it is precisely on this score we differ from most other digital projects. Whereas the goal of a typical digital project is to make a given set of texts available in a final digital form, the Archimedes Project conceives of itself as an essentially open-ended project, in which scholarly commentary is continually added to the website as metadata to enrich the "connectivity" between texts without sacrificing the differences that separate them.

Full text license: This text is republished here with permission from the original rights holder.

Conference Info

In review

ACH/ALLC / ACH/ICCH / ALLC/EADH - 2001

Hosted at New York University

New York, NY, United States

July 13, 2001 - July 16, 2001

94 works by 167 authors indexed

Affiliations need to be double-checked.

Conference website: https://web.archive.org/web/20011127030143/http://www.nyu.edu/its/humanities/ach_allc2001/

Attendance: 289 (https://web.archive.org/web/20011125075857/http://www.nyu.edu/its/humanities/ach_allc2001/participants.html)

Series: ACH/ICCH (21), ALLC/EADH (28), ACH/ALLC (13)

Organizers: ACH, ALLC
