The Orlando Project, based at the Universities of Alberta and Guelph, is writing the first full scholarly history of women's writing in the British Isles. Supported by a five-year Major Collaborative Research Initiatives grant from the Social Sciences and Humanities Research Council of Canada, the Orlando team currently comprises two principal investigators, four coinvestigators, three postdoctoral fellows, a computing librarian and eight graduate research assistants. From the outset in 1995 we planned our history to appear in both print and electronic form and to fully integrate literary history and humanities computing methodologies. One aspect of our project which we believe to be unique is the way in which the material for the history is being encoded and stored in electronic form. All our work is SGML-based, but, rather than encoding existing texts as most other electronic projects in the humanities are doing, we are writing new material and at the same time incorporating sophisticated markup not only for structure but also for critical analysis and interpretation. The challenge for us now is to use this mass of information with its detailed encoding to create intellectually coherent and polished scholarly products.
We will produce a chronology and four volumes of literary history covering the periods: beginnings to 1830, 1820 to 1890, 1880 to 1945, and 1945 to the present. The chronology was originally conceived as a paper chronology of women's literary history. In keeping with practice for chronologies in the field of literary history, it was planned to organize the dates significant to women's writing into several columns, in this case four: women's writing; the writing climate, including not only men's writing but also matters affecting literature, such as publishing or censorship practices, and all writing not by our central authors; the social climate, including events associated with various fields of knowledge and changes in everyday life; and national and international events. When the Orlando Project embraced computing technology, our sense of what we could do with the chronology began to expand. It was obvious that the capaciousness of electronic storage could allow us to include every event we judged to have chronological significance, no matter how slight. Such a power of inclusiveness, however, makes the need for intelligent selection of events if anything more crucial, even as it lets us select on a scale unfamiliar to workers with paper. We also wanted the capability of producing subsets or different views of the chronology, as well as links to other material. Different events would be privileged in each view, but we would also need to ensure that each view appeared to the user as a coherent and polished whole.
In this paper, we provide an account of the design of the chronology and the challenges we have faced in developing it. In order to describe this work, we must first situate the chronology within the larger project goals and work patterns, and in particular show how the SGML chronology elements fit into the bigger picture. The research material being compiled by the Orlando Project covers a great many aspects of women's writing and literary history. However, it became clear early on that creating a kind of mega-DTD to cover all of this was not practical in our situation. We had graduate students ready to start work almost as soon as the project began, at a time when we still faced many sessions of detailed project requirements analysis. We therefore initially divided the material into three different categories or document types, for each of which we created a single DTD: women's lives (the biography DTD), their writing (the writing DTD), and more general events that may have influenced their writing (the events DTD). We are writing the document instances of each of these DTDs in such a way that they can exist as standalone documents. A typical biography document is some 1500-2500 words of prose, with major subdivisions for key aspects of the writer's life such as birth, family, political activities, etc. The biography and writing documents are structured to include embedded chronological events that may be extracted into the project's larger chronology. Events documents are simply that: collections of events which are intended for the chronology.
Potential items for the chronology are encoded in a <chronStruct> element which includes a date, <chronProse> which is a sentence describing the event, and <shortProse> which includes further explanation of the event or additional information. The <chronProse> may contain other elements, for example, names of organizations, political affiliation, and place names. Here is an example of a <chronStruct> within the division of Jane Austen's biography document that describes her birth.
<DIV2><CHRONSTRUCT><DATE CERTAINTY="C" CALENDAR="NEWSTYLE">
16 December 1775: </DATE><CHRONPROSE>JA was born at <PLACE><SETTLEMENT>
Steventon</SETTLEMENT> in <REGION>Hampshire</REGION></PLACE>, a month after
her mother had calculated that the birth was due.</CHRONPROSE></CHRONSTRUCT>
<SHORTPROSE><P>Nearly eleven years younger than her eldest sibling, she had
five brothers and one sister as her elders, and one more brother nearly
four years her junior.</P>
Events documents are essentially collections of <chronStruct>s denoting events that do not fall within discussion of writers' lives and writing, but have rather to do with larger social, political, cultural, and other historical factors.
These structures of course give us the ability to merge the dates and chronology material from a variety of documents. With a Perl script we can pull out all the <chronStruct>s, sort them by date and create a draft chronology for the literary investigators to review. In reality, however, this is not as simple as it may sound. Dates are one major problem area. We have had to structure and tag our dates to facilitate their inclusion in the larger chronology, and to find ways of dealing with events whose date is unknown or unclear. For example, events such as book publications for which we know only a year and no month or day are initially clustered together at the beginning of each year. Listed in this way, they seem unpolished to the user, and much less useful than the more exactly dated publication events which are placed with greater precision in relation to the surrounding events. Moreover, under this system year-only or month-only events necessarily sort before more precisely dated events, even when other evidence shows that the precisely dated events came first. (Sorting year-only dates to the end of the year, or randomly distributing them, produces analogous and even more irritating problems.) If we know that someone wrote a poem to celebrate Victoria's coronation some time after the coronation, but do not know the exact date, it is necessary to supply an "after" date to ensure that this event does not precede the coronation itself when the events are combined. Similarly, a researcher into religion may find that the first Anglican nursing order, the Sisters of St John the Divine, was founded on a particular day. If a second researcher investigating the history of nursing has found that a house was established for the Sisters in a hospital that year, and writes this event up to sort by the year alone, it will appear in the chronology before the event noting the founding of the order.
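The extraction and sorting described above can be sketched roughly as follows. This is an illustration in Python, not the project's Perl script. The regular expression assumes the simple tag layout of the Jane Austen excerpt, and the names Event, parse_date, extract_events and sort_key, as well as the boolean "after" flag, are hypothetical conveniences rather than part of the Orlando DTDs.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}


@dataclass
class Event:
    year: int
    month: Optional[int] = None  # None when only the year is known
    day: Optional[int] = None    # None when the day is unknown
    text: str = ""
    after: bool = False          # hypothetical flag: "some time after this date"


def parse_date(raw: str) -> Tuple[int, Optional[int], Optional[int]]:
    """Parse '16 December 1775', 'December 1775' or '1775'."""
    parts = raw.strip().rstrip(":").split()
    year = int(parts[-1])
    month = MONTHS[parts[-2]] if len(parts) >= 2 else None
    day = int(parts[0]) if len(parts) == 3 else None
    return year, month, day


# Assumes one DATE followed by one CHRONPROSE inside each CHRONSTRUCT.
CHRON_RE = re.compile(
    r"<CHRONSTRUCT>\s*<DATE[^>]*>(?P<date>.*?)</DATE>\s*"
    r"<CHRONPROSE>(?P<prose>.*?)</CHRONPROSE>\s*</CHRONSTRUCT>",
    re.DOTALL | re.IGNORECASE)


def extract_events(sgml: str) -> List[Event]:
    """Pull every chronStruct out of a document as a plain-text Event."""
    events = []
    for match in CHRON_RE.finditer(sgml):
        year, month, day = parse_date(match.group("date"))
        prose = re.sub(r"<[^>]+>", "", match.group("prose"))  # drop inner tags
        events.append(Event(year, month, day, " ".join(prose.split())))
    return events


def sort_key(event: Event) -> Tuple[int, int, int, int]:
    """Unknown month or day sorts as 0, so imprecise dates come first.
    An 'after' event sorts just behind an exact event on the same date."""
    return (event.year, event.month or 0, event.day or 0, int(event.after))


# The coronation took place on 28 June 1838; the poem is only known to follow it.
draft = sorted([
    Event(1838, text="A book known only to have appeared this year."),
    Event(1838, 6, 28, "A poem celebrating the coronation.", after=True),
    Event(1838, 6, 28, "Coronation of Queen Victoria."),
], key=sort_key)
```

In this sketch the year-only publication still lands at the head of 1838, which is exactly the clustering problem noted above, while the "after" flag keeps the poem from preceding the coronation.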
We also want to be able to generate selective chronologies on specific topics and time periods, and have so far been experimenting with subset chronologies for women and education, women and politics, and women and publication. So far, many of the events for these chronologies have been selected manually by the literary investigators, but we have also been investigating the use of keywords to sort chronology items for such purposes, and how to select the vocabulary for these keywords. A pre-determined thesaurus or set of subject headings presents problems for a project of this nature. To a certain extent the significant keyword categories have determined themselves as the research developed, but we still need to achieve a degree of uniformity so that the subject chronologies will be meaningful. In many ways our proposed keywords are more akin to a back-of-the-book index, but we have yet to devise an effective way of systematizing our keywords and the way we use them, and to clarify in detail the relationship between keywords and our content tagging. During the summer of 1998, a library school student at the University of Alberta will join the Orlando team to work on these particular issues.
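One way of picturing keyword-based selection is sketched below in Python. It is only an illustration of the uniformity problem, not the project's method: the dictionary representation of an event, the PREFERRED_TERMS mapping and every keyword in it are invented for the example.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping from variant keywords to one preferred term.
PREFERRED_TERMS: Dict[str, str] = {
    "schooling": "education",
    "schools": "education",
    "suffrage": "politics",
    "publishing": "publication",
}


def normalise(keyword: str) -> str:
    """Lower-case, collapse whitespace, then map variants to the preferred term."""
    cleaned = " ".join(keyword.lower().split())
    return PREFERRED_TERMS.get(cleaned, cleaned)


def select_subset(events: Iterable[dict], topic: str) -> List[dict]:
    """Keep the events carrying a keyword that normalises to the topic."""
    wanted = normalise(topic)
    return [event for event in events
            if wanted in {normalise(k) for k in event.get("keywords", [])}]
```

Without some such normalisation step, an event keyworded "Schooling" by one researcher and another keyworded "education" by a colleague would fall into different subject chronologies.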
Most of our effort in the past year has been directed towards completing and refining the chronology. To date we have over 1000 biography documents in an advanced state of completion, plus many thousands of chronology items in the events DTD, a large collection of bibliographical entries indicating publication dates of important works, and some sample writing documents. Periodically we generate the gigantic "allchron" chronology so that our literary researchers can select events and refine entries; at the last count, we had over 24,000 chronology items in total. A <relevance> tag is being added to every entry to indicate what level of chronology (overall, decade, etc.) the entry is appropriate for. In a good many cases we have been able to add this tag automatically (for example to birth and death dates, since we have included these only where we think the stature of the writer or other historical figure merits it), but many other entries must be edited manually. In all cases the revisions are being inserted in the documents from which the chronology items were derived, and in such a way that they maintain coherence in the source document as well as the chronology. We have been polishing sample periods from the 18th, 19th and 20th centuries and learning from this process how best we can facilitate the revision of the remainder of the material.
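The role of the <relevance> tag can be illustrated with a short Python sketch. Several things here are assumptions made for the example: the level name "full", the ordering of the levels, the rule that an entry fit for a selective chronology also appears in every more detailed one, and the default level given to births and deaths.

```python
from typing import Iterable, List, Optional

# Assumed ordering, most selective first. "overall" and "decade" come from
# the text above; "full" is a hypothetical catch-all level.
LEVELS = ["overall", "decade", "full"]


def auto_relevance(kind: str, default_level: str = "decade") -> Optional[str]:
    """Return a level for entry kinds that can be tagged mechanically.
    Anything else returns None and is left for manual editing."""
    if kind in {"birth", "death"}:
        return default_level  # the default level is an assumption
    return None


def view(events: Iterable[dict], level: str) -> List[dict]:
    """Entries for one chronology level, plus all more selective entries."""
    cutoff = LEVELS.index(level)
    return [event for event in events
            if event.get("relevance") in LEVELS
            and LEVELS.index(event["relevance"]) <= cutoff]
```

Entries that do not yet carry a recognised relevance value are simply left out of every view in this sketch, which mirrors the need for manual review before they can appear.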
Earlier in 1997, the Orlando Project was accepted into the Higher Education Grant Program of Electronic Book Technologies (EBT), now Inso Corporation. The Inso products, particularly Dynatext, can deliver the main chronology and various subset chronologies in a meaningful way, provided that any sorting is carried out by a preprocessor. Dynatext allows us, for example, to highlight chronology items within the biography documents as well as to present the "allchron" and smaller subsets of it, both on screen and in a print format suitable for the literary investigators to work on. It also allows us to search the chronology for any of the content tagging within the items. In the next stage of the project, we want to see how far Dynatext will allow us to produce the kind of dynamic chronology we originally set out to create, and what other software tools we may need to complement or replace it. Likewise, we need to implement the hypertext linkages that we have always planned for the chronology.
For the future, our challenge is to work out in more detail which views of the chronology would be best to privilege, and to tailor them to meet the needs of a multiplicity of scholarly applications in women's literary history. The chronology is also serving as a model or pilot for delivering other subsets of the Orlando Project textbase. The chronology was planned from the outset and we were thus able to include tagging to help us create it right from the beginning of the project. Even so, we are encountering many problems as we constantly strive to refine the draft chronology. Our detailed content tagging presents many other possibilities. We can present many other subsets of the material, and indeed the keyword discussions are taking place with reference to the entire collection of material (that is, also to passages of information or analysis which are not tied to a specific date). Our work with the chronology is thus not only providing a macroscopic (and selectively microscopic) view of all our research material, but is also helping us work out our long-term plans for delivery of many other aspects of our integrated history.
Hosted at Debreceni Egyetem (University of Debrecen) (Lajos Kossuth University)
July 5, 1998 - July 10, 1998
Conference website: https://web.archive.org/web/19991022041140/http://lingua.arts.klte.hu/allcach98/