Humanities Computing Unit - Oxford University
English Department - University of Kentucky
Identifying, preserving, and using high quality digital
resources
Michael Fraser
Humanities Computing Unit, Computing Services, Oxford University
michael.fraser@oucs.ox.ac.uk
Michael Popham
Humanities Computing Unit, Computing Services, Oxford University
michael.popham@oucs.ox.ac.uk
Elizabeth Solopova
Department of English, University of Kentucky
esolop@pop.uky.edu
1999
University of Virginia
Charlottesville, VA
ACH/ALLC 1999
editor
encoder
Sara A. Schmidt
Selecting Resources for a Subject Gateway: Who Decides?
Michael Fraser
The Higher Education Funding bodies in the UK recently called for
bids to develop subject-based faculty 'hubs' or gateways which
locate, catalogue, and give access to digital resources suitable for
use in Higher Education teaching and research as part of the new
Resource Discovery Network. The new faculty hubs will further
develop the design and purpose of the existing centrally-funded
gateways, amongst which ADAM (Art, Design and Media), IHR-INFO
(History), and SOSIG (Social Sciences) currently have some remit for
humanities disciplines.
The Humanities Computing Unit has been invited to submit a bid to
develop the proposed Humanities Hub of the new Resource Discovery
Network. The proposal draws upon existing work relating to subject
gateways within Oxford, in particular the HumBul Gateway for the
Humanities and, on a smaller scale, the Computer-Assisted Theology
gateway, as well as other gateways within the UK and beyond.
This paper will focus on a particular issue which lies at the core of
any subject-based gateway, the criteria by which resources are
selected for inclusion within the gateway. Subject gateways
explicitly state or at the very least imply a concern that the
resources catalogued are quality-assured, an assurance based on
human intervention. But what does quality mean in this context?
Against what criteria and with what authority can an individual
resource be deemed fit for inclusion and therefore deemed fit for
purpose?
Gateways tend to fall into two basic types. To a large extent the
HumBul Gateway and the Computer-Assisted Theology gateway
demonstrate both types. The Theology gateway was developed by an
individual enthusiast with a keen interest in the possibilities
offered by the Internet for teaching and research and with specific
subject expertise. Gateways of this type are numerous on the
Internet and indeed many of the existing gateways to humanities
subjects fall into this category. For the purposes of this
discussion such gateways may be termed amateur gateways since their
development is often dependent on one or two individuals, often
without formal institutional support and frequently presented with
little information about selection criteria, intended audience,
available metadata, consistent classification, advanced searching
and so on. What these gateways can offer, however, is a subject
practitioner's view of the Internet with evaluative as well as
descriptive annotation for each linked resource; they derive their
authority from the recognised expertise of the subject specialist.
The second type of gateway, one which to a large extent the new
HumBul strives to be, and which may be termed the professional
gateway, is less common (certainly for the humanities). The
professional gateway is identified by institutional and sometimes
national support, developed by a specific project team, and
constructed along the lines of an advanced library catalogue (often
drawing its cataloguers from amongst subject librarians). The
professional gateway offers durability and structured, easily
retrievable data. On the other hand, there is a tendency to hide from
the end-user the evaluative judgments made about individual
resources held within the virtual collection despite publication of
the criteria by which resources are selected for inclusion. Both
types of gateway contend with the inherent tension between trying to
be a digital library (cataloguing and dissemination) and something
like an academic reviews journal (discovery and evaluation).
The EU-funded DESIRE Project, "Selection Criteria for Quality
Controlled Information Gateways", developed and published a list of
quality selection criteria designed as a reference point for
subject gateways. The criteria presented were arranged under five headings
which may be summarized as relating to: audience, content, design,
maintenance or durability, and comparison with related resources.
Under each heading a number of sub-categories contain a series of
questions to be considered by resource contributors to subject
gateways. The categories are comprehensive and the questions
detailed. The application of these criteria is intended to highlight
quality and limit quantity. The ADAM and SOSIG Gateways, for
example, either explicitly draw attention to this particular set of
criteria or have developed a similar set for their own resource
contributors.
Leaving aside the issue of whether such a comprehensive approach to
selection criteria actually serves the purpose for which it was
designed, it is significant to note that the wide range of questions
which are required to be answered satisfactorily before a resource
can be admitted into the catalogue bear little or no relation to the
metadata available to the end-user. For the most part users of
gateways such as the two mentioned above are presented with fairly
sparse metadata consisting of title, subject, description and so on.
Whilst the cataloguer is forced to make evaluative judgements about
resources, the user has little notion of what these judgements might
have been. The mere fact that a resource has been included within
the gateway is, it seems, assumed to be enough. Descriptions are
short and objective, and rarely is any indication given even of the
contributor's identity or authority for making such judgements.
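
The gap described above, between the judgements a cataloguer makes and the metadata a user sees, can be illustrated with a minimal sketch. It assumes a hypothetical record structure: the class names, field names, and the short labels for the five DESIRE headings are illustrative only and are not taken from any actual gateway.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Short labels (assumed) for the five headings under which the text
# summarises the DESIRE selection criteria.
CRITERIA_HEADINGS = ("audience", "content", "design", "maintenance", "comparison")


@dataclass
class Evaluation:
    """Hypothetical evaluative annotation made by a cataloguer."""
    reviewer: str                     # who made the judgement
    authority: str                    # on what basis, e.g. subject specialist
    notes: Dict[str, str] = field(default_factory=dict)  # heading -> judgement

    def missing_headings(self) -> List[str]:
        """Return the headings for which no judgement has been recorded."""
        return [h for h in CRITERIA_HEADINGS if h not in self.notes]


@dataclass
class CatalogueRecord:
    """Descriptive metadata of the sparse kind typically shown to users."""
    title: str
    subject: str
    description: str
    evaluation: Optional[Evaluation] = None

    def user_view(self, show_evaluation: bool = False) -> Dict[str, object]:
        """Build the record as displayed, optionally exposing the evaluation."""
        view: Dict[str, object] = {
            "title": self.title,
            "subject": self.subject,
            "description": self.description,
        }
        if show_evaluation and self.evaluation is not None:
            view["reviewer"] = self.evaluation.reviewer
            view["authority"] = self.evaluation.authority
            view["notes"] = dict(self.evaluation.notes)
        return view


if __name__ == "__main__":
    # Placeholder values only; no real resource or reviewer is described.
    record = CatalogueRecord(
        title="Example resource",
        subject="Example subject",
        description="A short, objective description.",
        evaluation=Evaluation(
            reviewer="Example reviewer",
            authority="subject specialist",
            notes={"content": "Example judgement about content."},
        ),
    )
    print(record.user_view())                      # descriptive metadata only
    print(record.user_view(show_evaluation=True))  # with the judgements exposed
    print(record.evaluation.missing_headings())    # headings not yet assessed
```

In this sketch the evaluation is stored with the record but only exposed on request, which mirrors the situation described: the judgements exist, yet the default view gives the user no notion of them.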
A fundamental question which underlies this paper is whether there is
a need for detailed quality assurance at all when the effort might
be better expended on more comprehensive, factual metadata to
assist the searching and delivery of a gateway's holdings. The
combination of professional cataloguing and amateur evaluation
appears to be successfully provided by services like the Internet
Movie Database and to some extent by commercial ventures like
Amazon.Com. Both these databases, however, concern themselves with
offline media available to their users only with some additional
effort. The Internet subject gateway, of course, catalogues
resources sharing the same digital medium as itself. It is intrinsic
to an Internet gateway to not only point away from itself but to
actually take the user to those resources using the same mode of
delivery. One might argue that providing reviews of Internet
resources is a superfluous activity given that the function of a
gateway is to take the user to the objects which they might inspect
for themselves, a task which neither the Internet Movie Database nor
Amazon.Com can undertake.
On the other hand, as this paper will argue, given that academic
subject gateways have an additional role of providing access to
digital resources suitable for teaching and research, the Internet
offers something which the offline media cannot: a full integration
of the resource catalogue, resource evaluation, and the resources
themselves. It is only the combination of all four fundamental
elements, discovery, evaluation, cataloguing and dissemination,
which moves us towards a gateway which is subject-based, academic,
and Internet integrated, a combination which lies at the core of the
proposed Humanities Hub.
Accept/Reject? Quality decisions facing the Oxford Text Archive
Michael Popham
The Oxford Text Archive is one of the world's best known electronic
text centres, and has been in existence for almost a quarter of a
century. At the time of its establishment in 1976, there were
relatively few humanities scholars interested in the creation and
use of electronic textual resources, which meant that it was all the
more important to ensure that their efforts were preserved and made
available to future generations. However, despite the small size of
the humanities computing community, the resource implications of
undertaking any work involving electronic text meant that such
endeavours were rarely entered into lightly, or without significant
scholarly and technical input.
In the summer of 1996, the OTA was appointed as the electronic text
Service Provider for the UK's national Arts and Humanities Data
Service. In many respects this appointment was extremely timely, as
by now the international community of humanities computing scholars
had grown significantly, and many individuals who were less
computer-literate than their predecessors were starting to take
advantage of the facilities offered by cheap scanning technologies
and the emergence of the World Wide Web as a ubiquitous technology.
Individual academics saw less of a need to rely upon the archival
and distribution services offered by bodies such as the OTA, as they
now believed that they could undertake these tasks for themselves.
Yet this rapid growth in self-publishing on the web has raised a
number of concerns -- not simply about the quality of the materials
being created, but also about the methods and standards that have
been used.
Within the UK, the Arts and Humanities Research Board (AHRB) has
recently been established following agreement by the British
Academy, the Department of Education for Northern Ireland (DENI),
and the Higher Education Funding Council for England (HEFCE). They
have agreed to set up the Board pending a decision by the Government
on whether to establish an Arts and Humanities Research Council.
Funding for the AHRB will total over £36 million in the financial
year 1998-99, and £44 million in 1999-2000, with contributions from
all three parties to the agreement. Under section 10 of the
application form for research grants in excess of £5000, applicants
are now told that in the case of "projects whose primary purpose, or
significant product, is the creation of an electronic resource, it
will be a condition of award that data created as a result of the
research, together with documentation, should be offered for deposit
at the Arts and Humanities Data Service, within a reasonable time
after the completion of the project. Applicants involved [in]
research leading to the creation of such a resource are strongly
advised to obtain advice from the AHDS concerning appropriate
standards and methods". In practice this means that the AHDS Service
Providers, such as the OTA, have been receiving a glut of enquiries
from academics who will be affected by this new condition of award.
Many of those who have contacted the OTA have been somewhat
surprised to learn that we are less than enthusiastic about
endorsing their plans to create their materials solely in HTML, and
distribute these via a local website -- and very few have shown any
awareness of the relevant standards for resource creation,
preservation, and metadata.
We now find ourselves in something of a dilemma. The OTA is obviously
keen to ensure the long-term preservation and availability of the
scholarly outputs of AHRB-funded research. Yet at the same time,
many of the electronic resources that seem likely to be produced by
AHRB funding are not going to be created in accordance with crucial
standards and best practices. So, whilst the scholarly content of
these resources will almost certainly be of the highest order, they
may turn out to be poor quality resources from the point of view of
long-term preservation and viability. In order to address this
problem, the OTA (and the four other AHDS Service Providers) will be
producing a series of Guides to Good Practice, which will provide
the necessary guidance to the creators of electronic scholarly
resources. However, at the time of writing, it seems unlikely that
the AHRB will compel resource creators to follow the advice of the
AHDS Service Providers, which will surely result in the creation of
many technologically weak and poor quality resources, not to mention
the squandering of available funding. Moreover, if these resources
are to be preserved and remain viable in the long-term, they are
likely to prove difficult and costly for the OTA to maintain, and
present future end-users with additional problems (and therefore
costs) to overcome.
Elsewhere within the academic community, we have seen the emergence
of other recommendations, such as the MLA's Guidelines for
Electronic Scholarly Editions. Although this document relates to the
production of one very specific kind of electronic textual resource,
it is gratifying to note that it draws heavily upon the
recommendations set out in the Text Encoding Initiative's Guidelines
for Electronic Text Encoding and Interchange (TEI-P3), and is
therefore in keeping with the recommendations made by the OTA to
resource creators. Even so, despite the fact that the MLA, TEI, and
OTA are in accord with regard to what constitutes good practice when
creating electronic resources, it seems likely that it will be some
time yet before the majority of academics (and especially those with
minimal computing expertise) adopt such practices as a matter of
course.
This paper will briefly set out the OTA's perception of electronic
resource creation within the UK, and examine the reasons why many
academics seem unwilling or unable to adopt the recommendations and
good practices that originate from several of the key players in the
scholarly electronic text community. It will then look particularly
at the challenges confronting the OTA when identifying and accepting
electronic textual resources for accessioning into the OTA's
holdings. Having discussed the difficulties of weighing scholarly
merit against the long-term preservation costs, viability, and
usability of resources, the paper will conclude with an explication
of the OTA's policy concerning this contentious area, and set out
our criteria for resource selection.
Fit for Purpose: Issues Surrounding the Use of Digital Resources in
Research and Teaching
Elizabeth Solopova
Humanities disciplines have a large and varied body of digital
resources on which to draw for teaching and research purposes,
including scholarly editions, on-line dictionaries and journals,
collections of digital texts and images, large Internet gateways and
numerous individual and course Web pages. In spite of the recent
rapid growth of digital resources, there are no published
guidelines, as far as I am aware, which assist the academic in
assessing the quality of a digital resource for actual use in
teaching or research. This is not surprising. First, the answer to
whether a resource is fit for a purpose will almost always be, 'it
depends'. Not only does the answer depend on the general purpose
envisaged, whether for teaching or research, it will also inevitably
depend on the precise needs of the individual asking the question,
for there is a whole spectrum of approaches to the subject even
within a single discipline. Secondly, it is not surprising that no
set of criteria exists for determining the quality of a digital
resource when so few published criteria exist for assessing the
quality of academic research in general. It is a contentious issue,
as those in the United Kingdom academic community who have been
subject to the Research Assessment Exercise will confirm. But the issue also
comes to the fore within the evaluation process for tenure, the
acceptance of publications by publishers and editorial boards, and
the success or otherwise of research funding. Underpinning all of
these is some notion of peer-review and the refereeing process which
remains crucial in the assessment of research publications.
How appropriate is the application of peer-review methods to the
assessment of digital resources? In summary one might argue that
digital resources should not be treated any differently from other
resources. The peer-review process is as appropriate for determining
their 'usefulness', as it is for effecting their development,
publication and, hopefully, the academic rewards structure. The
assessment of digital resources, however, whilst always requiring
expert knowledge of the subject area, also requires an understanding
of the underlying technology. Acknowledged experts in manuscript
studies, for example, simply may not appreciate the potential
scholarly contribution of an electronic facsimile if the digital
medium itself is significantly more alien to them than the
publication of another printed facsimile. The recognition that
subject experts must understand the potential of the technology
employed, in order to assess the quality of a resource, was apparent
in the recent study undertaken by the Arts & Humanities Data
Service into the requirements of academics for the scholarly use of
digital resources. The Oxford
Text Archive reported that their academic users perceived as
obstacles to the use of digital resources not only the technical ability
required to use certain resources and the corresponding lack of
training available, but also the current proliferation of resources
which by-pass the benefits of academic review.
Academic practitioners who do have a familiarity with current and
emerging technologies, however, have come to expect more from a
digital resource than can be delivered on paper. The electronic
edition of a medieval text is no longer a novelty. For both research
and teaching purposes there is almost an accepted expectation that a
critical edition in digital form will comprise not only the full
texts, but also high-quality digital facsimiles of all the
surviving witnesses. Moreover the witnesses are expected to be
encoded for advanced searching and linked to supplementary materials
such as glossaries and textual notes. In such cases it is less a
single resource than an entire scholarly environment that projects
are expected to provide. The editions are expected to be easy to use
and transparent even for a student inexperienced in both their
academic and technical aspects. As is well known, digital resources
which strive to meet such expectations are often expensive, very
time-consuming undertakings, requiring unremitting devotion and
extremely hard work from the teams which create them. These
difficulties are acknowledged by some members of the academic
community, who evaluate digital resources positively as suitable for
academic use in spite of, for example, the lack of high quality
digital photography (accepting that it is expensive and that
permissions are difficult to obtain) or, accepting the need to work
with ever-changing and developing technology, in spite of occasional
technological failures, incompatibility with some existing platforms,
or slowness even on the most current computers. Other members of the
academic community are, however, less forgiving of these conditions
'outside the editor's control', and lose confidence in digital
resources.
Another difficulty well-known to any 'insider' is that the huge
quantities of data which underlie electronic resources often exist
in a form which makes them especially difficult to proofread. In
these situations the proofreading and checking done by the project
team under the increasing pressure of deadlines never seems
sufficient and may have to stop before complete satisfaction is
achieved. Again, the arguments for making the results of work
available in spite of certain imperfections, and the dangers which
may result from this, are not easy to balance. One particular danger
is that as a result of a combination of a large body of complex data
in a digital resource with a lack of technical expertise on the
users' part, a resource might be used or recommended for scholarly
use for some time before its faults become apparent. Any 'forgiving'
attitude on the part of the academic community, which itself may
benefit from the early publication of a cutting-edge resource still
imperfect in some aspects, should require an honest assessment of
the resource by its creators and an open statement of its
weaknesses. The promise of easy updating which comes with digital
technology justly encourages a 'forgiving' attitude, but should not
be used as a justification for the publication of poor quality work.
The input of scholars with different backgrounds is required for the
evaluation of digital resources within collections and gateways.
Ultimately, the inclusion of a resource within a peer-reviewed
gateway or a 'published' digital collection should have the same
effect upon its use and the reward of its creators as is associated
with current publishing activities.
Hosted at University of Virginia
Charlottesville, Virginia, United States
June 9, 1999 - June 13, 1999
Conference website: http://www2.iath.virginia.edu/ach-allc.99/schedule.html