Supporting the Creation of Scholarly Bibliographies by Communities through Social Collaboration

paper
Authorship
  1. Hamed M. Alhoori

    Texas A&M University

  2. Omar Alvarez

    Texas A&M University

  3. Miguel Muñiz

    Texas A&M University

  4. Richard Furuta

    Texas A&M University

  5. Eduardo Urbina

    Texas A&M University

Work text

Many digital humanities projects maintain online
bibliography digital libraries (BDLs) that support
diverse users in locating a variety of references. The Cervantes
Project (CP) bibliography aims to represent the
best resources about Cervantes published since 1605 and
is drawn from many multilingual sources. The current
CP bibliography gathering and filtering process is carried
out by sets of contributors: the expert editors, the
reviewers, and the authorized international collaborators.
Delays, possibly months, can result from the filtering
process and also from the process of uploading the
new publications into the BDL, which is separate from
the gathering and filtering process. In addition, the ability
to find new entries online is limited. Current bibliographic
search engines cover only a limited portion
of the literature. There is no single resource that handles the
entire 2.5 million articles that emerge yearly from the
25,000 peer-reviewed journals (Harnad, S. et al., 2008),
so these engines access only a fraction of the literature
(Hull, D. et al., 2008).
We compared the main features supported by various humanities
BDLs. Table 1 summarizes the main outcomes. Note that the majority of these BDLs do not take advantage of
the social collaboration of Web 2.0.
This paper’s premise is that social collaboration with
the right level of moderation can support and reduce the
costs of creating a scholarly bibliography by benefiting
from the “wisdom of the crowds” (Surowiecki, J., 2004),
while ensuring the accuracy of the bibliography. This
could lead researchers to needed and interesting resources
more quickly. We have explored this premise
by implementing a set of functionalities built on the Drupal
CMS. We have tested them with a group of CP users
from different countries who use a variety of languages
to gather, share, annotate, rank and discover academic
literature (Fig. 1). We report on these initial experiments
in the remainder of this paper.
Personalization
Zotero, Mendeley and Papers are personal reference
management tools. However, they do not include social
collaboration features. We implemented a personal facility
named MyCibo (Fig. 2), where users can save or
edit their references, personal pages, and blogs with the
ability to make them private or public. They can import
and export in EndNote tagged, XML, RIS, and BibTeX
formats and manually connect related publications.
Social technologies applied to bibliographies
Social Bookmarking
Most online libraries and bibliographies provide one-way
learning, in that they provide services to the users while
prohibiting users from contributing. This results in a
huge loss of knowledge and in libraries that are almost frozen stores
rather than active libraries. The current state of the art is
moving toward two-way learning, where the users can
both benefit from the available knowledge and contribute
to it. Hendry, D.G. et al. (2006b) mentioned an ‘amateur
bibliography’ that is collected by nonprofessionals
and falls short of the standards of a professional bibliography.
Although a large number of references can be
collected in a short span of time, the resulting issues, such as
redundancy, spam, phantom author names, and phantom
citations, are not a good sign of scholarly research (Jacso,
P. 2008).
Unlike general online reference management software
such as CiteULike and Connotea, which is based on
the concept of non-moderated social bookmarking, we
took the previous issues and the need for an accurate
bibliography into account. To this end, we provide light moderation
and authentication of users’ contributions to the CP bibliography, aiming to reach the scholarly
moderated bibliography (Hendry, D.G. et al. 2006a).
Users were given ranks according to their scholarly or
contribution level. For example, well-known scholars
got higher ranks so that they could contribute directly
without moderation of their contributions. New users’ public
contributions were entered into a queue to await
approval from a moderator. Users who made
relevant and accurate contributions, and who are therefore
most likely experts in their area, were
given points, which promoted their ranking and editing
permissions. We believe this provides accuracy without
losing the benefits of collaboration. Fig. 3 shows how
queued publications are processed and Fig. 4 shows points
gained by an administrator after several entries. Editors
can revert to any previous revision if needed
(Fig. 5).
We allowed the researchers to share and discover academic
literature without worrying about inaccurate bibliographic
data. They can discover what the hot topics
are in the research field and what is significant to other
researchers by viewing what other researchers read and
tag. Hence, they can identify researchers with
similar interests with whom they can network. Social collaboration
is also important for papers that are not available
electronically for various reasons and may lose their
presence in the research community.
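The rank-and-queue moderation described in this section can be illustrated with a minimal Python sketch. It is not the Drupal implementation used for the CP bibliography; the class names, the rank threshold, and the point values are all hypothetical, since the text does not state them.

```python
from dataclasses import dataclass, field

# Hypothetical values; the paper does not give actual thresholds.
TRUSTED_RANK = 3          # rank at which contributions skip the queue
POINTS_PER_APPROVAL = 10  # points granted for each approved entry
POINTS_PER_RANK = 30      # points needed to advance one rank


@dataclass
class User:
    name: str
    rank: int = 0
    points: int = 0


@dataclass
class Bibliography:
    published: list = field(default_factory=list)
    queue: list = field(default_factory=list)

    def submit(self, user: User, entry: str) -> str:
        """Publish directly for trusted users; queue everything else."""
        if user.rank >= TRUSTED_RANK:
            self.published.append(entry)
            return "published"
        self.queue.append((user, entry))
        return "queued"

    def moderate(self, approve: bool) -> None:
        """Process the oldest queued entry, as a moderator would."""
        user, entry = self.queue.pop(0)
        if approve:
            self.published.append(entry)
            user.points += POINTS_PER_APPROVAL
            # Rank never drops; it follows the accumulated points.
            user.rank = max(user.rank, user.points // POINTS_PER_RANK)


bib = Bibliography()
scholar = User("scholar", rank=TRUSTED_RANK)
newcomer = User("newcomer")
bib.submit(scholar, "Entry A")    # published immediately
bib.submit(newcomer, "Entry B")   # waits for a moderator
bib.moderate(approve=True)        # newcomer earns points
```

Under these assumptions a newcomer would need nine approved entries to reach the trusted rank; a real deployment would tune these numbers to its community.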
Social Tagging
Del.icio.us and Digg are among the best and fastest-growing
social bookmarking sites that use folksonomy tagging.
However, inaccurate and misleading tags are common
in such open environments, which cannot be accepted
in research communities. This is a classic Web 2.0 problem:
it’s hard to aggregate the wisdom of the crowd without
aggregating their inexperience or madness as well
(Torkington, N. 2006).
We prevent these effects by using social tagging with
light moderation and upgrading of user privileges. This
provides us with better-quality tags than we would get
if we just accepted all the beginners’ tags; these users
may want to contribute to the scholarly community initially
but may lose interest later on. We allowed the
users to create their own tags and to reuse tags previously
entered by them or by other users. We used AJAX
to provide tag auto-completion
in real time.
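The server side of such tag auto-completion might look like the following Python sketch. It assumes case-insensitive prefix matching over previously entered tags, ordered by how often each tag has been used; the function name and the sample tags are hypothetical, and the actual system relied on Drupal and AJAX rather than this code.

```python
from collections import Counter


def suggest_tags(prefix: str, tag_counts: Counter, limit: int = 5) -> list[str]:
    """Return existing tags that start with `prefix`, most used first."""
    needle = prefix.strip().lower()
    if not needle:
        return []
    matches = [t for t in tag_counts if t.lower().startswith(needle)]
    # Sort by descending use count, then alphabetically for stable output.
    matches.sort(key=lambda t: (-tag_counts[t], t.lower()))
    return matches[:limit]


# Hypothetical tag usage counts.
tags = Counter({"Don Quixote": 12, "donation": 1, "Sancho Panza": 7, "Dulcinea": 3})
print(suggest_tags("do", tags))  # ['Don Quixote', 'donation']
```

Offering existing tags first nudges contributors toward a shared vocabulary, which complements the light moderation described above.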
Social review and comments
There are different types of comments: approving, disapproving,
or just summarizing the resource. We implemented
a feedback environment to build an active online
research community. It provides social reviews and comments
from users, and authors can interact with
those users and answer their questions.
Multilanguage Capability
As digital libraries expand their audience and content
scope, there is an increasing need for resources and access
tools to those resources in a variety of languages
(Larson, R.R. et al., 2002). Even polyglot users
prefer native-language interfaces in order
to accelerate searching and browsing, and prefer
the language of the interface to match the language
of the content as well (Keegan, T. and Cunningham, S.,
2008). Hence, the CP international scope requires the
inclusion of content and system functionalities in multiple
languages. Accordingly, we
provided a translation capability for interface elements (localization) and for content (internationalization). We
analyzed different translation strategies such as using
Web content (Wang, J. et al., 2004), documents in multiple
languages (Nie, J.Y. et al., 1999), and some available
APIs. After testing common searching phrases and
sample texts in our content domain in three different
languages (English, Spanish, and Arabic), we decided
to use the Google AJAX Language API because of its
detection and translation capabilities.
Users can choose their preferred language from those available at any
moment while using the system. This choice translates
the interface into that language and selects only
the content in that language. Bibliographic data can be
entered in one language and then translated into a new language
or linked to existing bibliographic data or publications
in other languages (Fig. 6). Users’ comments
and annotations can be translated into other languages, allowing
users to comment and discuss in their preferred
language (Fig. 7). Our tests showed acceptable
translation results.
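The language selection and cross-language linking described above can be sketched in Python as follows. The record structure, field names, and language codes are hypothetical, and the sketch deliberately leaves out the translation step itself, which the system delegated to an external API.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    record_id: int
    language: str   # hypothetical language code, e.g. "en", "es", "ar"
    title: str
    translations: set = field(default_factory=set)  # ids of linked records


def link_translations(a: Record, b: Record) -> None:
    """Link two records that describe the same work in different languages."""
    a.translations.add(b.record_id)
    b.translations.add(a.record_id)


def visible_records(records: list, preferred_language: str) -> list:
    """Select only the content in the user's preferred language."""
    return [r for r in records if r.language == preferred_language]


english = Record(1, "en", "Cervantes and the Modern Novel")
spanish = Record(2, "es", "Cervantes y la novela moderna")
link_translations(english, spanish)

for record in visible_records([english, spanish], "es"):
    print(record.title, "linked to", sorted(record.translations))
```

Keeping the links symmetric lets a user who switches languages move from any record to its counterparts without a separate lookup table.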
Ranking
Bibliography ranking has been used as a way to give users
a reliable top-N set of resources from the search results,
since a typical user reads only the first, second, or third
page of results. Citations and references have been used
as a way to rank bibliography resources (Larse, B. et al.,
2002, Larse, B. et al., 2006, Yang, K. et al. 2007). Citation-based
methods face complex issues such as biased
citations and self-citations, the difficulty of detecting whether
a citation is positive or negative, multiple citation formats that are difficult for
computer programs to handle, unfair treatment of new papers,
and venues that are not considered. Yan, S. et al. (2007) propose a
seed-based measure (considering top venues and the relevance of the venues’
authors) and a browsing-based measure
(considering users’ behavior) to rank academic venues.
However, the author seed needs to be updated frequently
to take new relevant authors into account. We used a hybrid
approach. We allowed users with higher ranks to
rate publications, and we retrieve the publications that
received a large number of approved reviews and comments,
since that suggests that they are hot topics.
Discussion and Future work
Our initial experimental results indicate that
online social collaboration can improve the quality,
quantity and usage of scholarly bibliographies. Furthermore,
it would help in building bridges among international
researchers and facilitate scholarly collaboration.
We intend to automate some portions of the moderation
process and to evaluate the reviews and comments (positive
or negative) by identifying and interpreting annotation
patterns and semantics, in order to assign a relevance weight to
each source, which would also help in the ranking.
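The hybrid ranking described in the Ranking section, which this future work would extend with relevance weights, might be sketched in Python as follows. The scoring formula, the minimum rater rank, and the comment weight are assumptions made for illustration; the paper does not specify how ratings and approved comments are combined.

```python
from dataclasses import dataclass, field

# Hypothetical parameters; the paper does not state how signals are weighted.
MIN_RATER_RANK = 2     # lowest user rank whose ratings are counted
COMMENT_WEIGHT = 0.5   # contribution of each approved review or comment


@dataclass
class Publication:
    title: str
    ratings: list = field(default_factory=list)  # (user_rank, stars) pairs
    approved_comments: int = 0


def score(pub: Publication) -> float:
    """Combine ratings from higher-ranked users with approved comment volume."""
    stars = [s for rank, s in pub.ratings if rank >= MIN_RATER_RANK]
    mean_rating = sum(stars) / len(stars) if stars else 0.0
    return mean_rating + COMMENT_WEIGHT * pub.approved_comments


def top_n(pubs: list, n: int = 10) -> list:
    """Return the n highest-scoring publications."""
    return sorted(pubs, key=score, reverse=True)[:n]


a = Publication("A", ratings=[(3, 5), (0, 1)], approved_comments=2)
b = Publication("B", ratings=[(3, 4)], approved_comments=6)
print([p.title for p in top_n([a, b], n=2)])  # ['B', 'A']
```

A relevance weight derived from positive or negative annotations, as proposed above, could replace the fixed COMMENT_WEIGHT in such a scheme.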
Acknowledgements
This material is based upon work supported by the National
Science Foundation under Grant No. IIS-0534314.
References
CiteULike. Available at:
http://www.citeulike.org (Accessed October 2008).
Connotea. Available at: www.connotea.org (Accessed
October 2008).
Delicious. Available at: http://delicious.com/ (Accessed
August 2008).
Digg. Available at: http://digg.com/ (Accessed August
2008).
Drupal. Available at: http://drupal.org/ (Accessed April
2008).
Google AJAX Language API. Available at: http://code.google.com/apis/ajaxlanguage/
(Accessed April 2008).
Harnad, S. et al. (2008). The Access/Impact Problem and
the Green and Gold Roads to Open Access: An Update.
Serials Review, 34 (1), pp. 36-40.
Hendry, D.G. et al. (2006a). Hotlist or Bibliography? A
Case of Genre on the Web, Proceedings
of the 39th Annual Hawaii International Conference on
System Sciences, p.51.2, January 04-07, 2006.
Hendry, D.G. et al. (2006b). Collaborative bibliography,
Information Processing and Management: an International
Journal, v.42 n.3, p.805-825, May 2006.
Hull, D. et al. (2008). Defrosting the Digital Library:
Bibliographic Tools for the Next Generation Web. PLoS
Comput Biol 4(10): e1000204. doi:10.1371/journal.pcbi.1000204.
Jacso, P. (2008). Testing the Calculation of a Realistic h-index
in Google Scholar, Scopus, and Web of Science for
F. W. Lancaster. Library Trends 56.4 (2008): 784-815.
Project MUSE.
Keegan, T. and Cunningham, S. (2008). Language Preference
in a Bi-language Digital Library, Proceedings of
the 5th ACM/IEEE-CS joint conference on Digital libraries,
Denver Colorado, USA, 2005.
Larson, R.R. et al. (2002). Harvesting Translingual Vocabulary
Mappings for Multilingual Digital Libraries,
Proceedings of the 2nd ACM/IEEE-CS joint conference
on Digital libraries, Portland Oregon, USA, 2002.
Larse, B. et al. (2002). The Boomerang Effect: Retrieving
Scientific Documents via the Network of References
and Citations, Proceedings of the 25th annual international
ACM SIGIR conference on Research and development
in information retrieval, Tampere, Finland, 2002.
Larse, B. et al. (2006). Using Citations for Ranking in
Digital Libraries, Proceedings of the 6th ACM/IEEE-CS
joint conference on Digital libraries, Chapel Hill, NC,
USA, 2006.
Mendeley. Available at: http://www.mendeley.com/ (Accessed
October 2008).
Nie, J.Y. (1999). Cross-language Information Retrieval
Based on Parallel Texts and Automatic Mining of Parallel
Texts from the Web, Proceedings of the 22nd annual
international ACM SIGIR conference on Research and
development in information retrieval, Berkeley, California,
United States, pages 74-81, 1999.
Papers. Available at: http://mekentosj.com/papers/ (Accessed October
2008).
Surowiecki, J. (2004). The Wisdom of Crowds: Why
the Many Are Smarter Than the Few and How Collective
Wisdom Shapes Business, Economies, Societies and Nations.
1st ed. New York: Doubleday.
Torkington, N. (2006). Digging the Madness of Crowds.
http://radar.oreilly.com/archives/2006/01/digging-themadness-
of-crowds.html. (Accessed April 2008).
Wang, J. et al. (2004). Translating Unknown Cross-Lingual
Queries in Digital Libraries Using a Web-based
Approach, Proceedings of the 4th ACM/IEEE-CS joint
conference on Digital libraries, Tucson, Arizona, USA,
2004.
Yan, S. et al. (2007). Toward Alternative Measures for
Ranking Venues: A Case of Database Research Community,
Proceedings of the 7th ACM/IEEE-CS joint conference
on Digital libraries, Vancouver, BC, Canada, 2007.
Yang, K. et al. (2007). CiteSearch: Next-generation Citation
Analysis, Proceedings of the 7th ACM/IEEE-CS
joint conference on Digital libraries, Vancouver, British
Columbia, Canada, 2007.
Zotero. Available at: http://www.zotero.org/ (Accessed
October 2008).

Conference Info

Complete

ADHO - 2009

Hosted at University of Maryland, College Park

College Park, Maryland, United States

June 20, 2009 - June 25, 2009

176 works by 303 authors indexed

Series: ADHO (4)

Organizers: ADHO

Tags
  • Keywords: None
  • Language: English
  • Topics: None