Towards standards for lexicons and the linguistic annotation of texts.

Nicoletta Calzolari; Antonio Zampolli; Ulrich Heid

Authorship

1. Nicoletta Calzolari

Istituto di Linguistica Computazionale (ILC) (Institute for Computational Linguistics) - Consiglio Nazionale delle Ricerche (CNR)
2. Antonio Zampolli

Laboratorio di Linguistica Computazionale
3. Ulrich Heid

IMS-CL - Universität Stuttgart

Child sessions

From specifications to tagsets and coding guidelines: EAGLES morphosyntax annotations in lexicons and texts, Ulrich Heid
The Comlex Syntax Lexicon and the Eagles Subcategorization Standard, Ralph Grishman, Catherine Mcleod

Original URL

https://web.archive.org/web/20020713214911/http://www.cs.queensu.ca/achallc97/papers/s003.html

Work text

Towards standards for lexicons and the linguistic annotation of texts.
Nicoletta Calzolari
Istituto di Linguistica Computazionale del CNR
glottolo@ilc.pi.cnr.it
Antonio Zampolli
Istituto di Linguistica Computazionale del CNR
eagles@ilc.pi.cnr.it
Ulrich Heid
IMS-CL, Universitaet Stuttgart
heid@ims.uni-stuttgart.de
Keywords: linguistic annotation of texts, standardization, guidelines

Motivation
As more and more machine-readable text becomes available, the linguistic annotation of this material is steadily growing in importance. This is true not only in Natural Language Processing (NLP) and Language Engineering but also in Humanities Computing: for example, it is evident that linguistically informed free-text search and text retrieval, especially over texts written in morphologically rich languages, is more precise (returns less noise) than search over texts that have not been linguistically pre-analyzed. Linguistic annotation includes:

- the identification and tagging of word, sentence and paragraph boundaries;
- the identification and tagging of the category (POS, word class) of word forms in running text;
- the identification and tagging of morphological features (tense, number, person, etc.);
- the identification and tagging of syntactic properties of predicates (syntactic subcategorization);
- and many more.
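
The annotation levels listed above can be pictured as fields on a per-token record. The following is only a minimal sketch under stated assumptions: the `Token` class, the `lemma` field, the tag names (`NOUN`, `VERB`, ...) and the feature keys are hypothetical illustrations and are not taken from the EAGLES specifications. It also shows why search over annotated text returns less noise than plain string matching.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """One annotated word form (hypothetical layout, not an EAGLES format)."""
    form: str                                      # word form in running text
    lemma: str                                     # assumed base form
    pos: str                                       # word class (illustrative tags)
    features: dict = field(default_factory=dict)   # morphological features
    subcat: tuple = ()                             # subcategorization frame


# "She books flights and reads books." annotated by hand for illustration.
sentence = [
    Token("She", "she", "PRON", {"person": 3, "number": "sg"}),
    Token("books", "book", "VERB", {"tense": "pres", "person": 3}, ("NP",)),
    Token("flights", "flight", "NOUN", {"number": "pl"}),
    Token("and", "and", "CONJ"),
    Token("reads", "read", "VERB", {"tense": "pres", "person": 3}, ("NP",)),
    Token("books", "book", "NOUN", {"number": "pl"}),
    Token(".", ".", "PUNCT"),
]


def find_by_lemma(tokens, lemma, pos):
    """Return positions of tokens matching both lemma and word class."""
    return [i for i, t in enumerate(tokens) if t.lemma == lemma and t.pos == pos]


# Plain string search cannot tell the verb from the noun.
string_hits = [i for i, t in enumerate(sentence) if t.form == "books"]

# Annotation-aware search returns only the noun reading.
noun_hits = find_by_lemma(sentence, "book", "NOUN")

print(string_hits)  # [1, 5]
print(noun_hits)    # [5]
```

A shared scheme would fix the inventory of tags and features, so that records like these could be exchanged between lexicons and corpora without conversion.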
Many corpus linguistics and Linguistic Engineering companies and projects have designed their own proprietary annotation schemes; broadly available common schemes would offer a number of advantages (easy availability, documentation, exchangeability, etc.). The workshop will discuss the need for standards for the above levels of linguistic description.

For the types of annotation listed above, the EAGLES project has attempted to prepare annotation schemes and operational tagging guidelines, to encode these as formal (or formally representable) specifications, and to validate them in a number of application experiments. EAGLES (Expert Advisory Groups on Linguistic Engineering Standards) is an expert group, with contributors from industry and academia across the EU, that aims to design consensual standards for key areas of Linguistic Engineering.

Workshop objectives
The workshop aims to present and discuss recent and ongoing work towards standards for the linguistic classification and annotation of word forms in texts and lexicons. A second main objective is to gather feedback from the Humanities Computing community on this standardization work.

Specific objectives include the following:
- Identify and discuss the need for, and the problems related to, standards in the field of linguistic resources (in particular lexicons and corpora);

- Discuss the interaction between lexicon and corpus: if a common classification of linguistic material underlies both, at the levels indicated above, it opens up interesting new possibilities for 'compound' resources, such as dynamic links from the lexicon to the corpus, corpus-based lexicon validation, and new possibilities for linguistic acquisition;

- Describe the EAGLES approach to defining proposals for standards, the representations used, and the mechanisms available for validation, consistency checking, etc.;

- Describe the existing proposals for syntactic (and possibly semantic) annotation in texts and lexicons, based on efforts in EAGLES and in the COMLEX project at NYU;

- Discuss the EAGLES proposals from the point of view of 'users': when a lexicon design project or a corpus analysis project is set up, does the use of annotation standards make the project more efficient?

Confirmed workshop participants and their topics
The following participants have agreed to contribute:
Antonio ZAMPOLLI (Pisa): Linguistic Engineering Standards -- the domain of linguistic resources

Nicoletta CALZOLARI (Pisa): Standards for lexicons and corpora -- Areas, interaction between lexicon and corpus, current state of EAGLES

Ulrich HEID (Stuttgart): From specifications to tagsets and coding guidelines: EAGLES morphosyntax annotations in lexicons and texts

Antonio SANFILIPPO (Oxford): Standardizing word knowledge for NLP lexicons

Ralph GRISHMAN/Catherine McLEOD (New York): The Comlex Syntax Lexicon and the Eagles Subcategorization Standard

Full text license: This text is republished here with permission from the original rights holder.

Conference Info

ACH/ALLC / ACH/ICCH / ALLC/EADH - 1997

Hosted at Queen's University

Kingston, Ontario, Canada

June 3, 1997 - June 7, 1997

Conference website: https://web.archive.org/web/20010105065100/http://www.cs.queensu.ca/achallc97/

Series: ACH/ALLC (9), ACH/ICCH (17), ALLC/EADH (24)

Organizers: ACH, ALLC
