An on-line Laboratory for Linguistic Research - Complete works of Dante lemmatized

Mirko Tavoni; Elena Pierazzo; Letizia Leoncini; Paolo Ferrargina; Ivan Boscaino; Mirko Tavosanis

Authorship

1. Mirko Tavoni

Università di Pisa
2. Elena Pierazzo

King's College London, Università di Pisa, Université Grenoble Alpes
3. Letizia Leoncini

Università di Pisa
4. Paolo Ferrargina

Scuola Normale Superiore di Pisa
5. Ivan Boscaino

Università di Pisa
6. Mirko Tavosanis

Università di Pisa

Child sessions

The lemmatization and grammatical categorization of the Latin and Vernacular works of Dante, Mirko Tavoni, Elena Pierazzo, Letizia Leoncini, Paolo Ferrargina, Ivan Boscaino, Mirko Tavosanis
The Lemmatized Dante's works encoding, Mirko Tavoni, Elena Pierazzo, Letizia Leoncini, Paolo Ferrargina, Ivan Boscaino, Mirko Tavosanis
The Search Engine and the User Interface, Mirko Tavoni, Elena Pierazzo, Letizia Leoncini, Paolo Ferrargina, Ivan Boscaino, Mirko Tavosanis

Original URL

http://web.archive.org/web/20040903094216/http://www.hum.gu.se/allcach2004/AP/html/prop39.html

Work text

This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

The Humanities Computing research group coordinated by Professor Mirko Tavoni at the University of Pisa has decided to publish on the web the results of its research projects, together with the research tools it used in them.

For that purpose, a web site has been created (http://dante.di.unipi.it/ricerca/). It collects the results and the research tools of several research projects. The main project was the production of a lemmatized and grammatically annotated corpus of Dante Alighieri's complete Vernacular and Latin works, which will be discussed in detail in this session. Other important projects are:

Correspondence encoding: starting from the pilot scheme developed for Giacomo Puccini's correspondence (carried out in co-operation with the Centro Studi Giacomo Puccini), the project has been extended to include the correspondence of Vittorio Alfieri and Ugo Foscolo.
Digital editions of librettos: at present only the first act of Giacomo Puccini's Tosca is available. This project, too, has been carried out in co-operation with the Centro Studi Giacomo Puccini.
Texts from Pisa and Ferrara: two distinguished collections of texts on the history, culture, art history and literature of Pisa and Ferrara.
Pinocchio Game: the work of a group of PhD students at the University of Pisa who have rethought and adapted the principles of Jerome McGann's Ivanhoe Game.
All texts are encoded in XML-TEI and are available both for reading (many as hypertext) and for linguistic querying. The user interface for managing and querying the texts is optimized for the same encoding scheme. Queries are performed by the XCDE Search Engine, a tool developed at the University of Pisa by Professor Paolo Ferragina.
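The encoding model itself is the subject of the second paper and is not spelled out here, so the following is only a minimal sketch of what lemma-based querying over TEI-style markup can look like. It assumes, hypothetically, that each word is wrapped in a <w> element carrying a "lemma" attribute and a grammatical tag in "ana"; the tag values, the omission of the TEI namespace, and the function names forms_of and tag_counts are illustrative assumptions, not the project's actual scheme or the XCDE Search Engine's interface. The sample is the line "Amor, ch'a nullo amato amar perdona" (Inferno V, 103).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical TEI-style fragment (TEI namespace omitted for brevity).
# Each <w> carries an assumed "lemma" attribute and an assumed
# grammatical tag in "ana"; the values are illustrative only.
SAMPLE = """
<l n="103">
  <w lemma="amore" ana="NOUN">Amor</w>,
  <w lemma="che" ana="PRON">ch'</w>
  <w lemma="a" ana="PREP">a</w>
  <w lemma="nullo" ana="ADJ">nullo</w>
  <w lemma="amare" ana="VERB">amato</w>
  <w lemma="amare" ana="VERB">amar</w>
  <w lemma="perdonare" ana="VERB">perdona</w>
</l>
"""


def forms_of(root: ET.Element, lemma: str) -> list:
    """Return, in document order, every surface form whose lemma matches."""
    return [w.text for w in root.iter("w") if w.get("lemma") == lemma]


def tag_counts(root: ET.Element) -> Counter:
    """Count how many words carry each (assumed) grammatical tag."""
    return Counter(w.get("ana") for w in root.iter("w"))


root = ET.fromstring(SAMPLE.strip())
print(forms_of(root, "amare"))   # ['amato', 'amar']
print(tag_counts(root)["VERB"])  # 3
```

Querying by lemma rather than by surface form is what lets a single search for "amare" retrieve both "amato" and "amar"; a real search engine would work on an index of the whole corpus rather than walking one parsed tree.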

Most of the texts available on the web site result from a semi-automatic conversion from the DBT encoding format. In particular, both the lemmatized works of Dante and the Ferrara and Pisa collections were created as part of the CiBit project (Centro Interuniversitario Biblioteca Italiana Telematica, Interuniversity Center for the Italian Telematic Library) and were later fully converted to the XML-TEI encoding scheme. Both the texts and the tools (search engine and interface) that make up the web site are open source and freely available for scientific and non-commercial purposes.

The site is also offered as a public resource and a laboratory for linguistic research. Scholars interested in linguistic research can send their XML-TEI encoded texts to be processed by the research group's search engine, and they may restrict access to their texts to specific groups of users if they wish.

The web site also gives access to a collection of tools for the Natural Language Processing (NLP) of Italian. These tools have been developed by ILC-CNR (Istituto di Linguistica Computazionale, Pisa) in collaboration with the Department of Linguistics (Computational Linguistics Section) of the University of Pisa. They support several levels of text processing, such as tokenization, lemmatization and morphological analysis, shallow parsing (chunking) and dependency parsing.
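As a rough illustration of the first two processing levels mentioned above, the sketch below tokenizes a line of verse and looks each token up in a lexicon. The lexicon, its part-of-speech tags and the functions tokenize and lemmatize are hypothetical: the ILC-CNR tools rely on full morphological lexicons and contextual disambiguation, and their actual interfaces are not described here. The sketch also assumes plain ASCII apostrophes for elided forms such as "ch'".

```python
import re

# Hypothetical toy lexicon: lower-cased form -> (lemma, part of speech).
# Entries and tags are illustrative, not taken from the ILC-CNR tools.
LEXICON = {
    "amor": ("amore", "NOUN"),
    "ch'": ("che", "PRON"),
    "a": ("a", "PREP"),
    "nullo": ("nullo", "ADJ"),
    "amato": ("amare", "VERB"),
    "amar": ("amare", "VERB"),
    "perdona": ("perdonare", "VERB"),
}

# A word optionally followed by an elision apostrophe (e.g. "ch'"),
# or any single character that is neither a word character nor space.
TOKEN_RE = re.compile(r"\w+'|\w+|[^\w\s]")


def tokenize(text):
    """Split a line into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def lemmatize(tokens):
    """Pair each token with (lemma, pos); unknown tokens get (None, None)."""
    tagged = []
    for tok in tokens:
        lemma, pos = LEXICON.get(tok.lower(), (None, None))
        tagged.append((tok, lemma, pos))
    return tagged


line = "Amor, ch'a nullo amato amar perdona"
for tok, lemma, pos in lemmatize(tokenize(line)):
    print(f"{tok}\t{lemma}\t{pos}")
```

A pure dictionary lookup cannot choose between lemmas when a form is ambiguous, which is where morphological analysis and contextual disambiguation come in.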

The session will focus on the lemmatized works of Dante.

The first paper will present the project from a linguistic point of view and explain the scientific criteria adopted for the linguistic analysis. The second paper will give an overview of the history of the encoding of the lemmatized texts and illustrate the encoding model. The third paper will describe how the search engine and the user interface work and how they are used.

Full text license: This text is republished here with permission from the original rights holder.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

Complete

ACH/ALLC / ACH/ICCH / ALLC/EADH - 2004

Hosted at Göteborg University (Gothenburg)

Gothenburg, Sweden

June 11, 2004 - June 16, 2004

105 works by 152 authors indexed

Conference website: http://web.archive.org/web/20040815075341/http://www.hum.gu.se/allcach2004/

Series: ACH/ICCH (24), ALLC/EADH (31), ACH/ALLC (16)

Organizers: ACH, ALLC
