A European-Hindustani Dictionary? Reflections on Methods

Short paper
Anna Pytlowany

    University of Amsterdam


This presentation is the first report on the project “Hindi Lexicography and the Cosmopolitan Cultural Encounter between Europe and India around 1700” at Uppsala University (UU). The project's primary goal is to produce an online Latin-Hindustani-French dictionary based on the unpublished Thesaurus Linguae Indianæ by François-Marie de Tours (de Tours, 1704). Lessons from the shortcomings of the Uppsala project will guide the design of an extended, cross-linked online dictionary of early modern Hindustani, based on little-known wordlists and vocabularies compiled by European merchants and missionaries in 17th-century India. The novelty of the approach lies in combining multilingual sources that describe a foreign language to create a ‘pan-European perspective’, which may offer new comparative insights for the historical linguistics of the target languages. If successful, this approach can be applied to other early modern vocabularies containing unique descriptions of non-European languages.

Preparing a digital edition of de Tours’ Thesaurus posed many challenges from the outset. The tool chosen for this project was FieldWorks Language Explorer (FLEx), developed by SIL (https://software.sil.org/fieldworks/). This choice was motivated by easy access to infrastructure and know-how, as the same software is used in many linguistic projects at UU. FLEx is a tool developed for field linguists that allows them to create a semantically categorised glossary, which can later be elaborated into a dictionary. However, using it for a multilingual historical text had many disadvantages, which had to be mitigated with varying degrees of success.

The original text consists of four columns spread across two pages: Latin, French, and Hindustani in both Devanagari script and Romanisation. The structure of FLEx required deciding which language to prioritise for headwords. This was not a straightforward choice, as it had far-reaching lexicographic implications (ultimately, Latin was chosen). The most challenging part was transcribing the rather unusual form of Devanagari script. The only readily available option, normalising it to modern Hindi, was practical but meant a loss of information and reduced the historical linguistic value of the resulting dictionary. Similarly, the complex diacritics the author invented to render foreign sounds were arbitrarily simplified.
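Because FLEx required one language to be chosen for headwords, it may help to see how a data model can avoid that commitment. As a minimal sketch, assuming each row of the wordlist is stored as a set of equivalents keyed by a language code, any column can be turned into a headword index on demand, so no language is privileged at the data level. The names `Row` and `index_by`, the language codes and the sample rows are illustrative assumptions, not taken from the manuscript or from FLEx.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Row:
    """One row of a multilingual wordlist (hypothetical model).

    Equivalents are keyed by language code; no column is privileged.
    """
    equivalents: dict[str, str]


def index_by(rows: list[Row], lang: str) -> dict[str, list[Row]]:
    """Build a headword index, sorted by form, for any chosen column."""
    index: dict[str, list[Row]] = defaultdict(list)
    for row in rows:
        form = row.equivalents.get(lang)
        if form is not None:  # a row may lack an equivalent in some column
            index[form].append(row)
    return dict(sorted(index.items()))


# Illustrative rows (invented, not transcribed from the Thesaurus).
rows = [
    Row({"la": "ignis", "fr": "feu", "hi-Latn": "ag"}),
    Row({"la": "aqua", "fr": "eau", "hi-Latn": "pani"}),
]

by_latin = index_by(rows, "la")            # Latin headwords, as chosen for the Uppsala edition
by_hindustani = index_by(rows, "hi-Latn")  # the same data viewed from Hindustani
```

The headword choice then becomes a presentation decision that can be revisited without re-entering data.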
Since FLEx proved poorly suited to a historical dictionary, and especially to extending the project with other early modern Hindustani wordlists and vocabularies (listed below), more sustainable and scalable solutions are required.
The main innovation of the proposed project lies in its ambition to create an integrated historical dictionary of Hindustani in which multilingual unpublished sources are edited and linked. Most existing historical lexicographic projects mimetically retro-digitise printed dictionaries (e.g. Cologne Digital Sanskrit Dictionaries, http://www.sanskrit-lexicon.uni-koeln.de/) without taking full advantage of the possibilities offered by the digital environment.
By contrast, the information from the Hindustani manuscripts will be linked in every possible way: between corresponding headwords in the respective works, and possibly also through external links to existing online 19th-century dictionaries of Hindustani (such as Digital Dictionaries of South Asia: http://dsal.uchicago.edu/dictionaries/). Users will therefore be able to see all the meanings of a Hindustani word from a ‘pan-European’ perspective.
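Linking corresponding headwords across works amounts to grouping entries that are connected by pairwise links. A minimal sketch, assuming each entry has a hypothetical string ID of the form `work:number` and that editors record links one pair at a time, uses a union-find (disjoint-set) structure so that transitive links collapse into one group per Hindustani word. The class name, IDs and links below are invented for illustration.

```python
from __future__ import annotations


class LinkGroups:
    """Disjoint-set (union-find) over entry IDs.

    Each resulting set gathers the entries that editors have linked, directly
    or transitively, as records of the same Hindustani word.
    """

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, entry: str) -> str:
        self.parent.setdefault(entry, entry)
        root = entry
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited entry directly at the root.
        while entry != root:
            nxt = self.parent[entry]
            self.parent[entry] = root
            entry = nxt
        return root

    def link(self, a: str, b: str) -> None:
        """Record that entries a and b describe the same Hindustani word."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def groups(self) -> list[set[str]]:
        by_root: dict[str, set[str]] = {}
        for entry in list(self.parent):
            by_root.setdefault(self.find(entry), set()).add(entry)
        return list(by_root.values())


# Hypothetical entry IDs and editorial links.
links = LinkGroups()
links.link("tours:12", "ketelaar:5")
links.link("ketelaar:5", "soas:7")  # transitively joins tours:12 and soas:7
links.link("leiden:3", "soas:40")
```

Union-find keeps each link operation cheap even as thousands of entries from several vocabularies are connected, and a group can later carry external links to other dictionaries.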
Not only will this extended project offer new functionality, it will also address methodological issues that the Uppsala project had to set aside. Nevertheless, the challenges remain manifold.
This project tackles a few issues at the same time:
  1) Data from unpublished manuscripts
  2) Multilingual sources (Latin and French, Dutch/Flemish and Portuguese, Persian)
  3) Three different scripts (Latin, Perso-Arabic, and Devanagari)
  4) Special characters
  5) Various arrangements (onomasiological, semasiological, alphabetical, by grammatical category)
  6) Cross-linking of entries between vocabularies

The primary task will be to transcribe the works. For all languages, spelling variants will need to be normalised, alongside an added modern form.
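One way to keep the manuscript spelling, its normalised variant and the added modern form side by side is to store all three in each record rather than overwriting the original. The sketch below assumes a hypothetical variant table compiled by the editors; the field names, the helper `make_form` and the sample spellings are illustrative, not drawn from the sources.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical editor-compiled table: attested spelling variant -> normalised spelling.
VARIANTS = {"pany": "pani", "paany": "pani"}


@dataclass(frozen=True)
class Form:
    diplomatic: str  # exactly as written in the manuscript
    normalised: str  # spelling variants collapsed to one form
    modern: str      # modern form added by the editors


def make_form(diplomatic: str, modern: str, variants: dict[str, str]) -> Form:
    """Normalise a transcribed spelling while keeping the original untouched."""
    key = diplomatic.lower()
    # Spellings not listed in the table are kept (lower-cased) as their own norm.
    return Form(diplomatic, variants.get(key, key), modern)


form = make_form("Pany", "पानी", VARIANTS)
```

Keeping the diplomatic form avoids the information loss that normalising alone would cause.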
The biggest challenge of these early modern texts is that words in the target language were often written down by ear, using the orthography of the writer’s native language and employing many diacritic innovations. Thus, a Dutch author would write the sound /u/ as <oe>, use <g> for /x/, and <oo> for /oː/. A French author, by contrast, would probably mark /u/ as <ou> and use <g> for /ʒ/ or /ɡ/; an English author would use <oo> for /u/, which a Dutch reader would read as long /oː/. Understandably, this can be quite confusing for readers who are not native speakers of the writer’s language. To complicate matters further, many of these values changed over time.
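The ambiguity described above can be made concrete with a small, deliberately simplified mapping from spellings to sounds that depends on the writer's language. The sketch below encodes only the correspondences named in the text (Dutch <oe>, <g>, <oo>; French <ou> and <g>; English <oo>), applies them by longest match, and passes all other letters through unchanged. The French <g> rule (/ʒ/ before e, i, y, otherwise /ɡ/) is a textbook simplification, the input strings are hypothetical rather than manuscript forms, and the function `to_ipa` does not model how these values changed over time.

```python
from __future__ import annotations

# Spelling -> sound values per writer's language, limited to the examples in the text.
RULES: dict[str, dict[str, str]] = {
    "nl": {"oe": "u", "oo": "o\u02d0", "g": "x"},  # \u02d0 = IPA length mark
    "fr": {"ou": "u"},  # French <g> is context-dependent, handled below
    "en": {"oo": "u"},
}

IPA_G = "\u0261"   # IPA voiced velar plosive
IPA_ZH = "\u0292"  # IPA voiced postalveolar fricative


def to_ipa(spelling: str, lang: str) -> str:
    """Read a spelling as a writer of `lang` would have intended it (simplified)."""
    rules = RULES[lang]
    longest = max(len(key) for key in rules)
    s = spelling.lower()
    out: list[str] = []
    i = 0
    while i < len(s):
        if lang == "fr" and s[i] == "g":
            # Simplified French rule: <g> before e, i, y is /ʒ/, otherwise /ɡ/.
            out.append(IPA_ZH if s[i + 1 : i + 2] in ("e", "i", "y") else IPA_G)
            i += 1
            continue
        for n in range(longest, 0, -1):  # try the longest grapheme first
            chunk = s[i : i + n]
            if chunk in rules:
                out.append(rules[chunk])
                i += len(chunk)
                break
        else:  # no rule matched: keep the letter as is
            out.append(s[i])
            i += 1
    return "".join(out)


# The same hypothetical spelling <doo> as intended by a Dutch and an English writer.
dutch_reading = to_ipa("doo", "nl")
english_reading = to_ipa("doo", "en")
```

Here `dutch_reading` is /doː/ while `english_reading` is /du/, and Dutch <doe> and French <dou> both yield /du/: identical sounds, different spellings.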
To deal with this issue, a common convention needs to be agreed for character entities, which will then be converted to Unicode, creating a set of special characters for the project. In a second step, phoneticians of the individual languages will analyse the normalised entries and ‘translate’ them into IPA, revealing a metalanguage-independent form of each Hindustani word. This will provide solid ground for comparing and contrasting Hindustani sounds as recorded by French, Dutch, and Portuguese speakers.
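A minimal sketch of the entity step, assuming the project adopts named entities of the hypothetical form `&base_diacritic;` in its transcriptions: each entity expands to a base letter plus Unicode combining marks, and the result is composed with NFC normalisation so that precomposed characters are used wherever Unicode provides them. The entity names and the function `resolve_entities` are invented for illustration; an undefined entity raises an error rather than being silently dropped.

```python
from __future__ import annotations

import re
import unicodedata

# Hypothetical project entity table: base letter + Unicode combining mark(s).
ENTITIES = {
    "a_macron": "a\u0304",    # COMBINING MACRON
    "n_dotbelow": "n\u0323",  # COMBINING DOT BELOW
    "o_breve": "o\u0306",     # COMBINING BREVE
}

ENTITY_RE = re.compile(r"&([a-z_]+);")


def resolve_entities(text: str) -> str:
    """Expand project entities, then compose the result with NFC."""

    def expand(match: re.Match) -> str:
        name = match.group(1)
        if name not in ENTITIES:
            raise KeyError(f"undefined entity: &{name};")
        return ENTITIES[name]

    return unicodedata.normalize("NFC", ENTITY_RE.sub(expand, text))
```

For example, `resolve_entities("p&a_macron;n&n_dotbelow;")` returns "pāṇ" with precomposed ā (U+0101) and ṇ (U+1E47). Combinations that Unicode does not precompose simply stay as base letter plus combining marks.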
Since Hindustani is the focus of the project, the main task will be linking the Hindustani glosses from all the dictionaries in one integrated database. To make use of the thematic arrangement of two of the works, an ontology will be established for the whole project, adding a further layer of enrichment to the database through semantic categorisation. This will enable researchers to categorise, select, and study particular types of vocabulary, functionality not available in FLEx.
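The semantic layer can be pictured as a category hierarchy plus a category label on each linked gloss. The sketch below assumes a hypothetical, acyclic fragment of such an ontology (category names invented here) and selects every gloss filed under a category or any of its subcategories; the work labels and glosses are illustrative, not taken from the vocabularies.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical ontology fragment: child category -> parent category (must be acyclic).
PARENT = {
    "fruits": "food",
    "grains": "food",
    "food": "material_culture",
    "clothing": "material_culture",
}


def lineage(category: str) -> list[str]:
    """Return the category followed by all of its ancestors."""
    chain = [category]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


@dataclass(frozen=True)
class Gloss:
    work: str        # source vocabulary (short hypothetical label)
    hindustani: str  # normalised Hindustani form
    category: str    # most specific category assigned by the editors


def select(glosses: list[Gloss], category: str) -> list[Gloss]:
    """All glosses filed under `category` or any of its subcategories."""
    return [g for g in glosses if category in lineage(g.category)]


glosses = [
    Gloss("tours", "am", "fruits"),
    Gloss("ketelaar", "chaval", "grains"),
    Gloss("soas", "kapra", "clothing"),
]
food = select(glosses, "food")
```

Selecting "food" returns the fruit and grain glosses regardless of which vocabulary recorded them, which is the kind of cross-source query the thematic layer is meant to support.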
If successful, this approach can be applied to other early modern vocabularies, which were often created by both trained and amateur philologists in different source languages, and which today are unique and valuable descriptions of Asian languages.


Primary sources:

(17c.). Vocabularium Persico-Belgicum. Leiden University Library, LTK 589.

Anonymous. (17c.). Vocabulary Portuguese-Hindustani-Persian. School of Oriental and African Studies, London, MS 11952.

Ketelaar, Joan Josua. (ca. 1689). Instructie off Onderwijsinge der Hindoustanse en Persiaanse talen. Utrecht University Library, MS 1478.

de Tours, François-Marie. (1704). Thesaurus Linguae Indianæ. Bibliothèque nationale de France, Paris, MS 840.


Conference Info


ADHO - 2019

Hosted at Utrecht University

Utrecht, Netherlands

July 9, 2019 - July 12, 2019


Organizers: ADHO