The Japanese Small World of Words. Investigating meaning through a large-scale crowdsourcing study of word associations.

Maria Telegina; Simon De Deyne; Terry Joyce; Yusuke Miyao

Authorship

1. Maria Telegina

University of Tokyo

2. Simon De Deyne

University of Melbourne, Australia

3. Terry Joyce

Tama University, Japan

4. Yusuke Miyao

University of Tokyo

Work text

This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

Language is the bridge that connects cultures, but knowing whether a foreign-language word has the same meaning and context as its counterpart in one's mother tongue is one of the most challenging tasks for foreign language learners and educators. Previous studies (e.g., De Deyne, Verheyen and Storms, 2016) suggest that word association data contain semantic, cultural, and extra-linguistic information that underlies such common knowledge, providing a unique method for comparing meaning across cultures.

In contemporary linguistics and psychology, word association data have been rediscovered as a source of information for research on language and the mind. Word associations allow us to investigate a wide range of phenomena, including demographic-dependent differences in language use (Garimella et al., 2017), lexical centrality and semantic similarity (De Deyne et al., 2019), language development, and age-dependent changes in concept (Wulff et al., 2019).
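
As a rough illustration of how lexical centrality and semantic similarity can be read off word association data, the following minimal sketch uses a small invented set of cue-response pairs. The data, the function names, and the choice of measures (in-degree as a simple centrality, cosine similarity over response counts) are assumptions made for illustration only; they are not taken from the studies cited above.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: each cue maps to the responses given by participants.
associations = {
    "dog": ["cat", "pet", "bark", "cat"],
    "cat": ["dog", "pet", "mouse", "pet"],
    "sun": ["moon", "hot", "sky"],
}


def response_vector(cue):
    """Count how often each response was given to a cue."""
    return Counter(associations[cue])


def cosine(a, b):
    """Cosine similarity between two response-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def in_degree(word):
    """Number of distinct cues that elicited the word (a simple centrality)."""
    return sum(1 for responses in associations.values() if word in responses)


print(cosine(response_vector("dog"), response_vector("cat")))
print(cosine(response_vector("dog"), response_vector("sun")))
print(in_degree("pet"))
```

In this toy data, "dog" and "cat" overlap only through the response "pet", while "dog" and "sun" share no responses at all, so their similarity is zero.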

There are three previously collected data sets of Japanese word associations: the Japanese word association norms collected by Umemoto (1969), the Associative Concept Dictionary (Okamoto & Ishizaki, 2001), and the Japanese Word Association Database (JWAD; Joyce, 2005). For the word association norms by Umemoto, the stimulus set consisted of 210 words, and responses were collected from 1000 respondents. The Associative Concept Dictionary consists of two data sets: one has 1656 stimuli and approximately 130,000 responses in total; the other has 1055 stimuli and approximately 250,000 responses. JWAD consists of 104,800 associative responses to 2099 stimuli. In all three cases, the volume of the data, the demographics of the respondents, and information on word relations were limited due to the challenge of collecting data at scale. Moreover, methodological differences, such as differences in instructions, complicate comparison across languages when data sets are collected with slightly different procedures.
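
To make the scale of these data sets easier to compare, the short sketch below computes the average number of responses per stimulus from the figures quoted above. The dictionary labels and the function name are illustrative. The Umemoto norms are left out because the text gives the number of respondents rather than a response total, and the two Associative Concept Dictionary totals are approximate, so the resulting averages are approximate as well.

```python
# Stimulus and response counts as quoted in the text.
# The Associative Concept Dictionary totals are approximate.
datasets = {
    "Associative Concept Dictionary (set 1)": (1656, 130_000),
    "Associative Concept Dictionary (set 2)": (1055, 250_000),
    "JWAD": (2099, 104_800),
}


def responses_per_stimulus(stimuli, responses):
    """Average number of responses collected per stimulus word."""
    return responses / stimuli


for name, (stimuli, responses) in datasets.items():
    average = responses_per_stimulus(stimuli, responses)
    print(f"{name}: {average:.1f} responses per stimulus")
```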

This paper presents a project that aims to create a large-scale Japanese associative database as part of the multilingual Small World of Words project (SWOW-JP). The project uses online crowdsourcing as its primary data collection method. Information about the project is distributed via social media, and the word association collection is organized via the project's web page. Crowdsourcing has allowed us to overcome volume and demographic limitations and has already yielded a dataset of over 165,000 responses from more than 4000 participants. The average age of all participants was 37 years (SD = 15 years). Participants came from all prefectures, with the three most represented being Tokyo (17%), Kanagawa (10%), and Aichi (5%). In a follow-up study, native Japanese-speaking respondents will verify relations between words via a citizen-science platform.

Japanese is one of 18 languages currently included in the international collaborative Small World of Words project. Datasets in several major world languages are now available (Dutch, +18,000 cues; English, +14,000 cues) or are being prepared for publication (Spanish, +13,000 cues; Mandarin, +10,000 cues). The English and Dutch databases have already been downloaded by more than 3500 researchers. The simultaneous collection and comparable methods used across languages, including Asian languages such as Cantonese, Mandarin, and Korean, provide a unique resource for comparative analyses. Besides providing new psycholinguistic resources and demographic-aware semantic representations for Japanese, the SWOW-JP project will benefit from the data collected in other languages, including languages written with logographic scripts and the scientific lingua franca (English). We expect this to be instrumental in addressing theoretical questions about conceptual universality, providing a benchmark for NLP models, and supporting several applications, such as bilingual vocabulary learning.

The specifics of the Japanese writing system will also provide new opportunities and challenges compared to existing Indo-European datasets. One possibility is that the use of word forms across multiple writing systems might elicit different semantic representations (e.g., a word in hiragana vs. kanji). The global structure of Japanese semantic networks might also differ considerably from that of other languages. This would be consistent with previous findings from a cross-linguistic comparison of word associations across 12 languages (Miron and Wolfe, 1964), which showed that Japanese word associations are the most stereotypical (highest agreement among responses). Altogether, we believe the SWOW-JP project has the potential to revisit old questions and address new ones systematically and comprehensively across many disciplines. Furthermore, beyond providing a source of data for linguists and psychologists, the project concerns a language spoken by over 120 million speakers and taught in over 136 countries, which opens several exciting avenues for multilingual comparative research, foreign language education, and other disciplines investigating the interaction between language, thought, and culture.
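
One way to make "agreement among responses" concrete is sketched below with two invented response lists for a single cue. The two measures shown, the share of the most frequent response and the Shannon entropy of the response distribution, are common choices assumed here for illustration; they are not necessarily the measures used by Miron and Wolfe (1964), and the data and function names are hypothetical.

```python
from collections import Counter
from math import log2


def primary_strength(responses):
    """Share of all responses taken by the single most frequent response."""
    counts = Counter(responses)
    return counts.most_common(1)[0][1] / len(responses)


def response_entropy(responses):
    """Shannon entropy (bits) of the response distribution; lower means more agreement."""
    counts = Counter(responses)
    total = len(responses)
    return -sum((n / total) * log2(n / total) for n in counts.values())


# Hypothetical responses to one cue from ten participants each.
high_agreement = ["cat"] * 8 + ["bone", "pet"]
low_agreement = ["cat", "bone", "pet", "bark", "walk",
                 "leash", "tail", "puppy", "cat", "friend"]

for label, responses in [("high", high_agreement), ("low", low_agreement)]:
    strength = primary_strength(responses)
    entropy = response_entropy(responses)
    print(f"{label}: primary strength {strength:.2f}, entropy {entropy:.2f} bits")
```

A more stereotypical response pattern shows up as a higher primary strength and a lower entropy, which is the direction of the effect described above for Japanese.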

Full text license: This text is republished here with permission from the original rights holder.


Conference Info

In review

ADHO - 2022

"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

361 works by 945 authors indexed

Held in Tokyo and remote (hybrid) on account of COVID-19

Conference website: https://dh2022.adho.org/

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO
