AbstractTo aid musicological research, we extract named entities from a German musicological encyclopedia (MGG) with Named Entity Recognition, and link these entities in a social network by the entities that they co-occur with. We offer two network plots that reveal details about the history of musicology, of (a) musicologists that have a lexicon entry each, and (b) the composers that are mentioned in these entries.1 IntroductionOver the 18th and 19th century the academic discipline of musicology had become a central stage for negotiating the value of certain music and aesthetics. These negotiations have also led to the exclusion and deprecation of particular cultural elements, be they issues, music, or people. This discourse has been reflected in the encyclopedia ‘Die Musik in Geschichte und Gegenwart’ (Music in History and Present, MGG, comparable to the New Grove Dictionary of Music and Musicians). It contains 1282 biographical entries of musicologists, representing the Germany centered Western European discourse on musicological knowledge of the last 120 years.To analyze this text corpus at scale, we train named entity (NE) taggers based on BERT (Devlin et al., 2018) with the deepset.ai toolkit. To evaluate our NE taggers, we manually annotate texts from the encyclopedia.Previous manual annotations showed that we might expect data that enables us to analyse the professional network of musicologists exposed in the MGG Online (van Dyck-Hemming and Wald-Fuhrmann, 2019).This approach gives us an overview of socialgroupings, main actors and most negotiated composers (vanDyck-Hemming and Wald-Fuhrmann, 2016), (Latour, 2014). We offer two graphs. In Figure 2 we showthe network of musicologists with encylopedia entries, and in Figure 3 these musicologists with composersmentioned in the entries. If a certain node is not shown (like composers in Figure 2), then it is reduced toan edge. We calculate the importance of a certain actors by eigenvector centrality.2 Named EntitiesWe tune pre-trained BERT models (Devlin et al., 2018) with a sequence classification layer on theCoNLL-2003 Named Entity Dataset (Sang and De Meulder, 2003) and achieve a competitive .85 F1-score. See table 1 for an overview of the F1-scores of our BERT tagger, on conll03 itself and then tested on manual annotation of students.The students annotated around 3000 tokens with the classes Person (PER), Geopolitical Entity (GPE/LOC), Organizations (ORG), other (OTH), temporal expressions (TIME) and professional field, or discipline (FACH). We calculated Cohen kappabetween the annotators on document level. Over all classes, agreement ranges between .5 and .8. Whenremoving FACH, the agreement increases to .7 to .95, suggesting that future research should implementproper guidelines for this label.Our best model achieves a F1-score of 85% on conll03 itself, but on the manual annotation (withTIME, FACH, OTH removed), it only achieves 47% F1-macro.Especially ORG is not reliably detected. Precision figures are also somewhat low, much to our surprise,as manual inspection of the automatically annotated data showed exceptionally good precision.We therefore suspect that (a) the manual annotation (of spans) needs some improvement by better tailoring the guidelines to the used model, and (b) paying special attention to punctuation marks (commas, brackets) that are ubiquitous in encyclopedic articles. Since manual inspection revealed good performance of the model, we use it to extract entities for subsequent processing steps.2.2 Name StandardizationWe automatically removed first names and thus only used surnames. See Figure 1 for an overview ofthe cleaning pipeline. Unfortunately, only relying on last names leads to the conflation of some names, e.g., Hermann and Amalie Abert, the latter being the first female academic musicologist in Germany.2.3 Social Network ExtractionTo obtain a social network, we connect all names that occur with each other in a particular encyclopediaarticle. The network visualization was performed with the software Gephi (Bastian et al., 2009). Wefocused on the Largest Connected Component (Newman, 2010) and discarded nodes with a degree lessthan three. The Modularity (Blondel et al., 2008) measure was used to color-encode modules withinthe network, while the Eigenvector Centrality (Newman, 2010) embraced nodes of higher importance byincreasing their node size. Finally we applied the OpenOrd (Martin et al., 2011) or ForceAtlas2 (Jacomyet al., 2014) layout algorithm to draw the final graphics. In addition to a layout, those algorithms producea visual clustering that give further insights.3 Social networks of influential actors and geographical groupingFigure 2 illustrates a graph of musicologists that are connected by the entities that occur in theirbiography, using only names that occur in the title of an encyclopedic article. We find a central groupof researchers that minted German musicology. This group spans from Hermann and Amalie Abert overArnold Schering, to Hugo Riemann and Heinrich Kretzschmar. The likewise important Erich Hornbosteland Curt Sachs represent the field of systematic musicology, other researchers sharing their interests aregrouped around them. Guido Adler, who is loosely connected to many names, likely because he foundedthe first institute of musicology in Vienna, drifted off center to house his students that are also connectedto the center.We even could partly validate a study based on quantified historical data on Carl Dahlhaus and HansHeinrich Eggebrecht who represent the two main figures of German musicology from 1965 to 1995. Weobserve a clear distinction between both researchers and are able to identify a significantly bigger groupconnected to the younger Dahlhaus.3.1 Musicology and its issueIn additon to the musicologists, the data basis of Figure 3 also includes all names of composers. Here, the visualization shows a network of musicology and its main issues. Composers from the Bach-family or Mozart are the issuesof musicology as an academic discipline. So, the overwhelming dominance of names like ‘Mozart’,‘Bach’, ‘Beethoven’ and ‘Wagner’ represent the dominance of theresearch that musicologists have done on these composers and their music. Especially Bach resp. theBach-family seems to be the one issue deeply connected to the earliest and most important musicologistsnamely Johann Forkel, Philip Spitta, and Guido Adler. Moreover, there are numerous edges (identifiable via colour) linking this group with younger musicologists like Arnold Schering, Hermann Abert, Friedrich Blume and others who shaped musicology of the 1920s.4 ConclusionWe presented a preliminary study of a clearly defined corpus of prosopographical texts, from which we extracted Named Entities and linked these to visualize the social network of musicologists and their relations to certain issues: the composers. This approach is useful for historical research to confirm hypothesis that were draw from arduous manual work. Regarding musicology, we could confirm quantitatively what we expected and already knew from conventional historical research. The overwhelming dominance of a few composers also allows us to reflect about the knowledge standards that musicology has so far relied upon.
If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.
Hosted at Carleton University, Université d'Ottawa (University of Ottawa)
Ottawa, Ontario, Canada
July 20, 2020 - July 25, 2020
475 works by 1078 authors indexed
Conference cancelled due to coronavirus. Online conference held at https://hcommons.org/groups/dh2020/. Data for this conference were initially prepared and cleaned by May Ning.
Conference website: https://dh2020.adho.org/
Series: ADHO (15)