Linked Data for the Humanities: methods and techniques

workshop / tutorial
Authorship
  1. 1. Enrico Daga

    Open University

  2. 2. Aldo Gangemi

    Università di Bologna (University of Bologna)

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.


Background and Motivation
The primary goal of the digital humanities vision is the possibility to develop scholar studies at a large scale, where the sources of investigation can be reused among studies and arguments and methods to be developed and compared in a rigorous way. However, the underlying assumption is that sufficient data are available and that these data are properly described and distributed to the research community (Schöch, 2013).
The Linked Open Data (LOD) movement started within the Semantic Web (SW) research community with the objective of building a
Web of Data (Heath, 2011; Bizer, 2011). So far, the impact of Linked Data in the Library and Cultural Heritage domain has been significant and testified by large scale efforts such as the one of Europeana (Haslhofer, 2011). Related to that, LD has proven to be the framework for Open Science, linking knowledge with FAIR principles (Wilkinson, 2016) electing the Web URI as the natural method to
cite everything. Indeed, Linked Data (LD) seems to match the Digital Humanities (DH) agenda perfectly. However, at a closer look, the impact of Semantic Web research on the Humanities has been discontinuous.

On one hand, digital humanities projects emerged having data as focal point of the research and as enabler of novel approaches to humanistic enquiry. Notable examples are the LED Project (Adamou, 2014), drawing on LOD published by The Open University (Daga, 2016), and Pelagios (Isaksen, 2014), aiming at building a hub of geospatial data about the ancient world. However, the potential for data reuse seems to remain largely unexploited. In many DH projects, data is linked through user interfaces as hypertexts with minimal support for RDF and SPARQL. Too often, the
data side of LD is left out as future work.

On the other hand, Semantic Web researchers contribute tools to support scholars in dealing with LD (e.g. Adamou, 2014; Hyvönen, 2012; Hoekstra, 2016; Lodi, 2017). The variety of use cases and endeavours is well testified in the proceedings of the WHiSe workshop (Adamou, 2016), reaching the third edition this year.

Workshop on Humanities in the Semantic Web (WHiSe):
http://whise.kmi.open.ac.uk/

These span from cultural heritage (Bojars, 2016), historical datasets (Hoekstra, 2016), biographical data (Leskinen, 2017), the ancient world, music (Daquino, 2017) and musicology (Nurmikko-Fuller, 2016), on reflecting about methodological aspects (De Roure, 2016). Indeed, Humanities can be a land of opportunities for SW researchers, a space where the Semantic Web vision can be tested by challenging requirements coming from sophisticated, highly specialised domains of enquiry.

However, the interaction between the two communities has been occasional and as a result foundational techniques and methods developed by the SW community are still perceived as esoteric by many DH practitioners. In addition, more recent approaches have not been disseminated yet in the DH community and could contribute at enriching the toolkit available for leveraging Linked Data, from supporting the knowledge extraction phase (e.g. combining language processing with ontology engineering and deep learning) (Alam, 2017), to effectively build applications on top of SPARQL endpoints (Daga, 2015).
Adding semantics to DH has multiple effects: as an enabling technology (as exemplified above), as a vehicle to grow DH data as sharable and interoperable assets, and as an experimental platform to make innovative research in computer science by reverse engineering humanistic methods.
A plethora of use cases are emerging that can benefit of practical semantic platforms such as LOD and Knowledge Graphs (KG). A cartography of such cases can be built within the following areas (in an increasing degree of hybridisation between humanistic and computational techniques):

use or adaptation of computational tools for the collection, display or use of databases, corpora, collections;
creation of new exploration and discovery methods in humanities (both within works and collections, and in virtual reconstructions of places or artefacts);
implementation of software components supporting humanities;
creation of new knowledge (e.g., extraction of patterns from large corpora as in distant reading, availability of aggregated data for particular communities);
creation of new hybrid methods (e.g. cognitive computing, ontological engineering, defeasible logic, embodied or grounded semantics);
creative use of hybridisation to generate new works or collections (literary chatbots, painting morphisms).

This classification has been built out of a survey of DH projects from 40 academic centers, and shows the variety of data-oriented problems, approaches, and creative solutions: a “happy chaos” that can benefit from semantic methods.
With the above in mind, we propose a half-day tutorial on LD methods and techniques, having the following objectives:

To present the theoretical and technical foundations of Linked Data and introduce basic methods for data production and publishing to students, researchers, and practitioners.
To provide a reference collection of reusable tools to boost an effective adoption of LD in DH projects.
To showcase a set of innovative methods for extracting and linking data from texts.

Contents and target audience
The tutorial will be organised in three sessions: 1) Linked Data in a nutshell; 2) Producing and consuming Linked Data; 3) Hybrid methods for working with texts and LD. The content will be tuned to accommodate a wide range of participants, spanning from the student that is curious to hear more about LD to the humanist hacker that looks for a robust toolkit to apply to her research. One output of the tutorial will be an openly accessible and persistent registry of reusable resources for developing linked humanities applications.

http://purl.org
/ld4dh/
dh
2019.

Linked Data in a nutshell
This first session offers an overview of the basic notions and technologies behind Linked Data, from the adoption of URIs, the Resource Description Framework (RDF), Semantic Web ontologies, and SPARQL, to their use for representing data as a KG. A catalogue of exemplary LOD sources will be used in the hands-on session.

Producing and Consuming Linked Data
The second session will be dedicated to illustrate a number of case studies from humanities domains such as library science, art, and musicology. The objective of this session is to show how Linked Data works “Under the hood” and to enable participants to orient themselves when deciding to start a project based on Linked Data. The content includes basic recipes for generating RDF from pre-existing data sources such as Excel/CSV files or Relational Databases, for reusing existing vocabularies and ontology design patterns (ODP) (Gangemi, 2010), for discovering links with pre-existing third-party datasets, and for publishing the result as 5-star Linked Data. We will showcase the standard toolkit for consuming linked data as well as more advanced approaches that make it easier for developers to interact with SPARQL endpoints - e.g. BASIL (Daga, 2015).

Hybrid methods for working with texts and LD
The last session is dedicated to hybrid methods for the identification, extraction, and linkage of data from texts. We will showcase methods and tools that leverage Semantic Technologies for discovering entities in texts (Entity Linking), and for automatically creating knowledge graphs from texts (Knowledge Extraction), and how LD can be exploited for characterizing the content of texts for similarity analysis and content recommendation. Off-the-shelf tools such as DBpedia Spotlight (Mendes, 2011), Tagme (Ferragina, 2014) and FRED (Gangemi, 2017) will be introduced to showcase the potential of hybrid methods.

About the Organisers

Enrico Daga

Enrico Daga
has a PhD in Artificial Intelligence and has carried out research on Web Semantics and Ontology Engineering since 2006, first at the

Italian National Research Council (CNR)
and then at the

Knowledge Media Institute
of The Open University in the UK, where he leads the OU Linked Data initiative (

data.open.ac.uk
). He has played key roles in R&D projects related to the development of intelligent systems for Ontology Engineering (

NeOn
) and Smart Cities (

MK:Smart
). Currently, he is exploring the application of knowledge-based methods to support scholarship in the humanities (e.g. the

LED project
). A former student of Music and Performing Arts (University RomaTRE), Enrico is founder and chair of the

WHiSe Workshop on Humanities in the Semantic Web
.

Aldo Gangemi

Aldo Gangemi is full professor at
University of Bologna, and associate researcher at Italian National Research Council, Rome. He has co-founded the
Semantic Technology Lab (STLab) at
ISTC-CNR. His research focuses on Semantic Technologies as an integration of methods from Knowledge Engineering, the Semantic Web, Linked Data, Cognitive Science, and Natural Language Processing. He has worked in many different domains, including Cultural Heritage, where he has designed ontologies and linked data (Luoghi della Cultura, ICCD’s ArCo) for the Italian Ministry of Cultural Heritage. He has published more than
250 papers in international peer-reviewed journals, conferences and books, and seats as EB member of international journals. He has worked in several EU projects related to LOD such as WonderWeb, Metokis, NeOn, and IKS.

Bibliography

Adamou, Alessandro; d'Aquin, Mathieu; Barlow, Helen and Brown, Simon (2014). LED: curated and crowdsourced linked data on music listening experiences. In: 
Proceedings of the ISWC 2014 Posters & Demonstrations Track, CEUR Workshop Proceedings, CEUR-WS.org, pp. 93–96.

Adamou, Alessandro, Enrico Daga, and Leif Isaksen (2016). Proceedings of the First Workshop on Humanities in the Semantic Web (WHiSe). CEUR Workshop Proceedings
http://ceur-ws.org/Vol-1608/

Alam, M., Recupero, D. R., Mongiovi, M., Gangemi, A., & Ristoski, P. (2017). Event-based knowledge reconciliation using frame embeddings and frame similarity. 
Knowledge-Based Systems, 
135, 192-203.

Bizer, C., Heath, T., & Berners-Lee, T. (2011). Linked data: The story so far. In 
Semantic services, interoperability and web applications: emerging concepts (pp. 205-227). IGI Global.

Bojars, U. (2016). Case study: towards a linked digital collection of Latvian cultural heritage. 
In: Proceedings of the First Workshop on Humanities in the Semantic Web (WHiSe). CEUR Workshop Proceedings

http://ceur-ws.org/Vol-1608/
.

Daga, E., Panziera, L., & Pedrinaci, C. (2015). A BASILar approach for building web APIs on top of SPARQL endpoints. In 
CEUR Workshop Proceedings (Vol. 1359, pp. 22-32).

Daga, E., d’Aquin, M., Adamou, A., & Brown, S. (2016). The open university linked data –data.open.ac.uk. 
Semantic Web, 
7(2), 183-191.

Daquino, M., Daga, E., d'Aquin, M., Gangemi, A., Holland, S., Laney, R., Penuela, A.M., and Mulholland, P. (2017). Characterizing the landscape of musical data on the Web: State of the art and challenges. In
Proceedings of the Second Workshop on Humanities in the Semantic Web (WHiSe). CEUR Workshop Proceedings

http://ceur-ws.org/Vol-2014/
.

De Roure, D., Willcox, P., & Abdul-Rahman, A. (2016). On the description of process in digital scholarship.
In: Proceedings of the First Workshop on Humanities in the Semantic Web (WHiSe). CEUR Workshop Proceedings

http://ceur-ws.org/Vol-1608/
.

Piccinno, F., & Ferragina, P. (2014, July). From TagME to WAT: a new entity annotator. In 
Proceedings of the first international workshop on Entity recognition & disambiguation (pp. 55-62). ACM.

Gangemi, A., & Presutti, V. (2009). Ontology design patterns. In 
Handbook on ontologies (pp. 221-243). Springer, Berlin, Heidelberg.

Gangemi, A., Presutti, V., Reforgiato Recupero, D., Nuzzolese, A. G., Draicchio, F., & Mongiovì, M. (2017). Semantic web machine reading with FRED. 
Semantic Web, 
8(6), 873-893.

Heath, T., & Bizer, C. (2011). Linked data: Evolving the web into a global data space. 
Synthesis lectures on the semantic web: theory and technology, 
1(1), 1-136.

Hoekstra, R., Merono-Penuela, A., Dentler, K., Rijpma, A., Zijdeman, R., & Zandhuis, I. (2016, May). An ecosystem for linked humanities data. In 
European Semantic Web Conference (pp. 425-440). Springer, Cham.

Haslhofer, B., & Isaac, A. (2011, September). data.europeana.eu: The europeana linked open data pilot. In 
International Conference on Dublin Core and Metadata Applications (pp. 94-104).

Hyvönen, E. (2012). Publishing and using cultural heritage linked data on the semantic web. 
Synthesis Lectures on the Semantic Web: Theory and Technology, 
2(1), 1-159.

Isaksen, L., Simon, R., Barker, E. T., & de Soto Cañamares, P. (2014, June). Pelagios and the emerging graph of ancient world data. In 
Proceedings of the 2014 ACM conference on Web science (pp. 197-201). ACM.

Leskinen, P., Tuominen, J., Heino, E., & Hyvönen, E. (2017, October). An Ontology and Data Infrastructure for Publishing and Using Biographical Linked Data. In
Proceedings of the First Workshop on Humanities in the Semantic Web (WHiSe). CEUR Workshop Proceedings

http://ceur-ws.org/Vol-1608/
.

Lodi, G., Asprino, L., Nuzzolese, A. G., Presutti, V., Gangemi, A., Recupero, D. R., ... & Orsini, A. (2017). Semantic Web for cultural heritage Valorisation. In 
Data Analytics in Digital Humanities (pp. 3-37). Springer, Cham.

Mendes, P. N., Jakob, M., García-Silva, A., & Bizer, C. (2011, September). DBpedia spotlight: shedding light on the web of documents. In 
Proceedings of the 7th international conference on semantic systems (pp. 1-8). ACM.

Nurmikko-Fuller, T., & Page, K. R. (2016). A linked research network that is Transforming Musicology. In
Proceedings of the Second Workshop on Humanities in the Semantic Web (WHiSe). CEUR Workshop Proceedings

http://ceur-ws.org/Vol-2014/
(pp. 73-78).

Schöch, C. (2013). Big? smart? clean? messy? Data in the humanities. 
Journal of digital humanities, 
2(3), 2-13.

Volz, J., Bizer, C., Gaedke, M., & Kobilarov, G. (2009). Silk-a link discovery framework for the web of data. 
LDOW, 
538.

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., ... & Bouwman, J. (2016). The FAIR Guiding Principles for scientific data management and stewardship. 
Scientific data, 
3.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.