Fractures and Cohesion: Using Systemic Functional Linguistics to Detect and Analyse Hate Speech in an Online Environment

Short paper
  1. Deirdre Quinn

    An Foras Feasa - Maynooth University (National University of Ireland, Maynooth)

  2. Keith Maycock

    School of Computing - National College of Ireland

  3. John Keating

    An Foras Feasa - Maynooth University (National University of Ireland, Maynooth)


1. Introduction
Language acts as the lynchpin of the cohesion that maintains electronic conversations across social networking sites (SNS). Analysis of that cohesion facilitates the detection of linguistic patterns that initiate and compound hate speech in online environments. This research reports on the analysis of hate speech in videos, and in the asynchronous conversations surrounding them on a social networking site, using Systemic Functional Linguistics (SFL). Focusing on the architecture of language and the influence of social context, SFL facilitates the analysis of language in its temporal and contextual use.1 In applying SFL to the chosen corpus of texts, the research team is building reference dictionaries of offensive words and reference catalogues of the clausal structures in which these words are used. The team is exploring the detection and analysis of hate speech across conversations, that is, across texts within SNS. In accumulating data that allows the expansion of these dictionaries and clausal catalogues, the team is enabling the construction of an automated alert system that scans texts both as they develop independently and as they engage with other texts over time. Overall, this paper outlines the application of SFL to texts accrued from SNS that exhibit aspects of hate speech associated with dehumanisation, details the analysis of visualisations of hate speech within developing texts, and demonstrates the building of an automated alert system that uses SFL to detect hate speech across texts.

1.1 Overview of Context
Raphael Cohen-Almagor describes hate speech as speech that intends to “injure, dehumanise, harass, intimidate, debase, degrade and victimize the targeted groups and to foment insensitivity against them.”2

Speech acts have the capacity to carry out and compound the dehumanisation of people, rendering them powerless within the confines of "fields of recognition" demarcated by the limits of language.3

Mining data retrieved from YouTube posts that dehumanise their subjects provides an opportunity to analyse the overall ecology of texts as they develop online. Applying SFL to this data makes it possible to detect and analyse latent dehumanisation, enabling us to delineate the "fields of recognition" and the ways in which those fields are reinforced through both subtle and explicit modes of language use.

1.2 Methodology
The corpus of texts used in this research is drawn entirely from YouTube and consists of recordings of a repeatable event in which the videos' subjects, drug users, are subjected to dehumanising language and to an ongoing process of dehumanisation. The corpus consists of 20 videos and their associated metadata, posted by unknown users of YouTube. Each video is a recording of a drug user in a public space in Ireland's capital city, Dublin. A second set of videos with similar content, recorded in Glasgow, Scotland, is being used by the team for comparative purposes. The team has temporarily captured the videos and compiled transcripts of the audio and of the comments posted below them. Both transcripts and the video are treated as objects that facilitate a user's engagement with other users and with the different elements of the composite text. That is, users may engage with the videos, with the asynchronous conversation that has grown around each video, or with media objects posted in relation or in response to these videos.

These media objects consist of mashups, memes, and links to other videos and to other websites. Linguistic patterns within each type of engagement create cohesion within the text's overall development. The team has focused on lexical cohesion as a way of analysing how items relate to each other and build the texture of the text.4 Lexical cohesion creates the threads through which language choice manipulates the "finite nature of language as a semiotic system".5 We argue that it is lexical cohesion that binds nodes of conversation to other media objects and that facilitates the development of relationships between the composite elements of the text.

By compiling dictionaries of words associated with the dehumanising aspects of hate speech, the team is building a framework that enables the initial identification of dehumanisation and provides opportunities for the analysis of the "textual processes of social life".6 Already marginalised, the recorded drug user is drawn into the connected city as a figure of disruption who is marginalised even further by the language choices evident in the analysed transcripts. The pounding vocabulary of the audio commentary adds to this marginalisation, as across each video the subjects are described in dehumanising language. This language becomes part of the rearticulation and recirculation of hate speech. SFL as a system considers language as part of a process of instantiation.7 That is, texts are built and developed brick by brick and interaction by interaction. In this respect, the text is considered a "complete linguistic interaction" that builds continuously.8 Here we use SFL to create markers for models that will form part of the automated system, which detects how hate speech builds across composite texts, between elements of those texts, and across the relationships established between particular posters.
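A dictionary lookup of this kind can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the team's implementation: the term list DEHUMANISING_TERMS is hypothetical and purely illustrative, and tokenisation is a simple lower-cased regular expression rather than the encoded transcripts the team works from.

```python
import re
from collections import Counter

# Hypothetical, illustrative entries only; the team's actual dictionary is
# compiled from their encoded transcripts and is not reproduced here.
DEHUMANISING_TERMS = {"vermin", "scum", "zombie", "zombies", "animal", "animals"}

TOKEN_RE = re.compile(r"[a-z']+")


def tokenise(text: str) -> list[str]:
    """Lower-case a post and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def flag_post(text: str, lexicon: set[str] = DEHUMANISING_TERMS) -> Counter:
    """Count dictionary hits in a single post or transcript segment."""
    return Counter(tok for tok in tokenise(text) if tok in lexicon)


def flag_thread(posts: list[str], lexicon: set[str] = DEHUMANISING_TERMS) -> list[tuple[int, Counter]]:
    """Return (index, hits) for every post containing at least one dictionary term."""
    flagged = []
    for i, post in enumerate(posts):
        hits = flag_post(post, lexicon)
        if hits:
            flagged.append((i, hits))
    return flagged
```

For example, given the comments "Look at these zombies on the street", "Have some compassion" and "Scum, the lot of them", flag_thread returns hits for the first and third posts only. A flat word list like this is only the initial identification step; the clausal catalogues described above would be needed to tell, say, reported speech from direct abuse.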

The dictionary of offensive words that dehumanise the subjects of our corpus is drawn from the encoded transcripts of both the audio and the textual comments. Markers defining lexical cohesion facilitate the exploration of users' language as it engages with the overall field of the text, with the ideational expression of the text, and with the text's tenor. That is, markers of lexical cohesion that bind immediate nodes of conversation to each other facilitate the immediate compounding of hate speech. The same mechanism also enables challenges to posts promoting dehumanising hate speech: these challenges are achieved through fractures in language induced by interruptions in lexical cohesion. Lexical cohesion draws on the temporality of the environment in which language is used to bind conversation both in immediate response structures and across more developed response structures. Interruptions to lexical cohesion can also be used to introduce new ideational expressions that counteract the binds that previously placed limits on the text's field.
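The notion of a fracture can be approximated as a drop in lexical overlap between consecutive posts. The sketch below is a deliberately crude, assumed proxy rather than the team's method: it measures only lexical repetition (Jaccard overlap of content words), whereas lexical cohesion in SFL also covers synonymy, hyponymy, meronymy and collocation. The stop-word list, the threshold of 0.1 and all function names are hypothetical.

```python
import re

# Hypothetical stop-word list so that function words do not count as cohesive ties.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "are", "in", "on", "at",
             "it", "they", "them", "that", "this", "you", "for", "with"}


def content_words(text: str) -> set[str]:
    """Lower-case the post and keep its content words as a set."""
    return {t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS}


def cohesion(prev: str, curr: str) -> float:
    """Jaccard overlap of content words: a crude proxy for lexical repetition."""
    a, b = content_words(prev), content_words(curr)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_fractures(posts: list[str], threshold: float = 0.1) -> list[int]:
    """Indices of posts whose lexical tie to the preceding post falls below threshold."""
    return [i for i in range(1, len(posts))
            if cohesion(posts[i - 1], posts[i]) < threshold]
```

In a thread such as "junkies everywhere on the street", "junkies ruin the street", "addiction is an illness that needs treatment", the first two posts share "junkies" and "street" (overlap 0.5), while the third shares nothing with its predecessor and is reported as a fracture, the kind of interruption through which a counter-post can introduce a new ideational expression.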

Further to this, applying sentiment analysis (SA) alongside SFL methods enables the visualisation of patterns of hate speech as a text develops and as those patterns gather cumulative power. Visualisations of hate speech in a state of emergence thus help moderators to detect less explicit hate speech that might otherwise go undetected. Ongoing analysis of 'emergent visualisations' provides further opportunities to examine how participants combine language choices with grammatical structures to bring cohesion to a text, or to counteract the cohesion created through another user's language choices and grammatical structures. In examining the emergent visualisations, the research team draws on concepts of lexical cohesion and on the clausal structures of sentences to detect linguistic patterns.
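To illustrate how patterns can gather cumulative power, the following sketch keeps a running total of post-level sentiment and raises an alert once that total reaches a floor, even when no single post is strongly negative on its own. The polarity lexicon POLARITY, the floor of -4 and the function names are assumptions for illustration only; the paper does not describe its SA tool at this level of detail.

```python
import re
from itertools import accumulate
from typing import Optional

# Hypothetical polarity lexicon; the paper does not specify its SA tool.
POLARITY = {"scum": -2, "disgusting": -2, "junkies": -1, "ruin": -1,
            "help": 1, "treatment": 1, "compassion": 2}


def post_score(text: str) -> int:
    """Sum the polarity of every lexicon word in a post (0 if nothing matches)."""
    return sum(POLARITY.get(t, 0) for t in re.findall(r"[a-z']+", text.lower()))


def emergence_curve(posts: list[str]) -> list[int]:
    """Running sentiment total in posting order, i.e. how negativity accumulates."""
    return list(accumulate(post_score(p) for p in posts))


def first_alert(posts: list[str], floor: int = -4) -> Optional[int]:
    """Index of the first post at which the running total reaches floor, else None."""
    for i, total in enumerate(emergence_curve(posts)):
        if total <= floor:
            return i
    return None
```

For the thread "junkies everywhere", "they ruin the street", "disgusting", the individual scores are -1, -1 and -2, so no post alone reaches the floor, but the running total does at the third post. Plotting emergence_curve over time is one simple way to render an "emergent visualisation"; a real system would replace the toy lexicon with a proper SA model.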

2. Demo and Results
By using the concepts surrounding lexical cohesion in conjunction with the dictionary of dehumanising words the team has identified, we are able to use our custom-built programme to capture and analyse conversations that have developed as part of the "complete linguistic interaction" surrounding each media object in our corpus. An example of a captured conversation is shown in Figure 1. The media object "Dublin Junkies" is shown at the centre of the visualisation, and each conversation node, representing a cluster of chat, is shown in yellow. Our custom-built programme allows us to identify whether these nodes are anaphoric or exophoric conversations, that is, whether or not the nodes relate directly to the media object. Anaphoric conversations, those directly related to the video "Dublin Junkies", are demarcated by the colour blue in the second image, Figure 2. Exophoric nodes, those that do not relate directly to the video, are demarcated by the colour blue. In this second image, the lines joining the nodes represent the output of our SA tool: red lines denote negative sentiment and green lines denote positive sentiment. The thicker the line, the stronger the negative or positive sentiment.
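The node and edge encoding described above can be expressed as a small data model. In this hypothetical sketch, each conversation node carries an assumed boolean refers_to_media flag (standing in for however the programme decides that a node relates directly to the media object), and each edge carries a sentiment score assumed to lie in [-1, 1]. The red/green colouring and the thickness-by-strength rule follow the description in the text; the grey colour for neutral edges and the maximum width of 8 are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    refers_to_media: bool  # hypothetical flag: is this cluster about the video itself?


@dataclass
class Edge:
    source: str
    target: str
    sentiment: float  # assumed range [-1.0, 1.0]; negative means hostile


def classify(node: Node) -> str:
    """Label a node in the paper's terms: anaphoric if tied directly to the media object."""
    return "anaphoric" if node.refers_to_media else "exophoric"


def edge_style(edge: Edge, max_width: float = 8.0) -> dict:
    """Map an edge's sentiment to a line colour and width."""
    # Clamp in case the sentiment tool returns values slightly outside [-1, 1].
    s = max(-1.0, min(1.0, edge.sentiment))
    if s < 0:
        colour = "red"
    elif s > 0:
        colour = "green"
    else:
        colour = "grey"  # assumption: the paper does not say how neutral edges are drawn
    # Width grows linearly with sentiment strength, from 1.0 up to max_width.
    width = 1.0 + abs(s) * (max_width - 1.0)
    return {"colour": colour, "width": width}
```

An edge with sentiment -0.5 would be drawn red with width 4.5, and a maximally positive edge green with width 8.0. Keeping classification and styling separate means the same graph could later be re-styled by lexical cohesion rather than sentiment, as in the reorganised view discussed in the conclusions.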

Fig. 1: Visualising the ecology of online conversation

Fig. 2: Demarcating elements and relationships within developing texts

3. Conclusions
Analysis of 'emergent visualisations' points to strong negative sentiment between the exophoric nodes of conversation shown in Figure 3. The yellow lines in this figure delineate the reorganisation of the text according to lexical cohesion. This reorganisation demonstrates the capacity of both SFL and our programme to make linkages across texts as they undergo a process of instantiation.

Fig. 3: Lexical Cohesion and Sentiment Analysis

Theorist Judith Butler argues that to be called a name is “to be initiated into a temporal life of language that exceeds the prior purposes that animate that call”.9 Dehumanising language calls subjects into a disempowered temporality, rendered increasingly damaging by the capacity for rearticulation and reproduction that online environments afford. Using SFL to analyse the construction of temporal fields, which may be sealed by language and grammatical choices that compound dehumanisation, enables moderators to detect hate speech in an online environment through longitudinal analysis. Updating the visualisations also allows moderators to identify the reach of particular posters across texts, enabling both vertical and horizontal analysis of the development of hate speech online.

1. Cohen-Almagor, Raphael (2011), Policing Hate and Bigotry on the Internet, in Policy & Internet 3.3, 7.

2. Butler, Judith (1997), Excitable Speech: A Politics of the Performative (London: Routledge) 7.

3. de los Angeles Gomez Gonzalez, Maria (2011), Lexical Cohesion in Multiparty Conversations, in Language Sciences 33, 168.

4. Eggins, Suzanne (2004), An Introduction to Systemic Functional Linguistics (2nd Edition) (London: Continuum) 14.

5. Eggins, Suzanne (2004), An Introduction to Systemic Functional Linguistics (2nd Edition) (London: Continuum) 2.

6. Halliday, M. A. K. and Jonathan J. Webster (eds) (2009), Continuum Companion to Systemic Functional Linguistics (London: Continuum International) 4.

7. Eggins, Suzanne (2004), An Introduction to Systemic Functional Linguistics (2nd Edition) (London: Continuum).

8. Butler, Judith (1997), Excitable Speech: A Politics of the Performative (London: Routledge) 14.


Conference Info


ADHO - 2014
"Digital Cultural Empowerment"

Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne

Lausanne, Switzerland

July 7, 2014 - July 12, 2014

Attendance: 750 delegates according to Nyhan 2016

Series: ADHO (9)

Organizers: ADHO