Recognizing Hate with NLP: The Teaching Experience of the #DeactivHate Lab in Italian High Schools
Abstract
Raising awareness about online misbehaviour, such as hate speech, especially among younger generations, could help society reduce its impact and, thus, its consequences. The Computer Science Department of the University of Turin has designed various technologies that support educational projects and activities with this perspective. We implemented an annotation platform for Italian tweets employed in a laboratory called #DeactivHate, specifically designed for secondary school students. The laboratory aims at countering hateful phenomena online and at making students aware of technologies that they use on a daily basis. We describe our teaching experience in high schools and the usefulness of the technologies and activities we tested.
Acknowledgements
The work of S. Frenda, A. T. Cignarella and M. Lai has been funded under the national project Piano Lauree Scientifiche (PLS) 2019/20 as part of the activities of the Computer Science Department, School of Science of Nature, University of Turin. The authors would like to extend special thanks to the school ‘Convitto Nazionale Umberto I’ and, in particular, to Professor Simona Ventura for her availability and her collaboration in this adventure with #DeactivHate.
1. Introduction
Recently, the presence of digital technologies in our lives has grown enormously, with a strong impact on our daily routines. Digital spaces and social media have become a privileged channel for communication, information and socialization, frequented by millions of people at the same time. Along with the new relational opportunities and access to knowledge, misbehaviours such as hate speech have also acquired new visibility and virality. Although a causal link between hate speech and crime is hard to demonstrate, the risks for the psychological and physical well-being of the victims are clear in psychological and social studies (Nadal et al. 2014; Fulper et al. 2014). The extreme consequence of these effects might be suicide, especially among adolescents, as suggested by recent studies investigating the link between cyberbullying and suicidal behaviors of U.S. youth (Nikolaou 2017). To prevent such scenarios, a few awareness-raising projects in schools have been activated by NGOs in Italy, such as Amnesty International1 or Cifa ONLUS2.
The Commissione Orientamento e Informatica nelle scuole3 supports a variety of activities with the main goal of creating a link between schools and academia, also in the context of the national project Piano Lauree Scientifiche (PLS). The members of the CCC (Content-Centered Computing) group of the Computer Science Department of the University of Turin, active in the investigation of online hate speech4, have led and participated in several hate-speech-related projects, including “Contro l’odio”5 (Capozzi et al. 2020), a joint work with non-profit entities and the University of Bari that aims at monitoring hate speech against minorities in Italy. Within the current experience, we created a data annotation platform specifically dedicated to supporting educational activities and aimed at reflecting on the importance of conscientious communication. In this perspective, the idea of #DeactivHate took hold. This laboratory, addressed to secondary school students, is organized into three main modules with the purpose of:
raising awareness about this social problem, encouraging reflection on microaggressions, hate speech, stereotypes, and prejudices;
stimulating so-called computational thinking and the study of the linguistic elements that users exploit to offend or to express hate against a victim online (hashtags, emoticons, or figures of speech);
introducing high schoolers to how NLP-based tools work, in order to encourage a more conscious use of technology.
We designed a series of educational activities that include: analysis of the online problem by means of an investigation of the students’ own social network profiles; linguistic analysis of hateful messages during the annotation of tweets on the “Contro l’odio” annotation platform; manual identification of hate speech in Italian texts, playing the role of ‘being an automatic classifier’; translation of this task into a real automatic task, coding two types of classifiers in Python. These activities, delivered online due to the pandemic restrictions, were distributed across 5 meetings (lasting 2 hours each) for each class, between April and June 2021, for a total of 10 hours per class.
2. Related Work
A popular workshop series on the topic of “Teaching NLP” recently held its fifth edition at NAACL-HLT 2021 (Jurgens et al. 2021), where the participants discussed and shared experiences on a variety of important issues such as: teaching guidelines, teaching strategies, adapting to different student audiences, resources for assignments, and course or program design. The main lesson learned was the importance of creating materials describing NLP not only for learners at a university/college level, but also for younger learners with diverse educational backgrounds. In this regard, a great inspiration for starting to work with schools in Italy derives from the experience of Sprugnoli et al. (2018), where the authors – although with a different goal in mind than ours – started a project involving NLP and pupils from Italian schools, aged 12-13. That experience was chiefly dedicated to the study of cyberbullying among pre-teens and the creation of a corpus of WhatsApp threads in the context of the CybeRbullying EffEcts Prevention activities (CREEP) project. Our idea of starting a project that could bring NLP among high schoolers and, at the same time, introduce the themes of hate speech, microaggressions, and discrimination by eliciting personal experiences and students’ opinions is somehow in continuity with that experience.
A second work of great relevance for the creation of our experience has been the reading of Pannitto et al. (2021), in which the authors point out, for the first time, that no high school curriculum in Italy includes any (computational) linguistics education and that the lack of this kind of exposure makes choosing computational linguistics as a university degree unlikely. Furthermore, the authors highlight that NLP is, indeed, at the core of many tools young people use in their everyday life, and that having almost zero knowledge of this field makes the use of such tools less responsible than it could be. The same authors were the first to create a dedicated workshop for Italian aimed at raising awareness of NLP among Italian students aged between 13 and 18 (Messina et al. 2021).
Additionally, the idea of creating playful and meaningful activities regarding NLP and the themes of hate speech for high schoolers is in line with the concept of ‘gamification’, which has lately been applied to many linguistic annotation tasks as an alternative to crowdsourcing platforms for collecting annotated data in an inexpensive way (Bonetti and Tonelli 2020), as in our “Contro l’odio” annotation platform.
3. #DeactivHate
The aims of #DeactivHate are: 1) raising awareness about online misbehaviour, such as hate speech, also eliciting personal experiences; 2) stimulating computational thinking and the linguistic observation of hateful messages; and 3) encouraging a conscious use of technologies by discovering how they work. To reach these objectives we organized three modules, described below.
3.1 Hate Speech: Introduction
The first module is aimed at introducing a definition of hate speech to students. Hate speech is often mistaken for a generic insult rather than a specific phenomenon “connected with hatred of members of groups or classes of persons identified by certain ascriptive characteristics (e.g., race, ethnicity, nationality)” (Brown 2015).
The session started with an ice-breaking activity in which students presented themselves through an image found online, depicting an aspect of their identity (see Figure 1). We then asked them to tell whether they had ever been attacked or stigmatized for this characteristic.
In this way, we guided the class in drawing a distinction between non-ascriptive identity traits (e.g., political belief, style of dressing) and ascriptive6 ones (e.g., ethnicity, sexual orientation, skin colour) (Reskin 2005). The idea behind this activity is twofold: i) it links issues such as hate speech and racial microaggression (Sue 2010) to students’ lives; ii) it helps distinguish the spreading of discriminatory content7 from generic insults. The module ended with an assignment: students had to find at least one public figure who had been a victim of online discrimination, providing one or more hateful messages as an example, together with a counter-narrative response.
3.2 “If I Were a Classifier...”
The second module is organized in two meetings and focuses on the importance of manually annotated corpora for online hate speech detection and on the peculiarities of hateful messages.
In the first meeting, each student presented the messages they had found and tried to define the type of attack and the linguistic characteristics of the text that make it hateful or a counter-narrative. The variety of examples led to the introduction of a deeper taxonomy of discrimination (e.g., misogyny, homophobia, sexism, etc.). As expected, the following group discussion brought out considerable subjectivity in perceiving these phenomena, thus highlighting the need to adopt a shared annotation schema to identify hate speech in messages.
After a brief introduction on what corpora are and how they are used in new technologies, students were involved in a hate speech annotation task and asked to evaluate at least 30 tweets.
For this purpose, we created the data annotation platform8 within the “Contro l’odio” project. This web application, built using PHP, MySQL, and JavaScript, preserves the student’s annotation history by means of a passwordless authentication link sent to the email address chosen during login. This method has the double advantage of not requiring the student to register on the platform and of not requiring us to store the student’s email or other personal data. It thus ensures annotation anonymity and, as a desired consequence, satisfies the requirements of the General Data Protection Regulation (GDPR).
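Although the platform itself is written in PHP, the idea behind the passwordless login can be illustrated with a short sketch in Python, the language used elsewhere in the lab. The sketch below shows one possible way to implement such a flow; all names and parameters are hypothetical and are not taken from the platform’s actual code. A one-time token is emailed to the student, while the server only needs a salted hash of the address, so annotations can be linked across sessions without storing personal data.

```python
# Hypothetical sketch of a passwordless login flow (not the platform's PHP code):
# the server keeps only a salted hash of the e-mail address, so annotations can
# be linked across sessions without storing any personal data.
import hashlib
import secrets

SERVER_SALT = "replace-with-a-secret-salt"   # assumption: a fixed server-side secret

def pseudonymous_id(email: str) -> str:
    """Derive a stable, non-reversible identifier from the e-mail address."""
    return hashlib.sha256((SERVER_SALT + email.lower()).encode()).hexdigest()

def create_login_link(email: str, base_url: str = "https://example.org/annotate") -> str:
    """Generate a one-time token and the login link to be sent by e-mail."""
    token = secrets.token_urlsafe(32)
    # In a real system the pair (pseudonymous_id, token) would be stored with an
    # expiry time; here we only show how the link itself is constructed.
    return f"{base_url}?user={pseudonymous_id(email)}&token={token}"

print(create_login_link("student@example.org"))
```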
The home page of the web application consists of a dashboard that provides the annotation guidelines and shows basic information about the student’s activity. In particular, students can see the number of sessions they have completed (each session consists of annotating 15 tweets) and the level of agreement (expressed as a percentage) between their annotations and those produced by the automatic model developed in the “Contro l’odio” project. By gamifying the task through this comparison, we provide the basis for a discussion about the fallibility of automatic systems. Furthermore, we also allow students to compare their annotations with those of their classmates in order to introduce measures of annotator agreement. When a session starts, the student can annotate the level of hatefulness of a tweet on a 7-square scale coloured from Watusi to Sangria, as shown in Figure 2. Two additional squares, filled with White and Mid Gray respectively, allow the student to state the absence of hate or to mark the content of the tweet as off-topic. Finally, three toggle switches (on/off buttons) were added to check the presence of ‘irony/sarcasm/humor’, ‘offensiveness’, and ‘stereotype’, giving students the possibility to reflect on the ways in which users spread hate online.
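The interface therefore produces, for each tweet, a small structured annotation record. The paper does not describe the platform’s database schema; the dataclass below is only a hypothetical illustration of the fields implied by the interface described above.

```python
# Hypothetical structure of a single annotation record; field names are ours,
# the platform's actual database schema is not described in the paper.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TweetAnnotation:
    tweet_id: str
    hatefulness: Optional[int] = None  # 1-7 on the colour scale (Watusi to Sangria)
    no_hate: bool = False              # White square: no hate in the tweet
    off_topic: bool = False            # Mid Gray square: content is off-topic
    irony: bool = False                # toggle: irony/sarcasm/humour
    offensive: bool = False            # toggle: offensiveness
    stereotype: bool = False           # toggle: stereotype

# Example: a tweet judged fairly hateful, offensive and carrying a stereotype.
example = TweetAnnotation(tweet_id="12345", hatefulness=5, offensive=True, stereotype=True)
print(example)
```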
During the annotation task, students were asked to fill in a shared spreadsheet with the tweets that impressed them the most for their offensiveness or their humorous intention, or that were the most difficult to annotate. By discussing the annotation results with them, we introduced the last core concept of the module: agreement. We presented some metrics that are typically adopted to calculate it among annotators and outlined some good practices that recently emerged in Corpus Linguistics, such as ensuring the involvement of minorities in corpus development in order to avoid biases (Basile 2020).
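The paper does not list which agreement metrics were presented; as an illustration, the sketch below computes two measures commonly used in this setting, raw percentage agreement (the score shown on the platform dashboard) and Cohen’s kappa, on a pair of invented binary annotations.

```python
# Illustrative computation of inter-annotator agreement on binary hate-speech
# labels (1 = hateful, 0 = not hateful); the label sequences below are invented.
from collections import Counter

annotator_a = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
annotator_b = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]

# Raw (observed) agreement: fraction of tweets receiving identical labels.
observed = sum(a == b for a, b in zip(annotator_a, annotator_b)) / len(annotator_a)

# Cohen's kappa corrects observed agreement for agreement expected by chance.
freq_a, freq_b, n = Counter(annotator_a), Counter(annotator_b), len(annotator_a)
expected = sum((freq_a[label] / n) * (freq_b[label] / n)
               for label in set(annotator_a) | set(annotator_b))
kappa = (observed - expected) / (1 - expected)

print(f"observed agreement = {observed:.2f}, Cohen's kappa = {kappa:.2f}")
```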
3.3 My First Classifier
The main idea of this module is to stimulate computational thinking by translating the linguistic observations coming from the annotation procedure into a proper computational task. The annotation activity has, indeed, given students the opportunity to reflect on how users tend to verbally express hate online, and on how minorities are represented through stereotypes. To support this transition, we proposed two activities:
(A) to mark in each tweet the textual spans that could make a classifier aware of the presence of hate speech, creating a list of word n-grams;
(B) to develop two automatic classifiers (supervised and unsupervised) exploiting the list of word n-grams.
Before starting with the first activity, we asked students to motivate their choice of the tweets selected during the previous exercise. Some tweets triggered a discussion on what should or should not be considered hate speech, and the doubts were later resolved by looking at the provided definitions of hate speech and at the annotation guidelines. The most controversial tweets reported aggressive events or racist propositions and, for this reason, were perceived as hurtful by the majority of the students:
(i) Autobus per i bianchi e altri per i migranti. Non si parla dell’apertheid del Sudafrica né del periodo di segregazione negli Stati Uniti, ma di una proposta della Lega per la provincia di Bergamo. L’Italia non è un paese razzista ma nel 2020 questo è ciò di cui si discute. URL9
Others triggered interesting linguistic reflections, such as:
(ii) Peccato che non sbarcano povere famiglie africane, ma solo mafia nigeriana, ex galeotti tunisini, stupratori senegalesi, terroristi dell’Isis dalla libia, tutti criminali robusti 1.80 di altezza, pronti a spacciare droga, violentare le nostre donne, cannibali e assassini.10
In these texts, the students identified specific figures of speech such as sarcasm, rhetorical questions and analogies, as well as strong words that reflect social biases towards minorities. In activity A, all the words and expressions that could make a message hurtful were collected in a list of word n-grams called our_lexicon (Table 1). Subsequently, the items of this list were exploited by the classifiers to predict whether a tweet contains hate speech or not.
Table 1: Examples from our_lexicon
uni-grams | risorse, sporchi, pacchia, schifo, invasione, spacciare11 |
n-grams | porti chiusi, cacciarli via, difesa della patria12 |
For activity B, we created an interactive Python notebook using the Colaboratory platform provided by Google, as similar initiatives had successfully been carried out with this kind of educational tool. To allow the students to use the notebook regardless of their computer skills, we prepared guidelines explaining even how to create a folder in Google Drive and how to import all the necessary materials into it. Among the required materials, we prepared the dataset using the tweets previously annotated by the students.
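For instance, when the notebook runs on Colaboratory, the shared materials can be loaded from Google Drive with the standard mounting call. In the sketch below, only drive.mount is part of the Colab API; the folder and file names are hypothetical placeholders, not the ones used in the lab.

```python
# Standard way to access Google Drive from a Colaboratory notebook.
# Folder and file names are placeholders, not the ones used in the lab.
import csv
from google.colab import drive

drive.mount('/content/drive')                      # asks the student to authorize access
data_path = '/content/drive/MyDrive/DeactivHate/annotated_tweets.csv'

with open(data_path, newline='', encoding='utf-8') as f:
    tweets = list(csv.DictReader(f))               # one dict per annotated tweet
print(f"{len(tweets)} tweets loaded")
```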
We proposed two types of classifiers:
an unsupervised classifier based on the our_lexicon list: if one of the selected n-grams occurs in the text, the text is predicted as hateful;
a supervised classifier based on the Support Vector Machine algorithm, using the our_lexicon list as the main feature of the classification task (a minimal sketch of both classifiers is given below).
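The sketch below is our reconstruction, not the original notebook: the lexicon entries come from Table 1, while the training tweets, labels and the scikit-learn LinearSVC classifier are invented or assumed for illustration. The unsupervised classifier simply checks whether any lexicon entry occurs in the tweet; the supervised one feeds the same lexicon matches, as binary features, to a linear SVM.

```python
# Minimal sketch of the two classifiers proposed in the lab (our reconstruction).
# Tweets and labels below are invented; lexicon entries are taken from Table 1.
import numpy as np
from sklearn.svm import LinearSVC

our_lexicon = ["risorse", "sporchi", "pacchia", "schifo", "invasione",
               "spacciare", "porti chiusi", "cacciarli via", "difesa della patria"]

def lexicon_features(text: str) -> np.ndarray:
    """One binary feature per lexicon entry: 1 if the n-gram occurs in the tweet."""
    text = text.lower()
    return np.array([int(entry in text) for entry in our_lexicon])

# --- Unsupervised classifier: hateful if any lexicon entry occurs in the tweet.
def unsupervised_predict(text: str) -> int:
    return int(lexicon_features(text).any())

# --- Supervised classifier: a linear SVM trained on the same lexicon features.
train_tweets = ["che schifo questa invasione",           # invented examples
                "oggi splende il sole a Torino",
                "porti chiusi, cacciarli via tutti",
                "bella partita ieri sera"]
train_labels = [1, 0, 1, 0]                               # 1 = hateful, 0 = not hateful

X_train = np.stack([lexicon_features(t) for t in train_tweets])
svm = LinearSVC().fit(X_train, train_labels)

test_tweet = "basta con questa pacchia"
print("unsupervised:", unsupervised_predict(test_tweet))
print("supervised:  ", svm.predict([lexicon_features(test_tweet)])[0])
```

Running the sketch also makes a useful teaching point: the entry ‘pacchia’ never occurs in the toy training tweets, so the unsupervised rule fires on the test tweet while the SVM has learned no weight for that feature, and the two predictions can disagree, anticipating the error analysis described below.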
Coding the first classifier allowed students to gain confidence with some Python basics, whereas the second one introduced them to the core of new technologies based on machine learning (see Figure 3). At the end of the activity, we observed together the performance of the automatic systems and analyzed some of the tweets that were wrongly classified. This final step helped students reflect on the limitations of machines and on the important role of linguistics in language-related technologies.
4. What We Learned
Due to pandemic restrictions, we taught the entire laboratory remotely (DAD)13 between April and June 2021 to two classes of a secondary school in Turin, with students aged 16-20. As described above, various resources and tools were used (and created ex novo) to carry out the educational activities in distance teaching mode. However, we believe that the same activities and materials could also be proposed for lessons in presentia, exploiting the computer rooms of the schools.
For each class, we organized the activities of the three modules in 5 meetings of about 2 hours each. Despite the shortness of the laboratory, we found that designing specific activities for each session helped us manage the available time efficiently. We resorted to web applications to make up for the different devices and operating systems used by the students at home. In particular, we used Google Meet, as it offers interactive tools such as a virtual blackboard, and Moodle, a learning platform provided by the University of Turin that allowed us to organize our activities and make the necessary materials available to students. Moreover, each meeting was supported by slides providing visual and descriptive support. The classes we followed in this short period were composed of a total of 35 adolescents, coming from different countries. From the first meeting they showed a general interest in the subject, and we were especially surprised by the profoundness of some observations raised during the discussions. The students, indeed, were encouraged to share their opinions, doubts, and perspectives. These discussions made clear that the students face these problems related to technology and communication every day, sometimes even suffering their consequences. Hate speech is, indeed, a very sensitive issue, and the perception of what is abusive or not depends on the cultural background of each student. This fact stimulated the debates on the one hand but, on the other, made it difficult for us to find the ideal way to convey complex concepts and manage specific situations.
At the end of the laboratory, we administered a survey in order to collect the impressions and opinions of the students. Analyzing the responses, we noticed that the majority of students considered the content of #DeactivHate interesting, but it also appears clear that the online format of the laboratory was perceived by students as less interactive and fluid, especially due to technical problems when part of the students were in class and the other part at home14. From our perspective, we noticed an interesting difference between younger and older students: the older ones were more active during the activities and discussions than the younger ones. Moreover, we think that the number of students affected the flow of the debates, especially in the DAD context. We expect that in presentia the proposed activities could have a greater impact, facilitating the interaction.
5. Conclusion
#DeactivHate represents, for Italian high schoolers, a first step towards the introduction of subjects such as Linguistics and NLP, which are, for the most part, unknown in Italian high schools in spite of their relevance in everyday technology. Indeed, this kind of laboratory reveals the multiple applications of a Computer Science degree at university level, and even of Linguistics, which is commonly considered to offer fewer employment opportunities. Looking at the future, we would like to enhance the proposed activities in order to make them more interactive even in an online context (such as DAD), following the example of Hiippala (2021).
A final remark needs to be made regarding the lack of evaluative strategies that could allow us to understand the impact of #DeactivHate on students’ online behaviour or on their knowledge of technologies. Therefore, following the example of Bioglio et al. (2018) and Athanasiades et al. (2015), in the next editions we plan to employ: surveys before and after the intervention to evaluate the students’ online activity and their experiences of misbehaviour (caused or suffered); and interviews with teachers after the conclusion of the laboratory to understand whether any changes were perceived with respect to the class group. Future activities will also integrate basic evaluations to assess the degree of learning with respect to the contents of the course, such as computational thinking, annotation methodologies, and automatic text processing, as well as a final evaluation of the proposed teaching activities collecting the personal impressions of the students.
In addition, to validate the impact of #DeactivHate on society and, in particular, in the city context, we plan to measure the amount of hateful messages online by means of monitoring platforms, such as the “Contro l’odio” map15.
Bibliography
Christina Athanasiades, Harris Kamariotis, Anastasia Psalti, Anna C Baldry, and Anna Sorrentino. 2015. Internet use and cyberbullying among adolescent students in Greece: the “Tabby” project. Hellenic Journal of Psychology, 12(1):14–39.
Valerio Basile. 2020. “It’s the End of the Gold Standard as We Know It.” In International Conference of the Italian Association for Artificial Intelligence, 441–53. Springer.
Federico Bonetti and Sara Tonelli. 2020. “A 3D Role-Playing Game for Abusive Language Annotation.” In Workshop on Games and Natural Language Processing, 39–43. European Language Resources Association.
Alexander Brown. 2015. Hate speech law: a philosophical examination. Routledge. https://doi.org/10.4324/9781315714899.
Arthur T. E. Capozzi, Mirko Lai, Valerio Basile, Fabio Poletto, Manuela Sanguinetti, Cristina Bosco, Viviana Patti, et al. 2020. “‘Contro l’odio’: A Platform for Detecting, Monitoring and Visualizing Hate Speech against Immigrants in Italian Social Media.” IJCoL. Italian Journal of Computational Linguistics 6 (6-1): 77–97.
Rachael Fulper, Giovanni Luca Ciampaglia, Emilio Ferrara, Y Ahn, Alessandro Flammini, Filippo Menczer, Bryce Lewis, and Kehontas Rowe. 2014. “Misogynistic language on Twitter and sexual violence.” In Proceedings of the Acm Web Science Workshop on Chasm.
Tuomo Hiippala. 2021. Applied Language Technology: NLP for the Humanities. In Proceedings of the Fifth Workshop on Teaching NLP, pages 46–48, Online, June. Association for Computational Linguistics.
David Jurgens, Varada Kolhatkar, Lucy Li, Margot Mieskes, and Ted Pedersen, eds. 2021. Proceedings of the Fifth Workshop on Teaching NLP. Association for Computational Linguistics.
Lucio Messina, Lucia Busso, Claudia Roberta Combei, Alessio Miaschi, Ludovica Pannitto, Gabriele Sarti, and Malvina Nissim. 2021. “A Dissemination Workshop for Introducing Young Italian Students to NLP.” In Proceedings of the Fifth Workshop on Teaching NLP, 52–54. Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.teachingnlp-1.7.
Kevin L. Nadal, Katie E. Griffin, Yinglee Wong, Sahran Hamit, and Morgan Rasmus. 2014. “The impact of racial microaggressions on mental health: Counseling implications for clients of color.” Journal of Counseling & Development 92 (1): 57–66.
Dimitrios Nikolaou. 2017. “Does cyberbullying impact youth suicidal behaviors?” Journal of Health Economics 56: 30–46.
Ludovica Pannitto, Lucia Busso, Claudia Roberta Combei, Lucio Messina, Alessio Miaschi, Gabriele Sarti, and Malvina Nissim. 2021. Teaching NLP with Bracelets and Restaurant Menus: An Interactive Workshop for Italian Students. In Proceedings of the Fifth Workshop on Teaching NLP, Online. Association for Computational Linguistics.
Barbara F. Reskin. 2005. Including mechanisms in our models of ascriptive inequality. Handbook of employment discrimination research, pages 75–97.
Rachele Sprugnoli, Stefano Menini, Sara Tonelli, Filippo Oncini, and Enrico Piras. 2018. Creating a WhatsApp Dataset to Study Pre-teen Cyberbullying. In Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), pages 51–59. Association for Computational Linguistics.
Derald Wing Sue. 2010. Microaggressions in Everyday Life: Race, Gender, and Sexual Orientation. John Wiley & Sons.
Notes
1 http://di.unito.it/silencehateitaly
2 http://di.unito.it/iorispetto
3 http://di.unito.it/orientamentoscuole
4 http://hatespeech.di.unito.it/
6 Qualities beyond the control of an individual.
7 The definition of hate speech we referred to is the one codified by The Council of Europe: “the term ‘hate speech’ shall be understood as covering all forms of expression which spread, incite, promote or justify racial hatred, xenophobia, anti-Semitism or other forms of hatred based on intolerance” (Recommendation No. R (97) 20).
8 https://didattica.controlodio.it/
9 Translation: Buses for whites and others for migrants. There is no mention of South Africa’s apartheid or of the period of segregation in the United States, but of a proposal by Lega for the province of Bergamo. Italy is not a racist country, but in 2020 this is what we are discussing. URL
10 Translation: Too bad that poor African families do not land, but only the Nigerian mafia, former Tunisian convicts, Senegalese rapists, ISIS terrorists from Libya, all heavy-weight criminals 1.80 tall, ready to sell drugs, rape our women, cannibals and murderers.
11 Translation: resources, dirty, godsend, disgust, invasion, peddle
12 Translation: closed harbors, send [them] away, defense of the fatherland
13 Didattica A Distanza.
14 For the most part of the school year 2020-2021, Italian schools allowed a capacity of 50% inside classrooms.
Authors
Università degli Studi di Torino, Italy - Universitat Politècnica de València, Spain – simona.frenda@unito.it
Università degli Studi di Torino, Italy - Universitat Politècnica de València, Spain – alessandrateresa.cignarella@unito.it
Università degli Studi di Torino, Italy – marcoantonio.stranisci@unito.it
Università degli Studi di Torino, Italy – mirko.lai@unito.it
Università degli Studi di Torino, Italy – cristina.bosco@unito.it
Università degli Studi di Torino, Italy – viviana.patti@unito.it