KERMIT for Sentiment Analysis in Italian Healthcare Reviews
p. 411-416
Abstract
In this paper, we describe our approach to the sentiment classification challenge on Italian reviews in the healthcare domain. First, we followed the work of Bacco et al., from which we obtained the dataset. Then, we built our model, KERMITHC, based on KERMIT (Zanzotto et al. 2020). Through an extensive comparative analysis of the results obtained, we show how the use of syntax can improve performance in terms of both accuracy and F1-score compared to previously proposed models. Finally, we explored the interpretative power of KERMIT-viz to explain the inferences made by neural networks on examples.
1. Introduction
People review practically anything on online sites, and understanding the polarity of a comment through an automatic sentiment classifier is a tantalizing challenge. In recent years, the number of online reviewers has drastically increased, and many products and services can be reviewed. Before buying a product or a service, people search through reviews from those who have already experienced it. Review portals are usually linked to leisure or business activities such as tourism, e-commerce or movies. However, there are topics where these reviews, and the automatically computed sentiment associated with them, may lead to selecting the wrong services, which may dramatically affect personal life.
When dealing with health-related services, positive or negative reviews of hospitals and doctors can have a potentially catastrophic impact on the health of those using this information. QSalute1 is one of the most important Italian portals of reviews about hospitals, nursing homes and doctors. It is very important for patients to be able to seek the best hospital for their condition based on the past experience of other patients. Reviews in the healthcare domain benefit both patients and hospitals, because they are a means to discover problems and solve them (Greaves et al. 2013; Khanbhai et al. 2021).
Automatic sentiment analyzers therefore bear a big responsibility in the context of health-related services. In these sensitive areas, it is important to design AI systems whose decisions are transparent (Doshi-Velez and Kim 2017); that is, the systems must motivate their choices so that people can trust them. If users do not trust a model or a prediction, they will not use it (Ribeiro, Singh, and Guestrin 2016).
In this article, we investigate a model that can mitigate the responsibility of sentiment analyzers for health-related services. The model exploits syntactic information within neural networks to provide a clear visualisation of the internal decision mechanism that produced the prediction. We propose KERMITHC (KERMIT for HealthCare), based on KERMIT (Zanzotto et al. 2020), to solve the sentiment analysis task introduced by Bacco et al. We use KERMITHC on QSalute Italian portal reviews in order to include symbolic knowledge as part of the architecture and to visualize the internal decision-making mechanism of the neural model using KERMIT-viz (Ranaldi, Fallucchi, and Zanzotto 2021).
In the rest of the paper, Section 2 gives details about the dataset and methods, while Sections 3 and 4 describe the experiments, the results obtained and their discussion. Finally, in Section 5 we present our conclusions and future goals.
2. Data & Methods
To explore our hunch that syntactic interpretation may help in healthcare review recognition, we leverage: (1) a healthcare training corpus (Sec. 2.1); (2) KERMITHC, which is based on syntactic interpretation and can explain its decisions; and finally, (3) some challenges solved thanks to KERMITHC (Sec. 2.2).
2.1 Dataset
In order to investigate reviews in the healthcare area, we selected the QSalute portal, one of the most important health websites in Italy. This portal can be seen as the TripAdvisor of hospital facilities: reviews cover Expertise, Assistance, Cleaning and Services. In addition to the reviews, there is some associated metadata, such as user id, hospital name, review title and patient pathology. To ensure privacy, we do not consider sensitive data such as user id and hospital name.
We used a freely available scraper on GitHub2 to download the dataset. Then, to cast this data as a sentiment analysis task, we followed the indications provided by Bacco et al.: a review is (1) negative if the average of its scores is less than or equal to 2, (2) positive if the average of its scores is greater than or equal to 4, and (3) neutral otherwise.
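The labeling rule above can be sketched as a small function (the per-aspect score list is illustrative; this is not the authors' code):

```python
def label_review(scores):
    """Map a review's per-aspect scores (e.g. Expertise, Assistance,
    Cleaning, Services) to a sentiment label, following the rule of
    Bacco et al.: average <= 2 is negative, average >= 4 is positive,
    anything in between is neutral."""
    avg = sum(scores) / len(scores)
    if avg <= 2:
        return "negative"
    if avg >= 4:
        return "positive"
    return "neutral"
```

For instance, `label_review([1, 2, 2, 1])` yields `"negative"` (average 1.5), while `label_review([3, 3, 3, 4])` yields `"neutral"`.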
The resulting dataset is composed of 47,224 reviews: 40,641 in the positive class, 3,898 in the neutral class and 2,685 in the negative class.
In this work, we only consider the positive and negative classes, so our final dataset is composed of 43,326 reviews. The dataset is heavily skewed (93.80% positive class, 6.20% negative class) towards reviews labeled as positive.
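The reported skew follows directly from the class counts:

```python
positive, negative = 40641, 2685
total = positive + negative            # 43,326 reviews after dropping the neutral class
pos_share = 100 * positive / total     # share of positive reviews
neg_share = 100 * negative / total     # share of negative reviews
print(f"{pos_share:.2f}% / {neg_share:.2f}%")  # prints 93.80% / 6.20%
```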
2.2 KERMIT 4 Healthcare
The KERMITHC (KERMIT for HealthCare) architecture is composed of three major parts: (1) the KERMIT model described in Zanzotto et al. (2020), (2) a Transformer model, and (3) a decoder layer that combines the outputs of the previous two sub-parts. Figure 1 shows a graphical representation of the KERMITHC architecture, highlighting its components.
The architecture of KERMITHC combines the syntax offered by KERMIT with the versatility of a Transformer model. We use KERMIT because it allows the encoding of universal syntactic interpretations in a neural network architecture. The KERMIT component is itself composed of two parts: the KERMIT encoder, which converts a parse tree T into embedding vectors, and a multi-layer perceptron that exploits these embedding vectors.
The second sub-part of our architecture is a Bidirectional Encoder Representations from Transformers model, known as BERT, used to classify the sentiment of the reviews. BERT is a pre-trained language model developed by Devlin et al. at Google AI Language. In particular, since the task concerns sentences in Italian, we used a BERT version pre-trained on that language, called AlBERTo (Polignano et al. 2019).
3. Experiments
We used the KERMITHC architecture to examine whether the research questions posed in KERMIT (Zanzotto et al. 2020) can also be answered in the healthcare domain using the Italian language. Those research questions are: (1) Can the symbolic knowledge provided by universal symbolic syntactic interpretations make a difference and be used effectively in neural networks? (2) Do universal symbolic syntactic interpretations encode syntactic information different from that encoded in universal sentence embeddings? (3) Can the universal symbolic syntactic interpretations provided by KERMITHC supply a better and clearer way to explain the decisions of neural networks than those provided by transformers?
To provide a comprehensive answer to these questions, we tested the architecture in a completely universal setting, where both KERMIT and AlBERTo are frozen and only the last decision layer is trained.
The rest of this section describes the experimental set-up and the quantitative experimental results, and discusses how KERMIT-viz can be used to explain the inferences of neural networks over examples.
3.1 Experimental Set-up
This section describes the general set-up of our experiments and the specific configurations adopted.
The parameters used for the KERMIT encoder are those proposed in Zanzotto et al. (2020). The constituency parse trees used for the KERMIT sub-part are obtained using our freely available script on GitHub3.
We tested several BERT versions pre-trained on the Italian language in order to find the best model for our task. In particular, we tested the following transformers: (1) UmBERTo (Parisi, Francia, and Magnani 2020); (2) AlBERTo (Polignano et al. 2019); (3) multilingual BERT (Devlin et al. 2018); and (4) ELECTRAita, an Italian version of the ELECTRA model (Clark et al. 2020) implemented by Schweter based on the work of Chan et al. All models were implemented using Huggingface's transformers library (Wolf et al. 2019), and all were used in their uncased pre-trained versions. The input text for BERT was preprocessed and tokenized as specified in the respective works (Parisi, Francia, and Magnani 2020; Polignano et al. 2019; Devlin et al. 2018; Schweter 2020).
Since our experiments are a text classification task, the decoder layer of our KERMITHC architecture is a fully connected layer with a softmax activation function, applied to the concatenation of the KERMIT sub-part output and the final [CLS] token representation of the selected transformer model. Finally, the optimizer used to train the whole architecture is AdamW (Loshchilov and Hutter 2019), with the learning rate set to 2e-5. For reproducibility, the source code of our experiments is publicly available on our GitHub repository4.
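A rough sketch of this decoder layer follows; the dimensions and random weights are illustrative assumptions, not the values used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: KERMIT sub-part output and BERT [CLS] embedding.
d_kermit, d_cls, n_classes = 4000, 768, 2

def softmax(z):
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def decode(kermit_out, cls_out, W, b):
    """Decoder layer: a single fully connected layer with softmax
    applied to the concatenation of the two sub-part outputs."""
    x = np.concatenate([kermit_out, cls_out])
    return softmax(W @ x + b)

# Toy forward pass with random activations and small random weights.
probs = decode(rng.standard_normal(d_kermit),
               rng.standard_normal(d_cls),
               rng.standard_normal((n_classes, d_kermit + d_cls)) * 0.01,
               np.zeros(n_classes))
```

The result `probs` is a probability distribution over the two sentiment classes; in training, the whole stack (including the transformer) would be optimized with AdamW as described above.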
4. Results and Discussion
Syntactic information significantly increases performance in classifying healthcare reviews (see Table 2). KERMITHC uses AlBERTo, which is the best Italian BERT model according to our experiments, shown in Table 1. In particular, KERMITHC outperforms the AlBERTo sub-part model alone (see Table 2).
Table 1: Performance of the BERT models on 25% of the QSalute dataset. Mean and standard deviation are computed over 10 runs. The best performing model is highlighted based on the F1 scores obtained. The symbols ⋄, ◦ and † indicate a statistically significant difference between two results at a 95% confidence level with the sign test.
Model | Average Accuracy | Average Macro F1 score | Average Weighted F1 score
UmBERTo | 0.74 (±0.14)⋄ | 0.43 (±0.02) | 0.75 (±0.18)◦
AlBERTo | 0.82 (±0.15)⋄ | 0.47 (±0.05)† | 0.80 (±0.14)◦
BERT multilingual | 0.73 (±0.13) | 0.46 (±0.10)† | 0.73 (±0.22)
ELECTRAita | 0.67 (±0.17) | 0.40 (±0.13) | 0.66 (±0.20)
As in the work of Bacco et al. (2020), we chose to divide the dataset by "Site" and evaluate the models using accuracy and F1-score metrics. Despite this division, the dataset is still very unbalanced in favor of class 1 (positive reviews). We report results in terms of accuracy, Macro F1 and Weighted F1. Observing Table 2, we can see that the performance obtained by KERMITHC always exceeds that of the best BERT configuration, AlBERTo. Hence, trained on the healthcare review dataset (Bacco et al. 2020) (see Section 2.1), KERMITHC seems to be a good candidate for analyzing the sentiment of hospital patients.
Table 2: Performance of KERMITHC and AlBERTo on the QSalute dataset grouped by Site. Mean and standard deviation are computed over 10 runs. For each Site, the best performing model is highlighted based on the F1 scores obtained. The symbol † indicates a statistically significant difference between two results at a 95% confidence level with the sign test.
Site | Model | Average Accuracy | Average Macro F1 score | Average Weighted F1 score
Pneumology | KERMITHC | 0.71 (±0.14) | 0.51 (±0.08) | 0.70 (±0.11)
 | AlBERTo | 0.66 (±0.27) | 0.40 (±0.12)† | 0.61 (±0.26)
Thoracic Surgery | KERMITHC | 0.78 (±0.13) | 0.51 (±0.07) | 0.81 (±0.08)
 | AlBERTo | 0.74 (±0.28) | 0.43 (±0.13) | 0.74 (±0.26)
Nervous System | KERMITHC | 0.87 (±0.05)† | 0.60 (±0.03)† | 0.89 (±0.03)
 | AlBERTo | 0.94 (±0.01)† | 0.48 (±0.00)† | 0.91 (±0.01)
Heart | KERMITHC | 0.93 (±0.03)† | 0.56 (±0.03)† | 0.93 (±0.02)
 | AlBERTo | 0.96 (±0.01)† | 0.49 (±0.00)† | 0.94 (±0.01)
Vascular Surgery | KERMITHC | 0.81 (±0.16) | 0.49 (±0.06)† | 0.83 (±0.12)
 | AlBERTo | 0.70 (±0.29) | 0.42 (±0.11)† | 0.73 (±0.23)
Ophthalmology | KERMITHC | 0.79 (±0.08) | 0.55 (±0.05)† | 0.83 (±0.06)
 | AlBERTo | 0.87 (±0.08) | 0.48 (±0.02)† | 0.86 (±0.04)
Rheumatology | KERMITHC | 0.58 (±0.23) | 0.43 (±0.11) | 0.60 (±0.20)
 | AlBERTo | 0.68 (±0.20) | 0.44 (±0.10) | 0.69 (±0.19)
Infections | KERMITHC | 0.68 (±0.19) | 0.51 (±0.12) | 0.70 (±0.17)
 | AlBERTo | 0.57 (±0.23) | 0.42 (±0.13) | 0.58 (±0.21)
Skin | KERMITHC | 0.64 (±0.11) | 0.50 (±0.07) | 0.70 (±0.10)
 | AlBERTo | 0.63 (±0.26) | 0.39 (±0.11) | 0.61 (±0.24)
Genital | KERMITHC | 0.79 (±0.09)† | 0.55 (±0.03)† | 0.82 (±0.06)
 | AlBERTo | 0.88 (±0.06)† | 0.49 (±0.02)† | 0.87 (±0.03)
Endoscopy | KERMITHC | 0.75 (±0.09) | 0.52 (±0.04)† | 0.80 (±0.05)
 | AlBERTo | 0.80 (±0.19) | 0.45 (±0.07)† | 0.78 (±0.17)
Facial | KERMITHC | 0.70 (±0.24) | 0.42 (±0.08) | 0.76 (±0.18)
 | AlBERTo | 0.72 (±0.26) | 0.42 (±0.10) | 0.76 (±0.22)
Oncology | KERMITHC | 0.91 (±0.06) | 0.52 (±0.04)† | 0.92 (±0.03)
 | AlBERTo | 0.89 (±0.21) | 0.46 (±0.08)† | 0.89 (±0.17)
Haematology | KERMITHC | 0.56 (±0.30) | 0.36 (±0.14) | 0.57 (±0.31)
 | AlBERTo | 0.41 (±0.25) | 0.30 (±0.11) | 0.46 (±0.23)
Endocrinology | KERMITHC | 0.71 (±0.20) | 0.48 (±0.12) | 0.71 (±0.22)
 | AlBERTo | 0.73 (±0.29) | 0.41 (±0.13) | 0.69 (±0.28)
Gynaecology | KERMITHC | 0.82 (±0.08) | 0.56 (±0.05)† | 0.85 (±0.05)
 | AlBERTo | 0.85 (±0.14) | 0.48 (±0.04)† | 0.84 (±0.09)
Otorhinology | KERMITHC | 0.84 (±0.14) | 0.50 (±0.06) | 0.86 (±0.09)
 | AlBERTo | 0.80 (±0.18) | 0.46 (±0.05) | 0.83 (±0.13)
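The † markers in the tables come from a sign test at a 95% confidence level. As a minimal sketch of how such a paired sign test can be computed over the 10 runs (the 9-vs-1 split below is illustrative, not a result from the paper):

```python
from math import comb

def sign_test_p(wins_a, wins_b):
    """Two-sided sign test p-value for paired comparisons (ties dropped):
    under the null hypothesis each run is equally likely to favour either
    model, so the win count of one model follows Binomial(n, 0.5)."""
    n = wins_a + wins_b
    k = min(wins_a, wins_b)
    # one-sided tail probability of an outcome at least as extreme as k
    tail = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# e.g. one model wins 9 of 10 paired runs:
p = sign_test_p(9, 1)   # ≈ 0.021, significant at the 95% level (p < 0.05)
```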
Using the KERMIT-viz visualiser, we analysed how important the contribution of the symbolic knowledge provided by KERMIT can be; in many cases, it makes all the difference. Figure 2 shows two sentences with a positive target. The first sentence (Fig. 2a) is clearly positive, while the sentence in Fig. 2b could be ambiguous, as the patient makes bad remarks about the service but praises the head of the department. We can observe how some words are colored in red (i.e., they received a greater weight during the classification phase), emphasizing the positive aspects of the sentence and causing it to be labeled as a "positive review". In this way, explainability is guaranteed, and on very delicate topics, like sentiment in health reviews, we can place more trust in sentiment analysers.
5. Conclusion
In this article, we investigated a model that can mitigate the responsibility of sentiment analyzers for health-related services. Our model, KERMITHC, exploits syntactic information within neural networks to provide a clear visualisation of its internal decision mechanism. KERMITHC is based on KERMIT (Zanzotto et al. 2020), and we worked on the sentiment analysis task introduced by Bacco et al.
We studied several versions of BERT models pre-trained on the Italian language and found that AlBERTo is, among them, the best model for this task. However, KERMITHC, which is composed of KERMIT+AlBERTo, outperforms the AlBERTo model alone. Additionally, via KERMIT-viz, we visualized the reasons behind KERMITHC's classifications. We observed how KERMITHC captures relevant syntactic information by catching the keywords in each sentence and giving them more weight in the decision phase, mitigating and capturing possible errors of the sentiment analysers. Our future goal is to gain full control over sentiment analysers by injecting human rules (Onorati et al. 2020) in order to mitigate possible errors.
Bibliography
Bacco, Luca, A. Cimino, L. Paulon, M. Merone, and F. Dell'Orletta. 2020. "A Machine Learning Approach for Sentiment Analysis for Italian Reviews in Healthcare." In CLiC-it.
Chan, Branden, Stefan Schweter, and Timo Möller. 2020. "German's Next Language Model." In Proceedings of the 28th International Conference on Computational Linguistics, 6788–96. Barcelona, Spain (Online): International Committee on Computational Linguistics.
Clark, Kevin, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2020. "ELECTRA: Pre-Training Text Encoders as Discriminators Rather Than Generators." In ICLR. https://openreview.net/pdf?id=r1xMH1BtvB.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. "BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding." CoRR abs/1810.04805. http://arxiv.org/abs/1810.04805.
Doshi-Velez, Finale, and Been Kim. 2017. "Towards a Rigorous Science of Interpretable Machine Learning." http://arxiv.org/abs/1702.08608.
Greaves, Felix, Daniel Ramirez-Cano, Christopher Millett, Ara Darzi, and Liam Donaldson. 2013. "Use of Sentiment Analysis for Capturing Patient Experience from Free-Text Comments Posted Online." Journal of Medical Internet Research 15 (November): e239. https://doi.org/10.2196/jmir.2721.
Khanbhai, Mustafa, Patrick Anyadi, Joshua Symons, Kelsey Flott, Ara Darzi, and Erik Mayer. 2021. "Applying Natural Language Processing and Machine Learning Techniques to Patient Experience Feedback: A Systematic Review." BMJ Health & Care Informatics 28 (1). https://doi.org/10.1136/bmjhci-2020-100262.
Loshchilov, Ilya, and Frank Hutter. 2019. "Decoupled Weight Decay Regularization." In 7th International Conference on Learning Representations, ICLR 2019.
Onorati, Dario, Pierfrancesco Tommasino, Leonardo Ranaldi, Francesca Fallucchi, and Fabio Massimo Zanzotto. 2020. "Pat-in-the-Loop: Declarative Knowledge for Controlling Neural Networks." Future Internet 12 (12). https://doi.org/10.3390/fi12120218.
Parisi, Loreto, Simone Francia, and Paolo Magnani. 2020. "UmBERTo: An Italian Language Model Trained with Whole Word Masking."
Polignano, Marco, Pierpaolo Basile, Marco de Gemmis, Giovanni Semeraro, and Valerio Basile. 2019. "AlBERTo: Italian BERT Language Understanding Model for NLP Challenging Tasks Based on Tweets." In Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019). Vol. 2481. CEUR.
Ranaldi, Leonardo, Francesca Fallucchi, and Fabio Massimo Zanzotto. 2021. "KERMITviz: Visualizing Neural Network Activations on Syntactic Trees." In Proceedings of the 15th International Conference on Metadata and Semantics Research (MTSR 2021). Vol. 1. https://doi.org/10.1007/978-3-030-98876-0.
Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016. ""Why Should I Trust You?": Explaining the Predictions of Any Classifier." http://arxiv.org/abs/1602.04938.
Schweter, Stefan. 2020. Italian BERT and ELECTRA Models (version 1.0.1). Zenodo. https://doi.org/10.5281/zenodo.4263142.
Wolf, Thomas, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, et al. 2019. "HuggingFace's Transformers: State-of-the-Art Natural Language Processing." ArXiv abs/1910.0.
Zanzotto, Fabio Massimo, Andrea Santilli, Leonardo Ranaldi, Dario Onorati, Pierfrancesco Tommasino, and Francesca Fallucchi. 2020. "KERMIT: Complementing Transformer Architectures with Encoders of Explicit Syntactic Interpretations." In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 256–67. Online: Association for Computational Linguistics. https://www.aclweb.org/anthology/2020.emnlp-main.18.
Footnotes
2 The scraper is available at https://github.com/lbacco/Italian-Healthcare-Reviews-4-Sentiment-Analysis
3 The code is available at https://github.com/LeonardRanaldi/Constituency-Parser-Italian
4 The code is available at https://github.com/ART-Group-it/KERMIT-4-Sentiment-Analysis-on-Italian-Reviews-in-Healthcare
Authors
Dept. of Innovation and Information Engineering, Guglielmo Marconi University, Italy – l.ranaldi@unimarconi.com
Dept. of Enterprise Engineering, University of Rome Tor Vergata, Italy – michele.mastromattei@uniroma2.it
Dept. of Enterprise Engineering, University of Rome Tor Vergata, Italy – dario.onorati@uniroma1.it
Dept. of Enterprise Engineering, University of Rome Tor Vergata, Italy – elenasofia.ruzzetti@alumni.uniroma2.eu
Dept. of Innovation and Information Engineering, Guglielmo Marconi University, Italy – f.fallucchi@unimarconi.it
Dept. of Enterprise Engineering, University of Rome Tor Vergata, Italy – fabio.massimo.zanzotto@uniroma2.it
The text alone may be used under the Creative Commons - Attribution - NonCommercial - NoDerivatives 4.0 International licence (CC BY-NC-ND 4.0). All other elements (illustrations, imported attachments) are "All rights reserved", unless otherwise stated.