Ensemble of LSTMs for EVALITA 2018 Aspect-based Sentiment Analysis task (ABSITA)
p. 114-117
Abstract
Identifying the different emotions expressed in a review requires distinguishing the individual entities mentioned and the specific semantic relations between them. The number of reviews needed to build a complete dataset covering every possible combination cannot be predicted.
The approach described here starts from the possibility of studying the aspect first and the polarity afterwards, and of creating an ensemble of the two models to provide a better understanding of the dataset.
1 Introduction
With the increase in interactions between users and businesses across different channels and languages, it becomes increasingly difficult for businesses to respond promptly and effectively. Not every business can afford a team dedicated to public relations, and many rely on external agencies that do not know the company's internal operations.
Automating the correct recognition of the various problems raised in reviews makes it possible to route them promptly to the people appointed to solve them.
The research was carried out on the dataset provided for the ABSITA task, Aspect-based Sentiment Analysis at EVALITA 2018 (Basile et al., 2018). The task is a combination of two subtasks, Aspect Category Detection (ACD) and Aspect Category Polarity (ACP).
The dataset is a selection of Italian-language hotel reviews taken from the portal Booking.com.
2 Description of the system
Each review was stripped of special characters, lemmatized, and lowercased with the SpaCy framework.
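A minimal sketch of this preprocessing step, assuming the small Italian SpaCy pipeline (it_core_news_sm) and a simple regular expression for the special-character cleanup, neither of which is specified in the paper:

```python
import re
import spacy

# Assumption: the small Italian pipeline, installed with
#   python -m spacy download it_core_news_sm
nlp = spacy.load("it_core_news_sm")

def clean_review(text: str) -> str:
    """Strip special characters, lemmatize and lowercase a review."""
    text = re.sub(r"[^\w\s]", " ", text)  # drop punctuation and other special characters
    doc = nlp(text)
    return " ".join(token.lemma_.lower() for token in doc if not token.is_space)

print(clean_review("La camera era pulita e il personale gentilissimo!!!"))
```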
Generic Italian texts, rather than reviews from the accommodation domain, were used to generate the fastText word vectors, so that the model remains suitable for other business domains. The best embedding has a dimension of 200, character n-grams of length 5, a window of size 5, and 10 negative samples.
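A sketch of the embedding training with the fastText Python bindings; the corpus path is a placeholder and the skip-gram objective is an assumption, while the remaining hyper-parameters are the ones reported above:

```python
import fasttext

# Dimension 200, character n-grams of length 5, window 5, 10 negative samples.
model = fasttext.train_unsupervised(
    "generic_italian_corpus.txt",  # placeholder path to the generic Italian texts
    model="skipgram",              # assumption: CBOW vs skip-gram is not stated
    dim=200,
    minn=5,
    maxn=5,
    ws=5,
    neg=10,
)
model.save_model("italian_vectors_200.bin")
```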
The system is an ensemble of two different models, which improves the ability to discover hidden properties (Akhtar et al., 2018).
The first model is a bi-directional Long Short-Term Memory (BI-LSTM) network, used to detect the ASPECT categories.
Layer (type) | Output Shape | Param #
e (Embedding) | (None, 100, 200) | 1420400
b (Bidirectional) | (None, 512) | 935936
d (Dense) | (None, 7) | 3591
A second BI-LSTM model is used to detect the POLARITY.
Layer (type) | Output Shape | Param #
e (Embedding) | (None, 100, 200) | 1420400
b (Bidirectional) | (None, 512) | 935936
d (Dense) | (None, 14) | 7182
Both models use a dropout and a recurrent_dropout of 0.1, and both are optimized with RMSProp. The loaded embedding is trainable, so it is fine-tuned during training. Both systems use Keras to create the RNN models.
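A sketch, reconstructed from the layer summaries and the settings above, of how either BI-LSTM could be built in Keras. The vocabulary size implied by the embedding parameters, the 256 LSTM units per direction, the sigmoid output and the binary cross-entropy loss are inferred rather than stated in the paper, and are therefore assumptions:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_bilstm(embedding_matrix: np.ndarray, num_labels: int, seq_len: int = 100):
    """BI-LSTM shared by both subtasks: num_labels=7 for ACD, 14 for ACP."""
    vocab_size, dim = embedding_matrix.shape  # e.g. (7102, 200) fastText vectors
    embedding = layers.Embedding(vocab_size, dim, trainable=True)
    model = keras.Sequential([
        layers.Input(shape=(seq_len,)),
        embedding,
        layers.Bidirectional(layers.LSTM(256, dropout=0.1, recurrent_dropout=0.1)),
        layers.Dense(num_labels, activation="sigmoid"),  # multi-label output
    ])
    embedding.set_weights([embedding_matrix])  # load the pre-trained fastText vectors
    model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```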
The models were trained and tested with 5-fold cross-validation, with an 80% training and 20% testing split in each fold. The best model was automatically saved at each iteration. A threshold of 0.5 was applied to the output of the last layer of the first model; for the second model the threshold was 0.43.
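A sketch of the training and prediction loop under the settings above; the number of epochs, the batch size, the checkpoint criterion and the use of scikit-learn's KFold are assumptions, while the five folds and the per-task thresholds follow the description:

```python
from sklearn.model_selection import KFold
from tensorflow import keras

def cross_validate(x, y, build_fn, threshold, epochs=10, batch_size=64):
    """5-fold CV (80% train / 20% test per fold); the best model of each fold is
    saved automatically, and the sigmoid outputs are binarized with the
    task-specific threshold (0.5 for ACD, 0.43 for ACP)."""
    folds = KFold(n_splits=5, shuffle=True, random_state=42)
    for fold, (train_idx, test_idx) in enumerate(folds.split(x)):
        model = build_fn()
        checkpoint = keras.callbacks.ModelCheckpoint(
            f"best_model_fold{fold}.keras", monitor="val_loss", save_best_only=True)
        model.fit(x[train_idx], y[train_idx],
                  validation_data=(x[test_idx], y[test_idx]),
                  epochs=epochs, batch_size=batch_size, callbacks=[checkpoint])
        probs = model.predict(x[test_idx])
        yield test_idx, (probs >= threshold).astype(int)
```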
Table 1: micro precision, micro recall and micro F1 score with the gold dataset
Aspect Category Detection (ACD)
micro precision | micro recall | micro F1 score
0.8397 | 0.8050 | 0.8204
Table 2: micro precision, micro recall and micro F1 score with the gold dataset
Aspect Category Polarity (ACP)
micro precision | micro recall | micro F1 score
0.8138 | 0.6593 | 0.7172
The results show that the models identify the category of a review better than its polarity.
We then ensembled the two models (Choi et al., 2018) to obtain a system that outperforms each single model on the ACP task, at the cost of a lower result on the ACD task (Table 3).
The ensemble was built in cascade, so that one system acts as an attention mechanism over the underlying system.
The activation threshold ranged between 0.45 and 0.55.
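The paper does not spell out the gating logic, so the following is only one possible reading, given as an illustrative sketch rather than the authors' exact method: the ACD probabilities act as an attention-like mask over the corresponding ACP outputs, and predictions whose ACD probability falls inside the 0.45-0.55 band are left to the polarity model alone. The pairing of two polarity columns per aspect is also an assumption:

```python
import numpy as np

def cascade_ensemble(acd_probs, acp_probs, low=0.45, high=0.55, acp_threshold=0.43):
    """Illustrative cascade: the ACD model gates the ACP model.

    acd_probs: (n, 7) aspect probabilities.
    acp_probs: (n, 14) polarity probabilities, assumed to be a
    (positive, negative) pair per aspect, in the same aspect order.
    """
    acp_labels = (acp_probs >= acp_threshold).astype(int)
    gate = np.repeat(acd_probs >= 0.5, 2, axis=1)  # aspect on/off, one flag per polarity column
    uncertain = np.repeat((acd_probs >= low) & (acd_probs <= high), 2, axis=1)
    # Inside the uncertainty band the polarity model decides alone;
    # elsewhere its predictions are masked by the detected aspects.
    return np.where(uncertain, acp_labels, acp_labels * gate)
```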
A third model, a LightGBM (Bennici and Portocarrero, 2018), was also tested. The following properties are extracted from the review text:
length of the review
percentage of special characters
the number of exclamation points
the number of question marks
the number of words
the number of characters
the number of spaces
the number of stop words
the ratio between words and stop words
the ratio between words and spaces
and they are joined to the vector built from the word- and character-level bigrams and trigrams of the text itself. The number of leaves is 250, the learner is set to 'Feature', and the learning rate is 0.04; a sketch of this setup is given below.
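The sketch assumes scikit-learn CountVectorizers for the word- and character-level bigrams and trigrams, a one-vs-rest wrapper for the multi-label output, and that 'Feature' refers to LightGBM's tree_learner parameter; the stop-word list is only a placeholder:

```python
import re
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from lightgbm import LGBMClassifier

ITALIAN_STOP_WORDS = {"il", "la", "di", "e", "che", "un", "una", "per"}  # placeholder list

def handcrafted_features(text: str) -> list:
    """The hand-crafted properties listed above."""
    words = text.split()
    n_special = len(re.findall(r"[^\w\s]", text))
    n_spaces = text.count(" ")
    n_stop = sum(w.lower() in ITALIAN_STOP_WORDS for w in words)
    return [
        len(text),                       # length of the review
        n_special / max(len(text), 1),   # percentage of special characters
        text.count("!"),                 # exclamation points
        text.count("?"),                 # question marks
        len(words),                      # words
        len(text),                       # characters
        n_spaces,                        # spaces
        n_stop,                          # stop words
        len(words) / max(n_stop, 1),     # words / stop words
        len(words) / max(n_spaces, 1),   # words / spaces
    ]

def build_features(texts):
    word_ngrams = CountVectorizer(analyzer="word", ngram_range=(2, 3))
    char_ngrams = CountVectorizer(analyzer="char", ngram_range=(2, 3))
    dense = np.array([handcrafted_features(t) for t in texts])
    return sp.hstack([word_ngrams.fit_transform(texts),
                      char_ngrams.fit_transform(texts),
                      sp.csr_matrix(dense)])

clf = OneVsRestClassifier(
    LGBMClassifier(num_leaves=250, learning_rate=0.04, tree_learner="feature"))
# clf.fit(build_features(train_texts), train_labels)
```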
The result of combining the three models could not be submitted for the final evaluation because of the limit of two possible submissions, but in tests carried out after the release of the complete dataset it scored above 83% for ASPECT and 75% for POLARITY.
Its inference is also faster than that of the RNN models.
3 Results
Table 3: micro precision, micro recall and micro F1 score for the submitted runs of the ACD subtask
Aspect Category Detection (ACD)
Runs | micro precision | micro recall | micro F1
Run 1 | 0.8713 | 0.7504 | 0.8063
Run 2 | 0.8697 | 0.7481 | 0.8043
Table 4: micro precision, micro recall and micro F1 score for the submitted runs of the ACP subtask
Aspect Category Polarity (ACP)
Runs | micro precision | micro recall | micro F1
Run 1 | 0.7387 | 0.7206 | 0.7295
Run 2 | 0.7472 | 0.7186 | 0.7326
In the evaluation phase, the results confirm the value of ensembling the two models.
The ACP task (Table 4) clearly benefits from this process, whereas the ACD task (Table 3) loses more than one point.
The study of the dataset is limited by the small size of the training set and by the specificity of some terms, which could refer to different categories, such as the comfort of the room and the quality/price ratio.
Various types of data preparation were also tried, including preserving special characters, keeping word shape (to better identify cities or places written with capital letters), and SMOTE-based oversampling to increase the number of entries, but these gave poor results and noticeable overfitting.
4 Conclusion
Creating an ensemble of models to bring out the various properties of a review gave better results than a single model for polarity identification.
The terms used in a review are sometimes misleading: they can be used both positively and negatively, and can refer to different categories of the hotel.
In the near future, we plan to build a system that splits the text of a review so as to categorize a single sentence, or even a single subject or object. In this way we will also be able to evaluate the polarity of each individual object or subject, using only the terms related to it, and thus improve the result of the ACP task.
The performance of the system will also be evaluated after replacing all the possible entities with known variables such as:
City
Museum
Panoramic Point
Railway station
Street
and with a pre-category known a priori, such as Breakfast for words like Coffee, Cornetto, and Jam (a sketch of this normalization is given below).
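The sketch uses purely illustrative mappings, since the actual entity lists are not described in the paper:

```python
# Purely illustrative mappings: the real gazetteers are not given in the paper.
ENTITY_PLACEHOLDERS = {
    "roma": "CITY",
    "uffizi": "MUSEUM",
    "termini": "RAILWAY_STATION",
}
CATEGORY_PLACEHOLDERS = {  # a-priori category, e.g. Breakfast
    "caffè": "BREAKFAST",
    "cornetto": "BREAKFAST",
    "marmellata": "BREAKFAST",
}

def normalize_entities(tokens):
    """Replace known entities and a-priori category words with placeholder variables."""
    table = {**ENTITY_PLACEHOLDERS, **CATEGORY_PLACEHOLDERS}
    return [table.get(tok.lower(), tok) for tok in tokens]

print(normalize_entities("Il caffè vicino alla stazione Termini era ottimo".split()))
```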
The expected result is to reduce the variance of the dataset, improve the ACD result, and make the system usable in production.
Finally, we will evaluate the speed and effectiveness of a CNN model in which the two tasks, ASPECT and POLARITY, can be studied separately and then merged.
Bibliography
Basile, P., Basile, V., Croce, D., & Polignano, M. (2018). Overview of the EVALITA 2018 Aspect-based Sentiment Analysis task (ABSITA). In Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA’18).
Akhtar, M., Ghosal, D., Ekbal, A., Bhattacharyya, P., & Kurohashi, S. (2018). A Multi-task Ensemble Framework for Emotion, Sentiment and Intensity Prediction. https://arxiv.org/abs/1808.01216
Choi, J. Y., & Lee, B. (2018). Combining LSTM Network Ensemble via Adaptive Weighting for Improved Time Series Forecasting. Mathematical Problems in Engineering, vol. 2018, Article ID 2470171, 8 pages. doi: 10.1155/2018/2470171
Bennici, M., & Seijas Portocarrero, X. (2018). The validity of dictionaries over the time in Emoji prediction. In T. Caselli, N. Novielli, V. Patti, & P. Rosso (Eds.), Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA’18), Turin, Italy. CEUR.org.
Authors
Mauro Bennici – You Are My GUide – mauro[at]youaremyguide.com
Xileny Seijas Portocarrero – You Are My GUide – xileny[at]youaremyguide.com