UNIMIB @ DIACR-Ita: Aligning Distributional Embeddings with a Compass for Semantic Change Detection in the Italian Language
p. 451-455
Abstract
In this paper, we present our results for DIACR-Ita, the EVALITA 2020 challenge on semantic change detection for the Italian language. Our approach measures the semantic distance between time-specific word vectors generated with Compass-aligned Distributional Embeddings (CADE). We first generate temporal embeddings with CADE, a strategy for aligning word embeddings that are specific to each time period; the quality of this alignment is the main asset of our proposal. We then measure the semantic shift of each word by combining two different semantic shift measures. Finally, we classify a word's meaning as changed or not changed by applying a threshold to its semantic distance across time.
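The final two steps described above (combining two shift measures and thresholding the result) can be illustrated with a minimal sketch. It assumes the two measures are mixed as a convex combination weighted by a parameter `lam`; that combination rule, the function names, the toy vectors, and the threshold value are all illustrative assumptions, not the settings used in the submitted system.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_shift(dist_a, dist_b, lam):
    """ASSUMPTION: mix two shift measures as a convex combination."""
    return lam * dist_a + (1.0 - lam) * dist_b


def classify(shift, threshold):
    """Label a word as changed when its shift exceeds the threshold."""
    return "changed" if shift > threshold else "not changed"


# Toy, made-up vectors for one word in two aligned time slices.
vec_t1 = [1.0, 0.0]
vec_t2 = [0.0, 1.0]

cosine_distance = 1.0 - cosine_similarity(vec_t1, vec_t2)
other_distance = 0.5  # hypothetical value of a second shift measure

score = combined_shift(cosine_distance, other_distance, lam=0.3)
label = classify(score, threshold=0.5)  # hypothetical threshold
```

Comparing the two vectors directly is only meaningful because the time-specific embeddings are assumed to live in a shared, aligned space, which is the role CADE plays in the approach.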
Bibliography
Pierpaolo Basile, Annalina Caputo, Tommaso Caselli, Pierluigi Cassotti, and Rossella Varvara. 2020. “DIACR-Ita @ EVALITA2020: Overview of the EVALITA2020 Diachronic Lexical Semantics (DIACR-Ita) Task.” In Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020), edited by Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro. Online: CEUR.org.
Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro. 2020. “EVALITA 2020: Overview of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian.” In Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (Evalita 2020), edited by Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro. Online: CEUR.org.
Federico Bianchi, Valerio Di Carlo, Paolo Nicoli, and Matteo Palmonari. 2020. “Compass-Aligned Distributional Embeddings for Studying Semantic Differences Across Corpora.” arXiv Preprint arXiv:2004.06519. https://arxiv.org/abs/2004.06519.
William L. Hamilton, Jure Leskovec, and Dan Jurafsky. 2016a. “Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change.” In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1489–1501. Berlin, Germany: Association for Computational Linguistics. https://doi.org/10.18653/v1/P16-1141.
William L. Hamilton, Jure Leskovec, and Dan Jurafsky. 2016b. “Cultural Shift or Linguistic Drift? Comparing Two Computational Measures of Semantic Change.” In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2116–21. Austin, Texas: Association for Computational Linguistics. https://doi.org/10.18653/v1/D16-1229.
Andrey Kutuzov, Lilja Øvrelid, Terrence Szymanski, and Erik Velldal. 2018. “Diachronic Word Embeddings and Semantic Shifts: A Survey.” In Proceedings of the 27th International Conference on Computational Linguistics, 1384–97. Santa Fe, New Mexico, USA: Association for Computational Linguistics. https://www.aclweb.org/anthology/C18-1117.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. “Distributed Representations of Words and Phrases and Their Compositionality.” In Advances in Neural Information Processing Systems, 3111–9.
Dominik Schlechtweg, Barbara McGillivray, Simon Hengchen, Haim Dubossarsky, and Nina Tahmasebi. 2020. “SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection.” arXiv Preprint arXiv:2007.11464.
Nina Tahmasebi, Lars Borin, and Adam Jatowt. 2018. “Survey of Computational Approaches to Lexical Semantic Change.” arXiv Preprint arXiv:1811.06278.
Zijun Yao, Yifan Sun, Weicong Ding, Nikhil Rao, and Hui Xiong. 2018. “Dynamic Word Embeddings for Evolving Semantic Discovery.” In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 673–81.
Appendix
A CADE Misclassifications
Tables 2 and 3 report CADE’s misclassifications with the two best metrics, namely CADE (move) with λ = 0.3 and CADE (ln). Tables 4 and 5 show examples of the neighborhoods of the target words.
Table 2: Wrong predictions made by CADE (move) with λ = 0.3.
Word | Pred | True |
trasferibile | changed | not changed |
pacchetto | changed | not changed |
piovra | changed | not changed |
Table 3: Wrong predictions made by CADE (ln).
Word | Pred | True |
pacchetto | changed | not changed |
rampante | not changed | changed |
Table 4 shows the 10 nearest neighbors of the target word “pacchetto”. We believe CADE classifies its meaning as changed because at time t1 the word is used mainly in an economic sense, as shown by neighbors such as “azionario”, “obbligazione”, and “contante” (“stock”, in the market sense, “bond”, and “cash”, respectively), while at time t2 it shifts toward a more political sense, as shown by neighbors such as “decreto” and “emendamento” (“decree” and “amendment”, respectively).
Table 4: The 10 nearest neighbors, by cosine similarity, of the word “pacchetto” at t1 and t2
t1 | t2 |
azionario | maxiemendamento |
obbligazione | finanziaria |
azionista | decretone |
azionano | decreto |
edison | ddl |
casseforte | emendamento |
contante | liberalizzazioni |
siap | decretere |
shell | maxidecreto |
prestire | ecobonus |
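A neighbor list like the one in Table 4 can be obtained by ranking every other word in a time slice by cosine similarity to the target. The following is a minimal sketch of that ranking; the function names and the two-dimensional toy vectors are hypothetical and chosen only for illustration, not taken from the trained embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(target, space, k=10):
    """Return the k words most similar to `target` in one time slice."""
    target_vec = space[target]
    scored = [
        (word, cosine_similarity(target_vec, vec))
        for word, vec in space.items()
        if word != target  # a word is not its own neighbor
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:k]]


# Toy, made-up vectors standing in for the t1 embedding space.
toy_space_t1 = {
    "pacchetto": [1.0, 0.1],
    "azionario": [0.9, 0.2],
    "contante": [0.8, 0.0],
    "decreto": [0.1, 1.0],
}

top2 = nearest_neighbors("pacchetto", toy_space_t1, k=2)
```

Running the same function on the t2 space and comparing the two lists side by side gives the kind of qualitative evidence discussed for “pacchetto”.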
The same seems to happen for the target word “piovra”, as shown in Table 5. At time t1, CADE captures both its sense as the animal, as suggested by the neighbor “tentacle”, and its sense as someone tied to crime in general, given neighbors such as “profittatore” and “ruberia” (“profiteer” and “robbery”, respectively). At time t2, it captures a shift toward the Italian crime TV series “La piovra”, as emerges from neighbors such as “fiction”, “camorra”, and “retequattro”, an Italian television channel.
Footnotes
1 Check the belerico entry in the challenge leaderboard at https://competitions.codalab.org/competitions/20948#results
2 http://github.com/vinid/cade
3 When a neighbor is missing in one time slice, we replace it with the average vector of the space.
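The fallback in footnote 3 can be sketched in the context of a local neighborhood measure in the style of Hamilton et al. (2016b): the target word is described in each time slice by its similarities to a shared list of neighbors, and the two resulting vectors are compared. This is a minimal sketch under that assumption; the function names and toy vectors are hypothetical, and the exact computation behind CADE (ln) may differ.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(space):
    """Average of all word vectors in one time slice."""
    vectors = list(space.values())
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def second_order_vector(target, neighbors, space):
    """Similarities of the target to each neighbor in one slice.

    A neighbor missing from this slice is replaced by the average
    vector of the space, as described in footnote 3.
    """
    fallback = mean_vector(space)
    return [
        cosine_similarity(space[target], space.get(word, fallback))
        for word in neighbors
    ]


def local_neighborhood_distance(target, neighbors, space_t1, space_t2):
    """Cosine distance between the two second-order vectors."""
    s1 = second_order_vector(target, neighbors, space_t1)
    s2 = second_order_vector(target, neighbors, space_t2)
    return 1.0 - cosine_similarity(s1, s2)


# Toy, made-up spaces: neighbor "b" is missing from the second slice.
space_t1 = {"w": [1.0, 0.0], "a": [1.0, 0.0], "b": [0.0, 1.0]}
space_t2 = {"w": [0.0, 1.0], "a": [1.0, 0.0]}

distance = local_neighborhood_distance("w", ["a", "b"], space_t1, space_t2)
unchanged = local_neighborhood_distance("w", ["a", "b"], space_t1, space_t1)
```

In the toy data the target moves from being close to “a” to being orthogonal to it, so the distance is maximal, while comparing a slice with itself gives zero.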
Authors
University of Milano-Bicocca Viale Sarca 336, 20126 Milan, Italy – f.belotti8@campus.unimib.it
Bocconi University Via Sarfatti 25, 20136 Milan, Italy – f.bianchi@unibocconi.it
University of Milano-Bicocca Viale Sarca 336, 20126 Milan, Italy – matteo.palmonari@unimib.it
The text alone may be used under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0). All other elements (illustrations, imported attachments) are “All rights reserved”, unless otherwise stated.