Exploiting Multiword Expressions to solve “La Ghigliottina”
p. 256-261
Résumés
The paper describes Unior4NLP a system developed to solve “La Ghigliottina” game which took part in the NLP4FUN task of the Evalita 2018 evaluation campaign. The system is the best performing one in the competition and achieves better results than human players.
Il contributo descrive il sistema UNIOR4NLP, sviluppato per risolvere il gioco “La Ghigliottina”, che ha partecipato alla sfida NLP4FUN della campagna di valutazione Evalita 2018. Il sistema risulta il migliore della competizione e ha prestazioni più elevate rispetto agli umani.
Texte intégral
1 Introduction
1In this paper we describe Unior4NLP, a system which took part in the NLP4FUN task of the Evalita 2018 evaluation campaign (Basile et al., 2018). The goal of this task is to design a solver for “La Ghigliottina”, the final game of the popular Italian TV quiz show “L’Eredità”. The game involves a single player, who is given a set of five words (clues), each one linked with an unknown sixth word that represents the solution to the game. For example, given the set of clues [ fighting, gun, roof, eater, set ] the solution is fire, because: the roof is on fire is a title of a famous song, while fire fighting, fire a gun, fire-eater, and set something on fire are fixed word constructions.
2Unior4NLP relies on the assumption that Multiword Expressions (MWEs) play an important role in solving the game: given a set of clues, the system outputs the solution word which forms the strongest connections with all of the clues.
3The paper is organized as follows: in Section 2 we present related work. In Section 3 we describe the different steps we took in order to prepare and tune the UNIOR4NLP system. In Section 4 we describe our system and its functioning, while results are presented in Section 5, where we also focus on error analysis concerning both the data-set of the NLP4FUN task and our system. Finally, conclusions and future work are presented in Section 6.
2 Related work
4From the very beginning of Artificial Intelligence (AI) games represented an interesting playground to test the results of research in this field (Yannakakis and Togelius, 2018). NLP plays an essential role in solving language related games and recent examples, such as the IBM Watson system in Jeopardy!TM (Ferrucci et al., 2013), have proven that its use can result in groundbreaking technology. An interesting test-bed for this type of approach is represented by language games, such as the Wheel of Fortune, Who Wants to be a Millionaire? and “La Ghigliottina”.
5The game “La Ghigliottina” is particularly challenging because its solution is based on modelling how words are connected to each other. A first artificial player of the game, OTTHO (Semeraro et al., 2009; Basile et al., 2016) exploits i) resources from the web such as Wikipedia to build a lexicon and a knowledge repository and ii) a knowledge base modeling represented by an association matrix which stores the degree of correlation between any two terms in the lexicon. Word correlations are detected by connecting i) lemmas to the terms in its dictionary definition, pair of words occurring in a proverb, movie or song title, and iii) pair of similar words by exploiting Vector Space Models (Salton et al., 1975).
6In our approach, we make use of similar resources but we only rely on a very limited set of syntactic constructions (patterns) to correlate words and build our association matrix.
3 Solving the Ghigliottina game
7Building an automatic solver for the Ghigliottina game requires a number of preliminary steps: i) the analysis of real game instances, ii) the analysis of patterns that could help the system in solving the game, iii) the collection of the linguistic resources necessary to tune the system for the task.
3.1 Analysis of real game instances
8We have analyzed a sample of 100 game instances that we personally collected from the last five editions of the TV show. We found out that in most cases each clue word is connected to the solution because they form a Multiword Expression (MWE). We have used this key observation in designing our system. We started working on our system before the announcement of the NLP4FUN task. Since our system is not supervised, the extra data-set is not adding any advantage to our system. After the official data-set was released, we found out that a good number of game instances was confirming our initial finding. However we also observed a number of unusual cases which will discuss in more depth in Section 5.2.
9A MWE can be defined as a sequence of words that presents some characteristic behaviour (at the lexical, syntactic, semantic, pragmatic or statistical level) and whose interpretation crosses the boundaries between words (Sag et al., 2002). MWEs have to be considered as lexical items which convey a single meaning different from the meanings of the constituents of the MWE, such as in the idiomatic expression kick the bucket where the simple addition of the meanings of kick and bucket does not convey the meaning of to die.
10We have different classes of MWEs, such as idioms (break a leg), verb particle constructions (to call off), light verbs constructions (to provoke a reaction). For a detailed overview of MWEs in NLP applications we refer the reader to Constant et al. (2017). For the purpose of the current task we considered only those MWEs characterized by fixed syntactic patterns described in the following section.
3.2 Pattern Analysis
11A first analysis of the tuples from the sample mentioned above revealed that words in the clues are typically nouns, verbs, or adjectives, while the ones in the solutions are typically nouns or adjectives (never verbs). A more detailed investigation resulted in the definition of six patterns that identify valid MWEs connecting clue and solution pairs. We list them below with some examples from our data-set (solution words are underlined):
A B: diario segreto (‘diary secret’ → secret diary), brutta caduta (‘ugly fall’ → bad fall), permesso premio (‘permit price’ → good behaviour license), dare gas (‘give gas’ → accelerate).
A det B: dare il permesso (‘give the permit’ → authorize).
A prep B: colpo di coda (‘flick of tail’ → last ditch effort).
A conj B: stima e affetto (esteem and affection).
A prepart B or A prep det B: virtù dei forti, part of the famous Italian proverb La calma è la virtù dei forti (patience is the virtue of the strong).
A+B: compounds such as radio + attività = radioattività (radio + activity = radioactivity).
3.3 Linguistic Resources
12On the basis of the linguistic analysis described above, we collected the linguistic resources which we deemed necessary for the task. To this end we used the following freely available corpora:
Paisà: 225 M words corpus automatically annotated (Lyding et al., 2014).
itWaC: 1.5 B words corpus automatically annotated (Baroni et al., 2009)
Wiki-IT-Titles: Wikipedia-IT titles downloaded via WikiExtractor (Attardi, 2016).
Proverbs: 1955 proverbs from Wikiquote (2016) and 371 from an online collection (Dige, 2016).
13In addition, we have constructed the following lexical resources:
DeMauro-Ext: words extracted from “Il Nuovo vocabolario di base della lingua italiana”(De Mauro, 2016b), extended with morphological variations obtained by changing last vowel of the word and checking if the resulting word has frequency ≥1000 in Paisà.
DeMauro-MWEs: MWEs extracted from the “De Mauro online dictionary” (De Mauro, 2016a) composed of 30,633 entries.
4 System description
14In order to build our system, we started processing the selected corpora via standard tokenization (only single word tokens) and removal of punctuation marks and non-word patterns. We next constructed two lexical sets: Clex to cover the clue words, and Slex to cover the solution words. Slex (composed of 7,942 nouns and adjectives in DeMauro-Ext) is smaller than Clex (composed of 19,414 words from the full DeMauro-Ext and DeMauro-MWEs) because solution words are almost always nouns or adjectives as described in Section 3.2.
15Secondly, we built a co-occurrence matrix Mc which stores the counts ci,j for every pair of words wi Slex and wi Clex such that wi co-occurs with wj in the resources according to patterns described in Section 3.2. Co-occurrence patterns were extracted from Paisà and itWaC with weight w=1, from DeMauro-MWE with w=200, from Proverbs with w=100, and from Wiki-IT-Titles with w=50. The weight were chosen manually taking into account the likelihood that a pattern in a given corpus represented a valid MWE. Compound patterns (A+B) were extracted Clex: for every word w in Clex if w=ab, a and b are both in Clex, and a and b have at least 4 characters, the count for the pair (a,b) is incremented by 1 in the co-occurrence matrix.
16Thirdly, for every pair of words wi and wj in Mc, we populate the association-score matrix Mpmi via the Pointwise Mutual Information measure:
17 (1)
where
18 (2)
19 (3)
20 (4)
21Finally, for a given game instance with the 5 clue words G = (wc1, wc2, wc3, wc4, wc5), we choose the solution word Slex such that:
22 (5)
that is, we choose the word in which maximizes the score obtained by summing the pmi between each clue word and the candidate word. If two words are never seen co-occurring together in a pattern in the training corpora, we assign to them the lowest pmi value in Mpmi.
23The system has been implemented in Python and the code is open source.1 After the matrix has been loaded into memory the response time on an average laptop is around 1-2 seconds.
5 Results
24According to Basile et al. (2018), Unior4NLP is the best performing system in the Evalita NLP4FUN task. Table 1 provides the detailed results, including split results on TV and Board Game (BG) subsets. The system achieved a very high performance: in more than half of the games (64/105) it is able to guess the correct word.
25In the attempt to compare the performance of our AI system with that of a top player we analyzed the games played by Andrea Saccone, who has been the biggest champion of the Ghigliottina game so far: he was champion for 13 days (3-15 March 2018), and he managed to find the correct solution three times.2 In comparison, Unior4NLP was able to win the same game instances 9 times.
Table 1: Results on the TEST and DEV set. Evaluations are the MRR (Mean Reciprocal Rank) and R@100 (recall at 100)
SET | SIZE | MRR | R@100 |
TEST ALL | 105 | 0,64 | 0.82 |
TEST TV | 66 | 0.67 | 0.88 |
TEST BG | 39 | 0.60 | 0.72 |
DEV ALL | 315 | 0.56 | 0.80 |
DEV TV | 204 | 0.61 | 0.85 |
DEV BG | 111 | 0.48 | 0.71 |
26The plot in Figure 1 shows the distributions of the scores for the correct and missed solutions of our system on the full set of games in the development and test set (420 in total). This allows us to set a number of confidence values for our system: if the system returns a solution of a game with a score S≥10 we can be reasonably certain (68/69=99%) that the system has guessed the correct solution, if 5≤S<10 we are above chance level (50/70=71%), if 0≤S<5 we are at chance level (56/112=50%) and if S is below 0, we are below chance level (48/169=28%).
5.1 Data-set analysis
27In the development data-set we found several cases which fall outside the patterns we observed in out data-set. For instance, we noticed the presence of digits in some clues or solution words (1973, 33), game instances with a clue being also the solution (‘sostanza’, ‘fuori’), and words being spelled in different ways (‘tenère’, ‘tenere’).
28Moreover, we also observed a number of ‘clue - solution’ pairs which are very difficult to relate. We list below some examples with some possible explanation:
g - orecchio: (g - ear) the letter ‘g’ has the shape of a ear.
classe 1973 - 33: (class 1973 - 33) this game instance was from 2006, and that year people born in 1973 were 33 years old.
...—... - titanic: the clue being the S.O.S. beacon in morse code.
29One possible reason for these inconsistent cases is that Board Game edition use slightly different criteria to correlate words,3 and that those from the TV set date back to the very first editions of the TV game (when correlation criteria where probably not yet well defined).
5.2 System error analysis
30In this section we analyze some types of errors that our system makes, and we provide some suggestion for possible improvement.
31Word similarity Although quite rare, few of the clue-solution links can be explained by the similarity relation. An example is the clue-solution pair sincero-franco (sincere-frank). Those are not easily captured by patterns of the types described in Section 3.2, but could be included by means of automatic detection of word similarity via Vector Space Models (Salton et al., 1975) as done in Basile et al. (2016).
32Missing words As explained in Section 3.2, we restricted the set of words in the solution set. This choice, while helping the system to restrict the search space, leads to some coverage issues. For instance, pennello (brush) is one of the solutions of the games in the test data not present in our solution set. In the future we would like to experiment increasing the size of the solution set while avoiding performance and memory problems.
33Wrong PoS Our system analyzes words in their surface form, so it cannot distinguish cases where the same word-form can have multiple Part of Speech (PoS) (with different meaning). To avoid this problem we could envision a system which takes PoS and word-sense disambiguation into consideration.
34Multiword clues Although the great majority of the clues are constituted by a single word, there are a few exceptions (typically names of saints). The current system considers only single-word tokens, so if a game has a 2-word clue, it is regarded as two separate clues (their contribution is then average to obtain the final score). The system could be optimized by using a tokenizer which keeps specific types of bigrams connected.
35Association metrics As described in Section 4, we compute the association score between any pair of words in the matrix via the Pointwise Mutual Information measure (pmi). There is still a big number of alternative measures (Pecina, 2010) that might lead to higher performance.
6 Conclusions and future work
36In this paper we described UNIOR4NLP, an artificial player of “La Ghigliottina”, a challenging game which requires linguistic knowledge to be solved. We described the preliminary steps that we made before developing our system (identifying linguistic patterns that are relevant in the game) as well as the algorithms and the methodology we adopted. The system achieved a high performance but we believe that with further tuning it can still be improved.
37Future work will focus on adopting the same methodology to automatically create novel game instances: using the same association-matrix we can choose a random word (the solution) and present the list of 5 clues with high score.
38In order to make our system easily testable by the scientific community and general public, we have built an interactive version which can be accessed via a Telegram bot4 and on Twitter5 (see Figure 2).
Acknowledgments
39This research has been partly supported by the PON Ricerca e Innovazione 2014/20 fund. Authorship contribution is as follows: Johanna Monti is author of Sections 1, 2, 3.3 and 6; Federico Sangati of Section 4 and 5, and Antonio Pascucci of Sections 3.1. and 3.2.
Bibliographie
Des DOI sont automatiquement ajoutés aux références bibliographiques par Bilbo, l’outil d’annotation bibliographique d’OpenEdition. Ces références bibliographiques peuvent être téléchargées dans les formats APA, Chicago et MLA.
Format
- APA
- Chicago
- MLA
Giuseppe Attardi. 2016. Wikiextractor. http://attardi.github.io/wikiextractor. Last accessed on the 1st October 2018.
Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. 2009. The wacky wide web: a collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43(3):209–226.
Pierpaolo Basile, Marco de Gemmis, Pasquale Lops, and Giovanni Semeraro. 2016. Solving a complex language game by using knowledge-based word associations discovery. IEEE Transactions on Computational Intelligence and AI in Games, 8(1):13–26.
10.1109/TCIAIG.2014.2355859 :Pierpaolo Basile, Marco de Gemmis, Lucia Siciliani, and Giovanni Semeraro. 2018. Overview of the evalita 2018 solving language games (nlp4fun) task. In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors, Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA’18). CEUR.org, Turin, Italy.
10.4000/books.aaccademia.4421 :Mathieu Constant, Gülşen Eryiğit, Johanna Monti, Lonneke Van Der Plas, Carlos Ramisch, Michael Rosner, and Amalia Todirascu. 2017. Multiword expression processing: A survey. Computational Linguistics, 43(4):837–892.
Tullio De Mauro. 2016a. Il Nuovo De Mauro (Online). https://dizionario.internazionale.it. Last accessed on the 1st October 2018.
Tullio De Mauro. 2016b. Il Nuovo vocabolario di base della lingua italiana (pdf version). https://www.internazionale.it/opinione/tullio-de-mauro/2016/12/23/il-nuovo-vocabolario-di-base-della-lingua-italiana. Last accessed on the 1st October 2018.
Antonio Dige. 2016. Raccolta di proverbi e detti italiani. http://web.tiscali.it/proverbiitaliani. Downloaded on the 24th April 2018.
David A. Ferrucci, Anthony Levas, Sugato Bagchi, David Gondek, and Erik T. Mueller. 2013. Watson: Beyond jeopardy! Artif. Intell., 199:93–105.
Verena Lyding, Egon Stemle, Claudia Borghetti, Marco Brunello, Sara Castagnoli, Felice Dell’Orletta, Henrik Dittmann, Alessandro Lenci, and Vito Pirrelli. 2014. The PAISÀ corpus of italian web texts. In Proceedings of the 9th Web as Corpus Workshop (WaC-9), pages 36–43. Association for Computational Linguistics, Gothenburg, Sweden.
Pavel Pecina. 2010. Lexical association measures and collocation extraction. Language Resources and Evaluation, 44(1-2):137–158.
Ivan A Sag, Timothy Baldwin, Francis Bond, Ann Copestake, and Dan Flickinger. 2002. Multiword expressions: A pain in the neck for nlp. In International Conference on Intelligent Text Processing and Computational Linguistics, pages 1–15. Springer.
G. Salton, A. Wong, and C. S. Yang. 1975. A vector space model for automatic indexing. Commun. ACM, 18(11):613–620.
10.1145/361219.361220 :Giovanni Semeraro, Pasquale Lops, Pierpaolo Basile, and Marco De Gemmis. 2009. On the tip of my thought: Playing the guillotine game. In Proceedings of the 21st International Jont Conference on Artifical Intelligence, IJCAI’09, pages 1543–1548. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
Wikiquote. 2016. Proverbi italiani. https://it.wikiquote.org/wiki/Proverbi_italiani. Downloaded on the 24th April 2018.
Georgios N Yannakakis and Julian Togelius. 2018. Artificial Intelligence and Games. Springer.
10.1007/978-3-319-63519-4 :Notes de bas de page
1 https://gitlab.com/kercos/ghigliottina
2 The players who reach the “Ghigliottina” game (the champion) continue to participate in the subsequent episodes even if they do not guess the solution word.
3 This is supported by results in Table 1, where Board Game results are lower than those from the TV set.
Auteurs
University L’Orientale, Naples, Italy – fsangati[at]unior.it
University L’Orientale, Naples, Italy – apascucci[at]unior.it
University L’Orientale, Naples, Italy – jmonti[at]unior.it
Le texte seul est utilisable sous licence Creative Commons - Attribution - Pas d'Utilisation Commerciale - Pas de Modification 4.0 International - CC BY-NC-ND 4.0. Les autres éléments (illustrations, fichiers annexes importés) sont « Tous droits réservés », sauf mention contraire.
Proceedings of the Second Italian Conference on Computational Linguistics CLiC-it 2015
3-4 December 2015, Trento
Cristina Bosco, Sara Tonelli et Fabio Massimo Zanzotto (dir.)
2015
Proceedings of the Third Italian Conference on Computational Linguistics CLiC-it 2016
5-6 December 2016, Napoli
Anna Corazza, Simonetta Montemagni et Giovanni Semeraro (dir.)
2016
EVALITA. Evaluation of NLP and Speech Tools for Italian
Proceedings of the Final Workshop 7 December 2016, Naples
Pierpaolo Basile, Franco Cutugno, Malvina Nissim et al. (dir.)
2016
Proceedings of the Fourth Italian Conference on Computational Linguistics CLiC-it 2017
11-12 December 2017, Rome
Roberto Basili, Malvina Nissim et Giorgio Satta (dir.)
2017
Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018
10-12 December 2018, Torino
Elena Cabrio, Alessandro Mazzei et Fabio Tamburini (dir.)
2018
EVALITA Evaluation of NLP and Speech Tools for Italian
Proceedings of the Final Workshop 12-13 December 2018, Naples
Tommaso Caselli, Nicole Novielli, Viviana Patti et al. (dir.)
2018
EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020
Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian Final Workshop
Valerio Basile, Danilo Croce, Maria Maro et al. (dir.)
2020
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020
Bologna, Italy, March 1-3, 2021
Felice Dell'Orletta, Johanna Monti et Fabio Tamburini (dir.)
2020
Proceedings of the Eighth Italian Conference on Computational Linguistics CliC-it 2021
Milan, Italy, 26-28 January, 2022
Elisabetta Fersini, Marco Passarotti et Viviana Patti (dir.)
2022