Probing Tasks Under Pressure
p. 228-234
Abstract
Probing tasks are frequently used to evaluate whether the representations of Neural Language Models (NLMs) encode linguistic information. However, it is still debated whether probing classification tasks really enable such investigation or whether they simply pick up on surface patterns in the data. We present a method to investigate this question by comparing the accuracies of a set of probing tasks on gold and automatically generated control datasets. Our results suggest that probing tasks can be used as reliable diagnostic methods to investigate the linguistic information encoded in NLM representations.
1. Introduction
In recent years we have seen the rise of a consistent body of work using probing tasks to test the linguistic competence learned by Neural Language Models (NLMs) (Conneau et al. 2018; Warstadt et al. 2019; Hewitt and Liang 2019; Miaschi et al. 2020). The idea behind the probing paradigm is quite simple: a diagnostic classifier, the probing model or probe, takes the output representations of an NLM as input and performs a probing task, e.g. predicting a given language property. If the probing model predicts the property correctly, we can assume that the representations somehow encode that property. Studies relying on this method have reported that NLM representations do encode several properties related to morphological, syntactic and semantic information.
Despite this amount of work, several open questions concerning the use of probing tasks remain (Belinkov 2021): which probing model should be used to assess the linguistic competence of an NLM? Are probes the most effective strategy to achieve this goal? These questions have fostered two complementary lines of research. The first is devoted to modifying the architecture of current probing models; the other focuses on evaluating the effectiveness of probing models. Both issues are still under-investigated, although their importance for advancing research on the evaluation of NLMs' linguistic competences has been widely recognized.
Within the first line of research, dealing with the design of probing classifiers, several works investigate which model should be used as a probe and which metric should be employed to measure its performance. In this respect, it is still debated whether one should rely on simple models (Hewitt and Manning 2019; Liu et al. 2019; Hall Maudslay et al. 2020) or complex ones (Pimentel et al. 2020; Voita and Titov 2020) in terms of model parametrization. Voita and Titov (2020), for instance, suggest designing alternative probes using an information-theoretic approach that balances the probe's inner complexity against its task performance.
Among the works investigating the effectiveness of the probing paradigm, Hewitt and Liang (2019) observe that probing tasks might conceal the information about the NLM representation behind the ability of the probe to learn surface patterns in the data. To test this idea, they introduced control tasks, a set of tasks that associate word types with random outputs and can be solved simply by learning regularities. Along the same line, other authors test probing tasks by creating control datasets where a property always appears with the same value, so that it is not discriminative for testing the information contained in the representations. Their experiments highlight that the probe may also learn a property incidentally, thus casting doubts on the effectiveness of probing tasks.
The scenario defined by the latter two works is the one we address in this paper. Specifically, we introduce a new approach that puts increasingly under pressure the effectiveness of a suite of probing tasks for testing the linguistic knowledge implicitly encoded by BERT (Devlin et al. 2019), one of the most prominent NLMs. To achieve this goal, we set up a number of experiments (see Section 2) aimed at comparing the performance of a regression model trained on BERT representations to predict the values of a set of linguistic properties extracted from the Italian Universal Dependencies Treebank (Zeman et al. 2020) and from a suite of control datasets we specifically built for this study. We define a control dataset as a set of linguistic features whose values were automatically altered so as to be increasingly different from the values in the treebank, referred to as gold values. Our underlying hypothesis is that if the predictions of the increasingly altered values progressively diverge from the predictions of the gold values, the corresponding probing tasks are likely effective strategies to test the linguistic knowledge embedded in BERT representations. We discuss the results of our experiments in light of this hypothesis in Section 3 and draw conclusions in Section 4.
Note that this is one of the few studies focused on non-English NLMs. In fact, with the exception of (Vries, Cranenburgh, and Nissim 2020; Miaschi et al. 2021; Guarasci et al. 2021), the majority of research related to interpretability issues focuses on English or, at most, multilingual models.
Contributions
To the best of our knowledge, this is the first paper that (i) introduces a methodology to test the reliability of probing tasks by building control tasks at increasing levels of complexity, and (ii) puts the probing approach under pressure for the Italian language.
2. Methodology
Our methodology seeks to investigate the effectiveness of probing tasks for evaluating the linguistic competences encoded in NLM representations. To this aim, we trained a probing model (described in Section 2.1) on BERT sentence representations and then tested its performance when predicting the values of a set of linguistic features (see Section 2.3) in multiple scenarios. In one scenario, the model has to predict gold values, i.e. the real values of the features in the corpus. In the other scenarios, we automatically altered the feature values at different control levels, each corresponding to an increasing degree of pressure on the probing model, as discussed in Section 2.4.
This methodology allows us to test whether the probing model really relies on linguistic competences encoded in the representations or simply learns regularities in the task and data distributions, by comparing the results obtained in the different scenarios. If the predictions of the probing model are more similar to the gold values than to the automatically altered ones, then we may assume that the information captured by the probed feature is encoded in the representations.
2.1 Model
Our model is a pre-trained Italian BERT. Specifically, we used the base cased BERT developed by the MDZ Digital Library Team, available through the Hugging Face Transformers library (Wolf et al. 2020)1. The model was trained on Wikipedia and the OPUS corpus (Tiedemann and Nygaard 2004). For the sentence-level representations, we leveraged the activation of the first input token [CLS]. The probing model is a linear Support Vector Regression model (LinearSVR).
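To make the setup concrete, the following is a minimal sketch of how the sentence representations and the probe could be obtained with the Transformers and scikit-learn libraries. The model identifier is taken from the paper's footnote; all other details (function names, layer handling, hyperparameters) are illustrative assumptions, not the authors' actual code.

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.svm import LinearSVR

MODEL_ID = "dbmdz/bert-base-italian-xxl-cased"  # from the paper's footnote
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID, output_hidden_states=True)
model.eval()

def cls_representation(sentence, layer=-1):
    """Return the [CLS] activation of the given layer for one sentence."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states: embedding layer + 12 encoder layers, each [1, seq_len, 768]
    return outputs.hidden_states[layer][0, 0].numpy()

def train_probe(representations, feature_values):
    """Fit one LinearSVR probe for a single linguistic feature."""
    probe = LinearSVR(max_iter=10000)
    probe.fit(representations, feature_values)
    return probe
```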
2.2 Data
Our experiments are carried out on the Italian Universal Dependencies Treebank (IUDT), version 2.5 (Zeman et al. 2020), containing a total of 35,480 sentences. Due to the high variability of IUDT in terms of sentence length2, we focused on the subset of sentences within ±10 tokens of the median sentence length (i.e. 20 tokens). As a result, we selected 21,991 sentences whose length ranges between 10 and 30 tokens. In this way our dataset is balanced, i.e. the number of sentences of each length considered for the experiments is comparable. Specifically, our dataset contains around 1,000 sentences for each sentence length, which makes the results of our analyses reliable and comparable.
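As an illustration, this length-based filtering could be implemented as follows, assuming the IUDT treebanks are available as CoNLL-U files (the file path is illustrative) and using the third-party conllu package.

```python
from conllu import parse_incr

MIN_LEN, MAX_LEN = 10, 30  # median length 20 +/- 10 tokens

def load_sentences(conllu_path):
    """Return the sentences whose token count falls in the selected range."""
    selected = []
    with open(conllu_path, encoding="utf-8") as f:
        for sent in parse_incr(f):
            # keep only real words (skip multiword-token ranges and empty nodes)
            tokens = [t for t in sent if isinstance(t["id"], int)]
            if MIN_LEN <= len(tokens) <= MAX_LEN:
                selected.append(sent)
    return selected

sentences = load_sentences("it_isdt-ud-train.conllu")  # illustrative path
```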
2.3 Linguistic Features
The probing tasks we defined consist in predicting the value of multiple linguistic features, each corresponding to a specific property of sentence structure. The set includes 77 linguistic features and is based on those used in previous linguistic profiling work, modeling 7 main aspects of sentence structure, which are reported in Table 1. They range from morpho-syntactic and inflectional properties, to more complex aspects of sentence structure (e.g. the depth of the whole syntactic tree), to features referring to the structure of specific sub-trees, such as the order of subjects and objects with respect to the verb, and the use of subordination. A simplified sketch of how two of these features can be computed from a UD parse is given after Table 1.
Table 1: Linguistic features probed in the experiments.
Macro-group | Features |
Morphosyntactic information | Distribution of UD POS; Lexical density |
Inflectional morphology | Distribution of lexical verbs and auxiliaries for inflectional categories (tense, mood, person, number) |
Verbal Predicate Structure | Distribution of verbal heads and verbal roots; Average verb arity and distribution of verbs by arity |
Global and Local Parsed Tree Structures | Depth of the whole syntactic tree; Average length of dependency links and of the longest link; Average length of prepositional chains and distribution by depth; Average clause length |
Relative order of elements | Distribution of subjects and objects in post- and pre-verbal position |
Syntactic Relations | Distribution of dependency relations |
Use of Subordination | Distribution of subordinate and principal clauses; Average length of subordination chains and distribution by depth; Distribution of subordinates in post- and pre-principal clause position |
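As an illustration of the kind of features listed in Table 1, the following sketch shows simplified re-implementations of two of them (depth of the syntactic tree and average dependency link length) computed from a parsed CoNLL-U sentence; the authors' own feature extraction pipeline may differ in its details.

```python
def tree_depth(sentence):
    """Depth of the dependency tree (root has depth 0);
    `sentence` is a conllu TokenList."""
    heads = {t["id"]: t["head"] for t in sentence if isinstance(t["id"], int)}

    def depth(tok_id):
        steps = 0
        while heads.get(tok_id, 0) not in (0, None):
            tok_id = heads[tok_id]
            steps += 1
        return steps

    return max((depth(i) for i in heads), default=0)

def avg_link_length(sentence):
    """Average linear distance between each dependent and its head."""
    dists = [abs(t["id"] - t["head"]) for t in sentence
             if isinstance(t["id"], int) and t["head"] not in (0, None)]
    return sum(dists) / len(dists) if dists else 0.0
```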
We chose to rely on these features for two main reasons. First, they have been shown to be highly predictive when leveraged by traditional learning models on a variety of classification problems where linguistic information plays a fundamental role. In addition, they are multilingual, as they are based on the Universal Dependencies formalism for sentence representation (Nivre 2015). In fact, they have been successfully used to profile the knowledge encoded in the language representations of contextual NLMs for both Italian (Miaschi et al. 2021) and English (Miaschi et al. 2020).
In this study, the values of each feature extracted from IUDT represent the gold dataset; we automatically altered them in order to generate additional control datasets.
2.4 Control Datasets
We created two main types of control datasets, obtained by automatically altering the gold feature values. The first type (hereafter Swapped) is built by shuffling the original values of each feature across sentences, while the second type (Random) contains values randomly generated between the minimum and maximum value that each feature takes in the whole gold dataset. To clarify, consider the following example involving the feature average link length, which captures the average linear distance between dependents and their syntactic heads within a sentence. In the Swapped variant we simply swap the feature values, so a sentence which originally had an average link length of, e.g., 2.86 could be assigned 8.83; note that both are real values extracted from our dataset. In the Random variant, each sentence is assigned a feature value randomly generated between 1.33 and 9.78, which are the minimum and maximum average link length values reported in the dataset, associated with sentences of length 11 and 21 respectively.
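A minimal sketch of these two altering strategies is given below, assuming the gold feature values are stored in a pandas DataFrame with one row per sentence and one column per linguistic feature; names and data layout are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

def swapped_controls(gold):
    """Shuffle each feature's gold values across all sentences."""
    return pd.DataFrame(
        {feat: rng.permutation(gold[feat].to_numpy()) for feat in gold.columns},
        index=gold.index)

def random_controls(gold):
    """Sample each value uniformly between the feature's gold min and max."""
    return pd.DataFrame(
        {feat: rng.uniform(gold[feat].min(), gold[feat].max(), len(gold))
         for feat in gold.columns},
        index=gold.index)
```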
Since the values of the considered features are strongly related to sentence length, for each type of control dataset we built two sub-types. In the first sub-type (Bins), we grouped sentences falling into the same predefined range of sentence lengths (i.e., 10-15, 15-20, 20-25 and 25-30 tokens). In the second sub-type (Lengths), we grouped sentences having exactly the same length. This motivates the choice of sentences whose length falls in an interval for which we have a reliable number of instances (as introduced in Section 2.2).
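The length-constrained sub-types could then be obtained by restricting the shuffling to groups of sentences, as in the following sketch (again with illustrative names; `lengths` is assumed to hold the token count of each sentence).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

def swapped_by_group(gold, lengths, bins=None):
    """Shuffle feature values only across sentences in the same length group.
    `gold`: DataFrame of gold feature values; `lengths`: Series of token
    counts aligned with it; `bins`: optional bin edges for the Bins variant."""
    groups = pd.cut(lengths, bins) if bins is not None else lengths
    out = gold.copy()
    for _, idx in gold.groupby(groups).groups.items():
        for feat in gold.columns:
            out.loc[idx, feat] = rng.permutation(gold.loc[idx, feat].to_numpy())
    return out

# Lengths variant: one group per exact sentence length.
# swapped_lengths = swapped_by_group(gold_features, lengths)
# Bins variant: predefined ranges of 10-15, 15-20, 20-25 and 25-30 tokens.
# swapped_bins = swapped_by_group(gold_features, lengths, bins=[10, 15, 20, 25, 30])
```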
Note that the different altering strategies are conceived to represent increasingly challenging testbeds for assessing the effectiveness of our probing tasks. The Swapped control datasets are the most challenging ones, as the swapped feature values might be quite similar to the gold ones and thus possibly predicted with high accuracy by the probing model. This intuition is confirmed by the results of the 2-dimensional Principal Component Analysis (PCA)3 reported in Figure 1. As we can see, the data points representing the feature values contained in the Swapped datasets fully overlap with the gold ones, confirming their similarity. On the contrary, randomly generated values are progressively more distant, being less plausible, even if constraining them by sentence length yields values that are closer to the gold ones.
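For reference, a 2-dimensional PCA comparison of this kind could be produced with scikit-learn as sketched below; the plotting details are illustrative and not taken from the paper.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def plot_feature_pca(datasets):
    """`datasets` maps a label (e.g. 'gold', 'swapped', 'random') to an
    (n_sentences, n_features) array of feature values."""
    stacked = np.vstack([np.asarray(d) for d in datasets.values()])
    points = PCA(n_components=2).fit_transform(
        StandardScaler().fit_transform(stacked))
    start = 0
    for label, data in datasets.items():
        end = start + len(data)
        plt.scatter(points[start:end, 0], points[start:end, 1], s=5, label=label)
        start = end
    plt.legend()
    plt.show()
```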
3. Results
For both gold and control datasets, probing scores are computed as the Spearman correlation between the feature values predicted by the probing model and the values contained in each dataset. These correlation values are averaged across the NLM's layer-wise scores since, for all datasets, we observed only small differences between the scores obtained across the 12 layers. We experimentally verified that these differences were not significant by computing the slope of a linear regression line between BERT layers and the scores on the gold dataset, obtaining a mean value of -0.0017 over all features. Our intuition is that the small range of sentence lengths considered here may have yielded this negligible variation across layers, which instead proved significant on the whole set of IUDT sentences: being highly related to sentence length, the feature values show little variation. However, a more in-depth investigation of the reasons behind this outcome is one of the future directions of this work.
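A sketch of how such layer-averaged probing scores could be computed is shown below; the use of cross-validated predictions is an assumption on our part, as the paper does not detail the train/test protocol.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.model_selection import cross_val_predict
from sklearn.svm import LinearSVR

def probing_score(layer_representations, target_values):
    """`layer_representations`: list of 12 arrays, one per BERT layer, each of
    shape (n_sentences, hidden_size); `target_values`: gold or control values
    of one linguistic feature. Returns the layer-averaged Spearman score."""
    scores = []
    for reps in layer_representations:
        probe = LinearSVR(max_iter=10000)
        predictions = cross_val_predict(probe, reps, target_values, cv=5)
        rho, _ = spearmanr(predictions, target_values)
        scores.append(rho)
    return float(np.mean(scores))
```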
Figure 2 shows the scores obtained on the gold and the 6 control datasets, both for the 7 macro-groups of linguistic features and on average (AVG). Additionally, in order to properly appreciate the differences between the results obtained on the gold and control datasets, Figure 3 reports the error reduction rate for each control dataset, computed as the difference between the scores obtained when predicting gold and altered features.
General Results
We can observe that, on average, the highest probing scores are obtained on the gold dataset and that, accordingly, there is a large difference (almost 1.0, see Figure 3) between the accuracy of the probing model when predicting authentic and altered feature values. This suggests that the model is able to recognize that the feature values contained in the control datasets have been altered, even when they are not fully random but plausible, as in the Swapped datasets. As a consequence, we can hypothesize that the model relies on some implicit linguistic knowledge when predicting the authentic feature values, rather than learning regularities possibly found in the dataset.
However, if we take a closer look at the scores obtained for the Random and Swapped datasets when we constrain sentence length, we can observe that the accuracy in predicting the feature values contained in the Swapped datasets is slightly higher than in the Random ones (see the 'AVG' column in Figure 2). This is in line with our starting hypothesis and shows that feature values created simply by shuffling gold ones across sentences of the same length (or the same range of lengths) are more similar to the gold values, and are thus predicted with higher accuracy than randomly altered values. Nevertheless, their error rate, namely the difference from the accuracy of gold predictions, is still quite high, i.e. about 0.80 (see the 'AVG' column, Figure 3).
Linguistic Features Analysis
Also when we focus on the results obtained for the 7 macro-groups of linguistic features, we can observe that the probing model is more accurate in the prediction of the gold values. Again, the scores on the control datasets are slightly higher when we constrain the values with respect to sentence length, since this narrows the range of possible values. In particular, we see that the feature values related to sentence tree structure are those predicted most closely to the gold ones (see column 'TreeStructure', Figure 3). Note that these sentence properties are the most sensitive to sentence length, which BERT encodes with very high accuracy. This may suggest that, in solving these tasks, the probing model is possibly relying on regularities related to sentence length.
Similar observations hold for the results achieved on the probing tasks related to the use of subordination, which heavily depends on sentence length. Interestingly, the values of all the other groups of features contained in the control datasets are predicted by the probing model with very low accuracy, possibly making those results not significant.
Features Correlations
Having shown that probing task accuracy differs greatly depending on whether the feature values are authentic or altered, in this section we compare the rankings of linguistic features, ordered by decreasing prediction accuracy, in the gold and control scenarios. As shown in Table 2, which reports the Spearman correlations between the rankings, the control rankings are almost unrelated to the gold one, and the existing correlations are in most cases not even statistically significant. The only exceptions are the rankings of values that were randomly generated with sentence length constraints, which show a weak and a moderate correlation respectively. Note, however, that, as shown before, the corresponding probing scores are very low. A sketch of this ranking comparison is given after Table 2.
Table 2: Spearman correlations between the rankings of features obtained with the Gold dataset and the 6 control datasets. Statistically significant correlations are marked with * (p-value < 0.05).
Dataset | Spearman correlation |
Random | 0.08 |
Random Bins | 0.46 * |
Random Lengths | 0.33 * |
Swapped | -0.15 |
Swapped Bins | 0.05 |
Swapped Lengths | 0.06 |
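The ranking comparison of Table 2 can be reproduced in a few lines, as in the following sketch (variable contents are illustrative); note that spearmanr ranks the scores internally, so passing the raw probing scores is equivalent to comparing the rankings.

```python
from scipy.stats import spearmanr

def ranking_correlation(gold_scores, control_scores):
    """Both arguments map feature names to probing scores; the returned rho
    measures how similar the two feature rankings are."""
    features = sorted(gold_scores)
    rho, p_value = spearmanr([gold_scores[f] for f in features],
                             [control_scores[f] for f in features])
    return rho, p_value
```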
A more qualitative analysis of the feature rankings can be carried out by inspecting Table 3, where we report the 15 top-ranked features predicted in the gold setting and in the two most highly correlated Swapped and Random datasets (Swapped Lengths and Random Bins). As we can see, the gold ranking diverges from the rankings of the altered values with respect to the majority of top-ranked features. The most visible exception is the distribution of the syntactic root (dep_dist_root), which the probing model always predicts with the highest accuracy. This result is quite expected, since this feature can be seen as a proxy of sentence length, a linguistic property properly encoded by BERT. Similarly, two other features influenced by sentence length appear, as expected, in the top positions of all rankings, namely the distribution of sentence-boundary punctuation (xpos_dist_FS) and of verbal heads (verbal_head_per_sent).
Table 3: 15 top-ranked Gold and control features (Random Bins and Swapped Lengths) predicted by BERT sentence-level representations.
Gold | Random Bins | Swapped Lengths |
dep_dist_root | dep_dist_root | dep_dist_root |
dep_dist_punct | avg_max_links_len | avg_max_links_len |
upos_dist_PUNCT | max_links_len | max_links_len |
xpos_dist_FS | xpos_dist_FB | avg_max_depth |
upos_dist_ADP | avg_token_per_clause | verbal_head_per_sent |
dep_dist_det | xpos_dist_FS | xpos_dist_FS |
upos_dist_PROPN | n_prep_chains | avg_links_len |
upos_dist_DET | avg_max_depth | subord_prop_dist |
xpos_dist_RD | verbal_head_per_sent | avg_subord_chain_len |
dep_dist_case | xpos_dist_RI | n_prep_chains |
verbal_head_per_sent | dep_dist_cop | subord_post |
xpos_dist_FF | xpos_dist_PC | subord_dist_1 |
xpos_dist_SP | dep_dist_conj | avg_prep_chain_len |
xpos_dist_E | xpos_dist_B | obj_post |
upos_dist_NOUN | xpos_dist_VA | avg_verb_edges |
4. Discussion and Conclusion
In this paper we described a methodology to test the effectiveness of a suite of probing tasks for evaluating the linguistic competence encoded by NLMs. To this aim, we analysed the performance of a probing model trained on BERT representations to predict the authentic and automatically altered values of a set of linguistic features derived from IUDT. We observed generally higher performance in the prediction of authentic values, suggesting that the probing model relies on linguistic competences to predict linguistic properties. However, when the automatically altered values are constrained with respect to sentence length, the model tends to learn surface patterns in the data.
As a general remark, it should be pointed out that our analyses dealt only with sentences of standard length (i.e., between 10 and 30 tokens per sentence). This choice, if on the one hand it made our results more directly comparable across bins of sentences sharing the same length, on the other hand it excluded from the analyses the shortest and longest sentences of IUDT. Our future work will be devoted to replicating the probing task experiments described in this paper on control datasets comprising sentences whose length falls outside the range considered here. To this aim, we performed preliminary analyses to test the probing scores on gold IUDT sentences that are less than 10 and more than 30 tokens long. Interestingly, we noticed that the probing model is less accurate when predicting the linguistic features extracted from the group of short IUDT sentences. Specifically, the average Spearman correlation obtained on this group is 0.47, while probing scores on longer sentences (more than 30 tokens) and on those used in our experiments reach average correlations of 0.56 and 0.66 respectively. Starting from this preliminary finding, a possible future investigation could focus on whether using longer or shorter sentences would also have an effect on the probing scores obtained with the control datasets.
In future work we also plan to investigate which features are more diagnostic of the linguistic competence encoded by an NLM and which ones, on the contrary, are more influenced by confounders such as sentence length.
Bibliography
Yonatan Belinkov. 2021. "Probing Classifiers: Promises, Shortcomings, and Advances." Computational Linguistics, October, 1–12. https://doi.org/10.1162/coli_a_00422.
Alexis Conneau, German Kruszewski, Guillaume Lample, Loïc Barrault, and Marco Baroni. 2018. "What You Can Cram into a Single &!#* Vector: Probing Sentence Embeddings for Linguistic Properties." In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2126–36. Melbourne, Australia: Association for Computational Linguistics. https://doi.org/10.18653/v1/P18-1198.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. "BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding." In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–86. Minneapolis, Minnesota: Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1423.
Raffaele Guarasci, Stefano Silvestri, Giuseppe De Pietro, Hamido Fujita, and Massimo Esposito. 2021. "Assessing BERT's Ability to Learn Italian Syntax: A Study on Null-Subject and Agreement Phenomena." Journal of Ambient Intelligence and Humanized Computing, 1–15.
Rowan Hall Maudslay, Josef Valvoda, Tiago Pimentel, Adina Williams, and Ryan Cotterell. 2020. "A Tale of a Probe and a Parser." In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 7389–95. Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.659.
John Hewitt and Percy Liang. 2019. "Designing and Interpreting Probes with Control Tasks." In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2733–43.
John Hewitt and Christopher D. Manning. 2019. "A Structural Probe for Finding Syntax in Word Representations." In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4129–38.
Ian T. Jolliffe and Jorge Cadima. 2016. "Principal Component Analysis: A Review and Recent Developments." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374 (2065): 20150202.
Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, and Noah A. Smith. 2019. "Linguistic Knowledge and Transferability of Contextual Representations." In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 1073–94. Minneapolis, Minnesota: Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1112.
Alessio Miaschi, Dominique Brunato, Felice Dell'Orletta, and Giulia Venturi. 2020. "Linguistic Profiling of a Neural Language Model." In Proceedings of the 28th International Conference on Computational Linguistics, 745–56. Barcelona, Spain (Online): International Committee on Computational Linguistics. https://doi.org/10.18653/v1/2020.coling-main.65.
Alessio Miaschi, Gabriele Sarti, Dominique Brunato, Felice Dell'Orletta, and Giulia Venturi. 2021. "Italian Transformers Under the Linguistic Lens." In Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020), edited by Johanna Monti, Felice Dell'Orletta, and Fabio Tamburini. Online: CEUR Workshop Proceedings (CEUR-WS.org).
Joakim Nivre. 2015. "Towards a Universal Grammar for Natural Language Processing." In Proceedings of the 16th Annual Conference on Intelligent Text Processing and Computational Linguistics (CICLing), 3–16. Springer.
Tiago Pimentel, Josef Valvoda, Rowan Hall Maudslay, Ran Zmigrod, Adina Williams, and Ryan Cotterell. 2020. "Information-Theoretic Probing for Linguistic Structure." In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 4609–22.
Jörg Tiedemann and Lars Nygaard. 2004. "The OPUS Corpus – Parallel and Free." In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04). Lisbon, Portugal: European Language Resources Association (ELRA). https://aclanthology.org/L04-1174/.
Elena Voita and Ivan Titov. 2020. "Information-Theoretic Probing with Minimum Description Length." In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 183–96.
Wietse de Vries, Andreas van Cranenburgh, and Malvina Nissim. 2020. "What's so Special About BERT's Layers? A Closer Look at the NLP Pipeline in Monolingual and Multilingual Models." In Findings of the Association for Computational Linguistics: EMNLP 2020, 4339–50. Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.389.
Alex Warstadt, Yu Cao, Ioana Grosu, Wei Peng, Hagen Blix, Yining Nie, Anna Alsop, et al. 2019. "Investigating BERT's Knowledge of Language: Five Analysis Methods with NPIs." In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2877–87. Hong Kong, China: Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1286.
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, et al. 2020. "Transformers: State-of-the-Art Natural Language Processing." In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 38–45. Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-demos.6.
Daniel Zeman, Joakim Nivre, Mitchell Abrams, Noëmi Aepli, Željko Agić, Lars Ahrenberg, et al. 2020. "Universal Dependencies 2.5." LINDAT/CLARIAH-CZ Digital Repository at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University. http://hdl.handle.net/11234/1-3226.
Footnotes
1 https://huggingface.co/dbmdz/bert-base-italian-xxl-cased
2 IUDT contains sentences ranging from 1 to 308 tokens long.
3 PCA is a classical data analysis method that reduces the dimensionality of the data while retaining most of the variation in the data set by identifying n principal components, along which the variation of the data is maximal (Jolliffe and Cadima 2016).
Authors
Istituto di Linguistica Computazionale “Antonio Zampolli”, Pisa – ItaliaNLP Lab – www.italianlp.it – Department of Computer Science, University of Pisa – alessio.miaschi@phd.unipi.it
Istituto di Linguistica Computazionale “Antonio Zampolli”, Pisa – ItaliaNLP Lab – www.italianlp.it – name.surname@ilc.cnr.it
Istituto di Linguistica Computazionale “Antonio Zampolli”, Pisa – ItaliaNLP Lab – www.italianlp.it – name.surname@ilc.cnr.it
Istituto di Linguistica Computazionale “Antonio Zampolli”, Pisa – ItaliaNLP Lab – www.italianlp.it – name.surname@ilc.cnr.it
Istituto di Linguistica Computazionale “Antonio Zampolli”, Pisa – ItaliaNLP Lab – www.italianlp.it – name.surname@ilc.cnr.it