Atypical or Underrepresented? A Pilot Study on Small Treebanks
Abstract
We illustrate an approach for multilingual treebanks explorations by introducing a novel adaptation to small treebanks of a methodology for identifying cross-lingual quantitative trends in the distribution of dependency relations. By relying on the principles of cross-validation, we reduce the amount of data required to execute the method, paving the way to expanding its use to low-resources languages. We validated the approach on 8 small treebanks, each containing less than 100,000 tokens and representing typologically different languages. We also show preliminary but promising evidence on the use of the proposed methodology for treebank expansion.
Acknowledgements
We would like to sincerely thank the anonymous reviewers for their helpful comments.
1. Introduction and Motivation
Linguistically-annotated language resources like treebanks are fundamental both for developing reliable models to train and test tools addressing Natural Language Processing (NLP) tasks and for acquiring linguistic evidence from corpora. Concerning the latter, researchers frequently rely on multilingual or parallel resources in contrastive studies to quantify the similarities and differences between languages (Jiang and Liu 2018). Over the past few years, the Universal Dependencies (UD) initiative1 has further encouraged such studies. UD defines a universal inventory of categories and guidelines to facilitate consistent annotation of similar constructions across languages (Nivre 2015; de Marneffe et al. 2021), and, at present, the project includes about 200 treebanks representing over 100 languages. The consistent annotation of linguistic phenomena under a shared representation and across different languages makes UD treebanks exceptionally well suited for quantitative comparison of languages (see, for example, Croft et al. (2017), Berdicevskis et al. (2018), Vylomova et al. (2020) and, among our works, Alzetta et al. (2019a)).
Despite their great relevance for linguistic investigations, large treebanks are available for only a tiny fraction of the world’s languages (Vania et al. 2019). Even within the UD project, around 60% of the treebanks can be considered small, i.e. containing fewer than 100,000 tokens. Treebank size, in fact, is generally identified as the bottleneck for obtaining high-quality, representative models of language use to be employed in downstream NLP applications. In general terms, larger datasets allow for better generalisations over language constructions, leading to better performance of systems trained on such data (Zeman et al. 2018). Indeed, ad-hoc strategies are generally needed when dealing with low-resource languages (Hedderich et al. 2021).
This paper illustrates a novel workflow specifically designed to adapt an existing methodology for treebank exploration to small treebanks. The base method, extensively described by Alzetta et al. (2020), relies on an unsupervised algorithm called LISCA (LInguistically-driven Selection of Correct Arcs) (Dell’Orletta, Venturi, and Montemagni 2013). LISCA has been successfully employed in past works for performing quantitative cross-lingual analyses (Alzetta et al. 2019a, 2019b, n.d.) and error detection on UD treebanks (Alzetta et al. 2017). The algorithm works in two main steps. First, it acquires evidence about language use from the distribution of phenomena in annotated sentences. It then uses this evidence to distinguish typical from atypical constructions in an unseen set of sentences. The typicality of a construction is determined with respect to the examples observed in a corpus used as reference and is encoded as a score. This score reflects the probability of observing a dependency in a given context (both sentence-level and corpus-level) on the basis of the constructions sharing common properties in the reference corpus. Hence, from our point of view, typicality and frequency are tightly related concepts, as non-standard constructions are also usually less frequent in natural language use.
As such, the LISCA methodology relies on large sets of automatically parsed sentences to collect statistics about the distribution of phenomena: even if the data contain parsing errors2, the corpus size guarantees that the collected statistics reflect actual language use. However, such an approach can be employed only for analysing languages for which large amounts of data are available, or at least for which parser outputs are generally considered reliable. To overcome this limit, Aggarwal (2020) suggested that if the statistics are acquired from gold annotations (such as treebanks), the algorithm could collect them from less data, since these resources are assumed to be error-free.
We implemented this proposal by adapting the original LISCA workflow as detailed in Section 2. Our variation on the original methodology is inspired by the k-fold approach commonly used for cross-validating systems: according to this approach, a dataset is split into subsets of equal size, which are iteratively used for training and/or evaluating a system. We employ a similar strategy for evaluating the typicality of the dependency relations in each treebank split, acquiring the statistics from the sentences contained in the other splits rather than from an external reference corpus. This small but substantial change in the workflow allows us to apply the LISCA algorithm to small treebanks, which is particularly relevant for analyses performed on low-resource languages.
We tested the methodology in a case study, reported in Section 3, involving 8 languages represented by UD treebanks. Our goal is to test whether our method can support linguistic investigations that explore and quantify similarities and differences between typologically different languages. To this aim, we first validate the proposed adaptation of the original LISCA approach in Section 3.1. Then, we exemplify how the obtained results can be employed for linguistic investigations in Section 3.2. To improve the cross-linguistic comparability of the analysis, we relied on Parallel UD (PUD) treebanks: a collection of parallel treebanks developed for the CoNLL 2017 Shared Task on multilingual parsing (Zeman et al. 2017) and linguistically annotated under the UD representation. Being parallel, PUD treebanks are particularly well suited for carrying out multilingual studies; moreover, containing only 1,000 sentences each, manually translated from English into the other languages, they represent a perfect testbed for our approach.
Before concluding the paper in Section 5, we report the results of preliminary investigations exploring whether our approach could also be employed to automatically identify underrepresented phenomena in treebanks. Søgaard (2020) and Anderson et al. (2021) argue that some treebanks cover only a restricted sample of the structures commonly used in a language, leaving out less common phenomena. This leakiness might affect the performance of NLP systems even more than the system architecture does. Thus, treebanks should be expanded not only to improve their representativeness but also to obtain more faithful estimates of the performance of systems trained on them. Section 4 investigates whether our methodology can contribute to this issue by exploring its application to automatic treebank expansion.
The contributions of the paper can be summarised as: (i) a novel approach specifically designed for carrying out multilingual investigations on small treebanks; (ii) a case study involving eight typologically different languages to test the methodology; and (iii) a novel formula, introduced in Section 3.2, measuring the distance between dependents and their syntactic heads, which improves the cross-lingual comparability of treebanks with respect to this property.
2. Approach
The method presented in this paper builds on a methodology for treebank exploration based on the unsupervised algorithm LISCA (Dell’Orletta, Venturi, and Montemagni 2013), which we adapted to extend its usage to small treebanks, i.e. those containing fewer than 100,000 tokens.
As mentioned earlier, LISCA can be employed to quantify the typicality of each dependency relation (hereafter deprel)3 of a linguistically annotated corpus with respect to a large set of examples taken as reference (Alzetta et al. 2020). To achieve this goal, the algorithm first collects statistics about linguistically motivated properties of the deprels extracted from a corpus of automatically parsed sentences (called the reference corpus) to create a statistical model (SM). Then, relying on the SM, the algorithm calculates a typicality score for each deprel appearing in a test corpus, also considering its linguistic context to assess the plausibility of the dependency label used in that context. When interpreting the assigned LISCA score, a deprel marked as highly typical was likely observed frequently in similar contexts in the reference corpus. In contrast, an atypical deprel is characterised by properties which make it somehow distant from the other instances of dependencies marked with the same label in the reference corpus.
In essence, LISCA computes the score for a given deprel taking into account both local properties (e.g., dependency length and direction) of each deprel in the test corpus and the linguistic context where it is located (e.g., distance from the root and the leaves, number of siblings), comparing both against the properties and contexts of all dependencies annotated with the same label in the reference corpus. For this reason, the reference corpus has generally been a large corpus of around 40M tokens: its size allows accounting for a more comprehensive set of examples of linguistic constructions while also compensating for possible parser errors.
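To make these properties concrete, the following sketch (our own illustration, not the authors’ implementation) computes some of the local and contextual features named above from a dependency tree given as (id, head, deprel) triples:

```python
# Illustrative sketch of the local and contextual properties of deprels.
# Tokens are (id, head, deprel) triples with 1-based ids; head 0 is the
# artificial root (values computed for the root arc itself are conventional).
def deprel_features(tokens):
    heads = {tid: head for tid, head, _ in tokens}
    children = {}
    for tid, head, _ in tokens:
        children.setdefault(head, []).append(tid)

    def root_distance(tid):
        # Number of arcs between the token and the root of the tree.
        return 0 if heads[tid] == 0 else 1 + root_distance(heads[tid])

    feats = []
    for tid, head, deprel in tokens:
        feats.append({
            "deprel": deprel,
            "length": abs(tid - head),                       # linear distance to the head
            "direction": "left" if tid < head else "right",  # dependent precedes/follows head
            "root_distance": root_distance(tid),
            "siblings": len(children[head]) - 1,             # co-dependents of the same head
            "is_leaf": tid not in children,                  # rough proxy for distance to leaves
        })
    return feats

# Toy usage: "John saw her" -> nsubj(saw, John), root(saw), obj(saw, her)
print(deprel_features([(1, 2, "nsubj"), (2, 0, "root"), (3, 2, "obj")]))
```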
Workflow
For this study, we implemented the adaptation of the LISCA workflow proposed by Aggarwal (2020). Inspired by the k-fold cross-validation approach, we modified the original workflow as follows:
1) Split a treebank into k portions of equal size (k=4 for this work), each containing the same number of sentences;
2) Use LISCA to acquire the statistics (encoded in the SM) about the distribution of linguistic phenomena from a reference corpus obtained by merging k-1 portions of the previously split treebank;
3) Use the obtained SM to compute the typicality score of the deprels appearing in the remaining treebank portion (i.e., the one not included in the reference corpus);
4) Repeat steps 2 and 3 until all k portions are analysed;
5) Merge the analysed portions and order the deprels by decreasing LISCA score to have a unique ranking of all the deprels in the treebank.
The resulting ranking of deprels can then be explored to investigate which linguistic constructions, represented by means of the deprels, were marked as typical or atypical, i.e. characterised by higher and lower scores, respectively.
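As an illustration, the fold logic of steps 1-5 can be sketched as follows; since LISCA is not distributed as a library, the model-building and scoring functions are passed in as hypothetical callables rather than real APIs:

```python
# Schematic sketch of the k-fold LISCA workflow (steps 1-5 above).
# build_model and score_deprels are hypothetical stand-ins: build_model
# constructs the statistical model (SM) from gold sentences, score_deprels
# returns (deprel, score) pairs for the held-out portion.
def kfold_lisca(sentences, build_model, score_deprels, k=4):
    size = len(sentences) // k
    folds = [sentences[i * size:(i + 1) * size] for i in range(k)]  # step 1
    scored = []
    for i, held_out in enumerate(folds):
        # Step 2: build the SM from the other k-1 folds (the reference corpus).
        reference = [s for j, fold in enumerate(folds) if j != i for s in fold]
        sm = build_model(reference)
        # Step 3: score the deprels of the held-out fold against the SM.
        scored.extend(score_deprels(sm, held_out))
    # Steps 4-5: all folds analysed; merge and rank by decreasing score.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```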
2.1 Data and Languages
We tested our method on a selection of Parallel UD (PUD) treebanks (Zeman et al. 2017), each containing 1,000 sentences. In order to encompass different language families and genera4, we carried out the case study on the following eight languages: Arabic (AR; Afro-Asiatic, Semitic), Czech (CZ; Indo-European, Slavic), English (EN; Indo-European, Germanic), Hindi (HI; Indo-European, Indic), Finnish (FI; Uralic, Finnic), Indonesian (ID; Austronesian, Malayo-Sumbawan), Italian (IT; Indo-European, Romance) and Thai (TH; Tai-Kadai, Kam-Tai).
3. Results
3.1 Validating the Approach
We report the results of an analysis verifying whether the adapted and the original LISCA-based methods return comparable results. To this aim, we compared the LISCA ranking of PUD deprels obtained using the original workflow, which employs a large reference corpus to build the language SM, with the one obtained using the novel workflow defined above, which acquires the statistics from the treebank itself. We carried out this analysis for the Italian and English PUD treebanks: in previous studies (Alzetta et al. 2019a, 2020), we manually verified that the original approach applied to those languages captures elements of linguistic and parsing complexity, distinguishing typical from atypical constructions along the produced ranking of deprels.
We compared the deprel rankings obtained using the two workflows in terms of Spearman correlation, a rank correlation coefficient indicating the statistical dependence between the rankings of two observed variables. The analysis showed a strong and significant correlation between the rankings produced by the two workflows in both languages. Specifically, we obtained a Spearman correlation coefficient of 0.95 (p<0.05) for both Italian and English.
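Such a comparison is straightforward to reproduce with scipy; the two toy rankings below are placeholders for the positions assigned to the same deprel instances by the two workflows:

```python
# Comparing two deprel rankings with Spearman's rho (toy data; in the study
# each list holds the rank position of the same deprel instance under the
# original and the adapted workflow, respectively).
from scipy.stats import spearmanr

original_workflow = [1, 2, 3, 4, 5, 6]
adapted_workflow = [1, 3, 2, 4, 6, 5]
rho, p_value = spearmanr(original_workflow, adapted_workflow)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```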
Such high correlations confirm that gold corpora, although small, can be used to acquire relevant statistics about language use. Manually revised data might be limited in size, but their annotations are generally correct even for rare phenomena, which a parser could annotate wrongly due to their low frequency in the data. While large reference corpora compensate for possibly wrong parses of rare constructions with their size, small reference corpora compensate with consistency and correctness. Hence, using gold data to build the SM allows us to reduce the number of examples needed to acquire language statistics. We notice a difference between the two rankings only in the bottom part, where the deprels with the lowest scores are found. While the original method produces only a tiny number of deprels with a LISCA score of 0, which we usually excluded from the analyses, the ranking produced with our workflow adaptation contains many more of them. A LISCA score of zero is assigned to dependencies never observed in the reference corpus, whose typicality is therefore extremely low. It is not surprising that smaller reference corpora produce more of these cases, given their limited coverage. However, the high correlation coefficient reported above suggests that such deprels are still interesting from a linguistic perspective: they correspond to rare constructions of the language which, with a larger reference corpus, would obtain a score slightly above zero but would still be placed in the lower positions of the ranking.
3.2 Rankings Exploration
This subsection exemplifies how the ranking of deprels obtained with our adapted approach can be employed in linguistic analyses to identify similarities and differences between languages. For this case study, we focused on a specific property of deprels, namely the length of the dependency link. The length of a deprel, measured as the linear distance, in intervening tokens, between a word and its syntactic head, is a property frequently explored in linguistically annotated corpora, as it is highly related to processing complexity in all languages (Demberg and Keller 2008; Temperley 2007; Futrell, Mahowald, and Gibson 2015; Yu, Falenska, and Kuhn 2019). For example, McDonald and Nivre (2011) observed that parsers tend to make more mistakes on longer sentences and longer dependencies. This makes the property particularly interesting from a multilingual perspective, especially when dealing with parallel corpora, as in our case study.
We inspected the rankings to monitor the LISCA score associated with deprels of different lengths and their distribution along the ranking of each language. To facilitate the exploration and comparison of the rankings, we split each ranking into three portions of equal size, referred to as top, middle and bottom, where the top contains the deprels obtaining the highest scores (more typical) and the bottom contains the deprels with the lowest scores (atypical).
In order to allow a proper multilingual comparison of the distribution of deprel lengths along the rankings, we defined a novel measure called Adjusted Link Length (LLadjusted, cf. Figure 1). The measure, inspired by the brevity penalty used in the BLEU score (Papineni et al. 2002), computes the length of deprels involving content words as dependents while improving cross-language comparability: the length of a deprel is measured taking into account the overall length of the sentence where it occurs and the average sentence length in the treebank. This way, instead of comparing absolute length values, we can observe the tendency of languages towards producing longer or shorter deprels.
In LLadjusted, we operationally compute the length of a deprel as a function of a) the average sentence length in the treebank (TrbAvgSentLen), b) the length of the sentence where the deprel appears (SentLength), and c) the distance, in tokens, between the dependent and its syntactic head (LLraw). The formula’s values of 0.5 and 1.25 were determined empirically to account for unusually short and long sentences, respectively, in the treebank. The resulting value associated with each deprel denotes it as ‘long’, ‘medium’ or ‘short’ with respect to the average deprel length computed in the treebank. Note that, although our analysis focuses on content words, function words are still accounted for when computing the LISCA score, as they might be part of the context of content words.
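Since Figure 1 is not reproduced here, the sketch below offers one plausible reading of this description rather than the exact formula: it leaves LLraw unchanged when the sentence length falls within the 0.5-1.25 range around the treebank average and otherwise applies a brevity-penalty-style correction.

```python
# Hypothetical reconstruction of LLadjusted from the prose description only;
# the exact formula is given in Figure 1 of the paper and may differ.
import math

def ll_adjusted(ll_raw, sent_length, trb_avg_sent_len):
    ratio = sent_length / trb_avg_sent_len
    if 0.5 <= ratio <= 1.25:
        return float(ll_raw)  # sentence length is in the typical range
    # Brevity-penalty-style correction (cf. Papineni et al. 2002) for
    # atypically short or long sentences: stretch lengths measured in short
    # sentences and shrink those measured in long ones, so that deprel
    # lengths remain comparable across sentences and languages.
    return ll_raw * math.exp(1 - ratio)
```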
Figure 2 displays the distribution of deprels of different lengths (computed using LLadjusted) along the portions of the treebank ranking of each language. The distributions show that longer deprels are given a lower plausibility score by LISCA in all languages. Interestingly, the length distributions are quite similar across languages except for Hindi. Such a difference could be due to the typical constituent order of the considered languages: Hindi, in fact, is the only language of our set where the order of the main constituents is of the type S(ubject)O(bject)V(erb)5, and the dominant word order of a language has been shown to influence dependency length across major dependency types (Yadav et al. 2020).
It should be noted that such a difference between languages could also be observed by computing the length of dependency relations directly on the PUD treebanks: the average linear link length is 6.54 for Hindi PUD and 2.67 for Thai PUD, the language showing the shortest relations, while the remaining languages show values ranging between 3.1 and 3.5. However, our methodology allows us to combine multiple properties simultaneously into a single score, thus isolating in different portions of the rankings those deprels that show an atypical value for a given property but can still be considered quite typical for the language based on their context. As evidence, observe that long and medium deprels in Hindi tend to appear earlier in the ranking than in other languages: medium and long deprels cover 19.73% of the deprels in the middle bin, suggesting that longer deprels are more common in Hindi. On the contrary, only 7% of the deprels in the middle bin are long in Thai, pointing to their atypicality in that language.
The above results show the methodology’s effectiveness for exploring tendencies and peculiarities of languages in multilingual studies. However, small samples like the PUD treebanks are usually not suited for analysing infrequent phenomena (Taherdoost 2016). Hence, one might wonder whether we are actually capturing the atypicality of linguistic constructions or whether we are instead biased by phenomena underrepresented in the treebank. In the following section, we explore whether low LISCA scores might be associated with linguistic phenomena that are infrequent because they are underrepresented in the data used to build the SM.
4. Towards Treebank Expansion
Our analyses started from the premise that PUD treebanks are error-free; therefore, we can regard the rankings as containing correctly annotated examples of language use. However, the approach employed in this study does not exclude the scenario in which a deprel obtains a low LISCA score because of a lack of similar constructions in the treebank. We explored this idea at both the deprel and the sentence level, as described below.
Concerning the deprel-level analysis, we tested the accuracy of a parser on the deprels in the three portions of the LISCA rankings. To this aim, we parsed each PUD treebank using UDPipe (Straka, Hajic, and Straková 2016), relying on the same k-fold approach used to train LISCA: we split each PUD into 4 portions of 250 sentences each, trained UDPipe on k-1 of the portions and parsed the remaining portion. Then, we checked whether the deprels were parsed accurately. Again, we excluded function words from this analysis to improve cross-language comparability and avoid biased results, as function words are usually parsed more accurately than content words. Based on the obtained results, we observed that wrongly parsed deprels concentrate mainly in the bottom bins for all languages. This suggests that there might be a relationship between low LISCA scores and underrepresented phenomena.
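The per-bin check can be sketched as follows; the data layout and the bin_of lookup (mapping a deprel to its top/middle/bottom bin in the LISCA ranking) are illustrative assumptions, not the paper’s code:

```python
# Sketch of the deprel-level analysis: count labelled-attachment errors of
# the k-fold-trained parser separately for each bin of the LISCA ranking.
from collections import Counter

def error_rate_per_bin(gold_trees, pred_trees, bin_of):
    """gold_trees / pred_trees: parallel lists of sentences, each a list of
    (token_id, head, deprel) triples; bin_of(sent_idx, token_id) returns
    'top', 'middle' or 'bottom' according to the deprel's LISCA rank."""
    total, wrong = Counter(), Counter()
    for s, (gold, pred) in enumerate(zip(gold_trees, pred_trees)):
        for (tid, g_head, g_dep), (_, p_head, p_dep) in zip(gold, pred):
            b = bin_of(s, tid)
            total[b] += 1
            if (g_head, g_dep) != (p_head, p_dep):  # wrong head or wrong label
                wrong[b] += 1
    return {b: wrong[b] / total[b] for b in total}
```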
For the sentence-level analysis, we computed a sentence-level LISCA score for each sentence in all PUD treebanks as the arithmetic mean of the scores of the individual deprels belonging to the sentence. We then explored whether sentences with low average LISCA scores are also more difficult to parse than those with higher scores. Having computed the sentence-level scores, we collected two test sets of 100 sentences each, grouping the sentences showing the highest and the lowest scores, and trained UDPipe using the remaining 800 sentences of each PUD. The performance of UDPipe on the test sets is reported in terms of Labelled Attachment Score (LAS).
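A minimal sketch of this aggregation and selection step, with toy scores standing in for the real LISCA outputs:

```python
# Sentence-level LISCA score = arithmetic mean of the sentence's deprel
# scores; the highest- and lowest-scoring sentences form the two test sets.
def sentence_score(deprel_scores):
    return sum(deprel_scores) / len(deprel_scores)

per_sentence = [[0.91, 0.84, 0.77], [0.12, 0.30, 0.08], [0.65, 0.70]]  # toy data
scores = [sentence_score(s) for s in per_sentence]
order = sorted(range(len(scores)), key=scores.__getitem__)
low_ids, high_ids = order[:1], order[-1:]  # in the study: 100 sentences each
train_ids = order[1:-1]                    # in the study: the remaining 800
```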
The results of this experiment are reported in Figure 3. For all the languages involved, the test sets composed of the sentences with the highest scores are parsed more accurately than the lowest-score sets. Differences between languages in terms of overall LAS, and between the two subgroups of sentences, will be further investigated in future work. These results complement the deprel-level analysis: they suggest that the methodology can isolate not only difficult-to-parse deprels but also difficult-to-parse sentences, which could be employed to expand treebanks.
Treebank expansion is extremely valuable for low-resource languages and small resources in general, as it allows the inclusion of unseen examples in treebanks. Our results suggest that the sentence sets collected by grouping the sentences with the lowest LISCA scores contain difficult-to-parse constructions, possibly underrepresented in PUD, that should be included in the treebank to improve its representativeness.
5. Conclusion
We proposed a novel workflow that adapts an existing approach for treebank exploration to small treebanks and low-resource languages. The results of our analyses showed the effectiveness of the methodology in multiple scenarios. First, the adapted method yields reliable results on par with those of the original workflow when performing linguistic explorations of treebanks. Second, the results show the potential of the method for automatically identifying underrepresented constructions in treebanks. The latter result paves the way for the automatic identification of the cases required to expand treebanks, which we plan to investigate further in future work.
Bibliography
Akshay Aggarwal. 2020. Consistency of Linguistic Annotation. Master’s thesis, Univerzita Karlova (ÚFAL), Prague, Czechia, September. Supervised by Daniel Zeman.
Chiara Alzetta, Felice Dell’Orletta, Simonetta Montemagni, Petya Osenova, Kiril Simov, and Giulia Venturi. n.d. “Quantitative Linguistic Investigations Across Universal Dependencies Treebanks.” In Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it). Bologna (online), Italy.
Chiara Alzetta, Felice Dell’Orletta, Simonetta Montemagni, and Giulia Venturi. 2017. “Dangerous Relations in Dependency Treebanks.” In Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories, 201–10. Prague, Czech Republic. https://www.aclweb.org/anthology/W17-7624.
Chiara Alzetta, Felice Dell’Orletta, Simonetta Montemagni, and Giulia Venturi. 2019a. “Inferring Quantitative Typological Trends from Multilingual Treebanks. A Case Study.” Lingue e Linguaggio 18 (2): 209–42.
Chiara Alzetta, Felice Dell’Orletta, Simonetta Montemagni, and Giulia Venturi. 2019b. “Inferring Quantitative Typological Trends from Multilingual Treebanks. A Case Study.” Lingue e Linguaggio XVIII (2): 209–42.
Chiara Alzetta, Felice Dell’Orletta, Simonetta Montemagni, and Giulia Venturi. 2020. “Linguistically-Driven Selection of Difficult-to-Parse Dependency Structures.” IJCoL. Italian Journal of Computational Linguistics 6 (6-2): 37–60. https://doi.org/10.4000/ijcol.719.
Mark Anderson, Anders Søgaard, and Carlos Gómez-Rodríguez. 2021. “Replicating and Extending ‘Because Their Treebanks Leak’: Graph Isomorphism, Covariants, and Parser Performance.” In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), 1090–98.
Aleksandrs Berdicevskis, Çağrı Çöltekin, Katharina Ehret, Kilu von Prince, Daniel Ross, Bill Thompson, Chunxiao Yan, Vera Demberg, Gary Lupyan, Taraka Rama, et al. 2018. “Using Universal Dependencies in Cross-Linguistic Complexity Research.” In Proceedings of the Second Workshop on Universal Dependencies (UDW 2018), 8–17.
William Croft, Dawn Nordquist, Katherine Looney, and Michael Regan. 2017. “Linguistic Typology Meets Universal Dependencies.” In Proceedings of the 15th International Workshop on Treebanks and Linguistic Theories (TLT15), CEUR Workshop Proceedings, 63–75.
Marie-Catherine de Marneffe, Christopher D. Manning, Joakim Nivre, and Daniel Zeman. 2021. “Universal Dependencies.” Computational Linguistics 47 (2): 255–308.
Felice Dell’Orletta, Giulia Venturi, and Simonetta Montemagni. 2013. “Linguistically-Driven Selection of Correct Arcs for Dependency Parsing.” Computación y Sistemas 17 (2): 125–36.
Vera Demberg and Frank Keller. 2008. “Data from Eye-Tracking Corpora as Evidence for Theories of Syntactic Processing Complexity.” Cognition 109 (2): 193–210. https://doi.org/10.1016/j.cognition.2008.07.008.
Matthew S. Dryer and Martin Haspelmath, eds. 2013. WALS Online. Leipzig: Max Planck Institute for Evolutionary Anthropology.
Richard Futrell, Kyle Mahowald, and Edward Gibson. 2015. “Large-Scale Evidence of Dependency Length Minimization in 37 Languages.” Proceedings of the National Academy of Sciences 112 (33): 10336–41.
Michael A. Hedderich, Lukas Lange, Heike Adel, Jannik Strötgen, and Dietrich Klakow. 2021. “A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios.” In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2545–68. Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.naacl-main.201.
Jingyang Jiang and Haitao Liu. 2018. Quantitative Analysis of Dependency Structures. Vol. 72. Berlin: Walter de Gruyter.
Ryan McDonald and Joakim Nivre. 2011. “Analyzing and Integrating Dependency Parsers.” Computational Linguistics 37 (1): 197–230. https://doi.org/10.1162/coli_a_00039.
Joakim Nivre. 2015. “Towards a Universal Grammar for Natural Language Processing.” In International Conference on Intelligent Text Processing and Computational Linguistics, 3–16. Springer.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. “Bleu: A Method for Automatic Evaluation of Machine Translation.” In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 311–18.
Milan Straka, Jan Hajic, and Jana Straková. 2016. “UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing.” In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), 4290–97.
Hamed Taherdoost. 2016. “Sampling Methods in Research Methodology; How to Choose a Sampling Technique for Research.” Available at SSRN (April 10, 2016).
David Temperley. 2007. “Minimization of Dependency Length in Written English.” Cognition 105 (2): 300–333.
Clara Vania, Yova Kementchedjhieva, Anders Søgaard, and Adam Lopez. 2019. “A Systematic Comparison of Methods for Low-Resource Dependency Parsing on Genuinely Low-Resource Languages.” In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 1105–16.
Ekaterina Vylomova, Edoardo M. Ponti, Eitan Grossman, Arya D. McCarthy, Yevgeni Berzak, Haim Dubossarsky, Ivan Vulić, Roi Reichart, Anna Korhonen, and Ryan Cotterell, eds. 2020. Proceedings of the Second Workshop on Computational Research in Linguistic Typology. Association for Computational Linguistics.
Himanshu Yadav, Ashwini Vaidya, Vishakha Shukla, and Samar Husain. 2020. “Word Order Typology Interacts with Linguistic Complexity: A Cross-Linguistic Corpus Study.” Cognitive Science 44 (4): e12822.
Xiang Yu, Agnieszka Falenska, and Jonas Kuhn. 2019. “Dependency Length Minimization vs. Word Order Constraints: An Empirical Study on 55 Treebanks.” In Proceedings of the First Workshop on Quantitative Syntax (Quasy, SyntaxFest 2019), 89–97.
Daniel Zeman, Jan Hajič, Martin Popel, Martin Potthast, Milan Straka, Filip Ginter, Joakim Nivre, and Slav Petrov. 2018. “CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.” In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, 1–21. Brussels, Belgium: Association for Computational Linguistics. https://doi.org/10.18653/v1/K18-2001.
Daniel Zeman, Martin Popel, Milan Straka, Jan Hajic, Joakim Nivre, Filip Ginter, Juhani Luotolahti, et al. 2017. “CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.” In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, 1–19. Vancouver, Canada: Association for Computational Linguistics. http://www.aclweb.org/anthology/K/K17/K17-3001.pdf.
Footnotes
1 https://universaldependencies.org
2 An assumption when producing automatically parsed data is that most of the errors made by a parser are consistent. As we showed in Alzetta et al. (2017), the LISCA-based method allows spotting these error types in annotations.
3 Given a deprel, e.g. nsubj(head, dependent), we refer to the head-dependent pair as the dependency, with nsubj as the dependency label.
4 The language family and genus, reported in parentheses as (ISO language code; family, genus), are acquired from the World Atlas of Language Structures (WALS, available online at https://wals.info/languoid) (Dryer and Haspelmath 2013).
5 All the other languages are S(ubject)V(erb)O(bject) languages.
Authors
Twilio, Prague, Czechia – aaggarwal@twilio.com
Istituto di Linguistica Computazionale “A. Zampolli”, CNR, Pisa – ItaliaNLP Lab – chiara.alzetta@ilc.cnr.it
The text alone may be used under the Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 International licence (CC BY-NC-ND 4.0). All other elements (illustrations, imported files) are “All rights reserved” unless otherwise stated.