The interdisciplinary project “Semantics-driven Syntactic Parser (for Romanian)” aims to use hybrid methods in language processing and to adopt an integrative perspective on linguistic analysis, in order to create a tool for the automatic syntactic analysis of Romanian. Its added value will be the use of lexico-semantic resources to improve the results of deep syntactic analysis. No such hybrid operational parser is described in the international research or development literature, and Romanian lacks any kind of wide-coverage parser.
The envisaged results of the project are: (i) an annotation manual for a dependency treebank; (ii) a reference treebank for Romanian; (iii) lexico-semantic resources for syntactic parsing (built for Romanian, but transferable to other languages when defined at a conceptual level), namely (a) combinatorial restrictions between words and (semantic) word classes (defined on the Romanian WordNet), each accompanied by a reliability score, and (b) valency frames for verbs, which specify, for each sense of a verb, the number of arguments and the syntactic and semantic selectional restrictions on their lexicalization; (iv) the hybrid parser.
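As an illustration only, a valency frame of the kind described in (iii)(b) could be represented as a small data structure. The sketch below is an assumption, not the project's actual format: the class and field names, the sense identifier, and the semantic class labels are all hypothetical, and the syntactic restrictions are written as Universal Dependencies relation labels for convenience.

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    # Syntactic restriction on the argument (here a UD relation label).
    relation: str
    # Semantic selectional restriction (hypothetical class label).
    semantic_class: str


@dataclass
class ValencyFrame:
    lemma: str
    # Hypothetical sense identifier; the real resource may use another scheme.
    sense_id: str
    arguments: list[Argument] = field(default_factory=list)

    @property
    def arity(self) -> int:
        # Number of arguments the verb sense takes.
        return len(self.arguments)


# Invented example entry for one sense of "a da" (to give).
frame = ValencyFrame(
    lemma="da",
    sense_id="da#1",
    arguments=[
        Argument(relation="nsubj", semantic_class="person"),
        Argument(relation="obj", semantic_class="object"),
        Argument(relation="iobj", semantic_class="person"),
    ],
)

print(frame.lemma, frame.arity, [a.relation for a in frame.arguments])
```

Keeping one frame per verb sense, rather than per lemma, mirrors the description above, where the number of arguments and their restrictions are stated for each sense.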
The project will have a major impact on machine translation, on Romanian corpus annotation, on the creation of strategic resources for Romanian, and on the development of applications based on natural language technologies, by both the scientific community and industry.
- a reference treebank for Romanian;
- a valency dictionary;
- lexico-semantic resources for parsing;
- a parser for Romanian.
Results obtained so far:
- the reference treebank of Romanian annotated with Universal Dependencies (RoRefTrees);
- a valency dictionary for the verbs in the treebank;
- word embeddings and clusters defined on CoRoLa and RoWordnet, downloadable from this link (the format is: cluster score, TAB, cluster elements separated by commas);
- a semantics-driven syntactic parser for Romanian.
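The cluster file format mentioned above (cluster score, a TAB, then the cluster elements separated by commas) can be read with a few lines of Python. This is a minimal sketch under stated assumptions: the score is assumed to be a decimal number, the function and type names are hypothetical, and the sample lines are invented for illustration rather than taken from the actual resource.

```python
from typing import Iterable, NamedTuple


class Cluster(NamedTuple):
    score: float
    members: list[str]


def parse_cluster_line(line: str) -> Cluster:
    # Split on the first TAB: the score comes before it, the elements after it.
    score_text, _, members_text = line.rstrip("\n").partition("\t")
    # Elements are comma-separated; drop surrounding spaces and empty items.
    members = [m.strip() for m in members_text.split(",") if m.strip()]
    # Assumption: the score parses as a float.
    return Cluster(float(score_text), members)


def parse_clusters(lines: Iterable[str]) -> list[Cluster]:
    # Skip blank lines so a trailing newline at the end of the file is harmless.
    return [parse_cluster_line(line) for line in lines if line.strip()]


# Invented sample data in the described format (not from the real file).
sample = [
    "0.82\tcasă,locuință,clădire\n",
    "0.45\tmerge,umbla\n",
]
clusters = parse_clusters(sample)
for cluster in clusters:
    print(cluster.score, cluster.members)
```

To read the downloaded file itself, the same function can be given an open file handle, for example `parse_clusters(open(path, encoding="utf-8"))`, where `path` is wherever the file was saved.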
Publications:
- Verginica Barbu Mititelu, Radu Ion, Radu Simionescu, Elena Irimia, Cenel-Augusto Perez, The Romanian Treebank Annotated According to Universal Dependencies. In Proceedings of The Tenth International Conference on Natural Language Processing (HrTAL2016), Dubrovnik, Croatia, 29 September – 1 October 2016. (in press)
- Verginica Barbu Mititelu, Elena Irimia, Linguistic Data Retrievable from a Treebank. In Proceedings of the Second Conference on Computational Linguistics in Bulgaria (CLiB 2016), Sofia, Bulgaria, 9 September 2016, p.19-27. ISSN: 2367-5675.
- Verginica Barbu Mititelu, Elena Irimia, Description of the Romanian Syntax within Universal Dependency Project, in Proceedings of the 11th International Conference “Linguistic Resources and Tools for Processing the Romanian Language”, Iași, November 2015, Editura Universității „Alexandru Ioan Cuza”, Iași, p. 185-194. ISSN 1843-911X.
- Elena Irimia, Verginica Barbu Mititelu, Two Resources Developed in the Project Semantics-driven Syntactic Parser for Romanian, in Maria Mitrofan, Daniela Gîfu, Dan Tufiș, Dan Cristea (eds.), Proceedings of the 12th International Conference “Linguistic Resources and Tools for Processing the Romanian Language”, Mălini, 27-29 October 2016, p. 69-78.
- Paula Gradu, Radu Ion, SyntaxNet for Romanian: Results and Potential, in Maria Mitrofan, Daniela Gîfu, Dan Tufiș, Dan Cristea (eds.), Proceedings of the 12th International Conference “Linguistic Resources and Tools for Processing the Romanian Language”, Mălini, 27-29 October 2016, p. 61-68.
- Verginica Barbu Mititelu, Radu Ion, Radu Simionescu, Andrei Scutelnicu, Elena Irimia, Improving parsing using morpho-syntactic and semantic information, in Revista Romana de Interactiune Om-Calculator 9(4), 285-304, 2016.
- Verginica Barbu Mititelu, Pasivul românesc. Analiză cantitativă bazată pe un corpus adnotat morfo-sintactic, in Verginica Barbu Mititelu, Mihaela Ionescu, Gianina Iordăchioaia (eds.), Lingvistică generală, lingvistică formală, lingvistică computațională, Editura Universității din București, 2017, p. 219-233.
- Verginica Barbu Mititelu, Svetlozara Leseva, Dan Tufiș, The Bilateral Collaboration for the Post-BalkaNet Extension of the Bulgarian and the Romanian Wordnets, in International Annual Conference on the 75th Anniversary of the Institute for Bulgarian Language “Prof. Lyubomir Andreychin” (in press)