An annotated lexicon of Italian derivatives


From these pages you can query or download derIvaTario, an annotated lexicon of about 11,000 Italian derivatives.

derIvaTario is based on CoLFIS, a 3-million-token corpus compiled in the 1990s with the specific aim of representing the written language as encountered by the average Italian reader. See the website for more information.

derIvaTario was created by manually segmenting each of the 11,000 derivatives into derivational cycles and annotating them with a wide array of features: information on affix and base allomorphy, the nature of the morphotactic contact between base and affix, and the morphosemantic transparency of base and affix. Follow the link below to read the online documentation.

By querying derIvaTario through the interface, you can browse it interactively and combine its unique morphological features with the quantitative information from the original CoLFIS project and with the phonological representations provided by the Phonitalia project. By downloading it as a CSV file, you can use derIvaTario to automatically tag existing corpora with relevant morphological information; you can also use it as a gold standard for morphology-related NLP tasks.
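
As a rough illustration of the tagging use case, the sketch below loads a CSV lexicon into a dictionary and looks up each token of a text in it. It is only a minimal sketch under stated assumptions: the column names "lemma", "base" and "affix", the comma delimiter and the two sample rows are invented for this example and do not reflect the actual layout of the derIvaTario file, which is described in the online documentation.

```python
import csv
import io

# Hypothetical excerpt: the real derIvaTario CSV has its own column names
# and delimiter; "lemma", "base" and "affix" are assumptions for illustration.
SAMPLE_CSV = """lemma,base,affix
lavoratore,lavorare,-tore
bellezza,bello,-ezza
"""


def load_lexicon(handle, key="lemma"):
    """Index the lexicon rows by lower-cased lemma."""
    reader = csv.DictReader(handle)
    return {row[key].lower(): row for row in reader}


def tag_tokens(tokens, lexicon):
    """Pair each token with its lexicon row, or None if it is not listed."""
    return [(tok, lexicon.get(tok.lower())) for tok in tokens]


lexicon = load_lexicon(io.StringIO(SAMPLE_CSV))
tagged = tag_tokens(["La", "bellezza", "del", "lavoratore"], lexicon)

for token, entry in tagged:
    # Print the affix for tokens found in the lexicon, "-" otherwise.
    print(token, entry["affix"] if entry else "-")
```

To work on the downloaded file, replace the in-memory sample with an opened file handle and adjust the column names to those actually present. The lookup matches tokens against lemmas only, so a real corpus would need to be lemmatized first for inflected forms to be found.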

Query the lexicon

Download the lexicon in comma-separated values (CSV) format

Download the lexicon as a SQL dump (contains part of itforms 1.10)

Read the online documentation (in Italian)

Handcrafted by Luigi Talamo (luigi_DOT_talamo_AT_uni-saarland_DOT_de) with Vim on different unix flavours