
Datahub SSH

Corpora

Under Corpora we group all activities that deal with the creation, deployment, and dissemination of text corpora.

Current activities and results:

  • English-language historical newspapers
    In this project we focus on disseminating three corpora of English-language historical journals and magazines, such as the Herald Tribune and The Economist.
  • A Corpus of Islamic legal texts (8th century – 19th century) 
    In this project, fifty-five works of substantive Islamic law were prepared for analysis with the Footprinter tool.
  • Encyclopedia of Arabic poetry and belles-lettres (9th – 18th century) 
    This encyclopedia is a collection of fourteen encyclopedic anthologies of poetry and belles-lettres, all written in the Sunni world between the 9th and the 18th century. The anthologies will be subjected to a sentiment analysis that specifically targets the diachronic appreciation of the five bodily senses.
  • AnnCor and Multiword Expression Identifier
    The central goal of this project is to create a Multiword Expression Identifier for Dutch (MWEIDD) and enrich various Dutch text corpora with annotations based on this Identifier.
     
    Besides the work described above, the project also prepares text corpora (AnnCor and Childes) for the MWEIDD algorithm.