To understand the multiple dimensions of concept prediction in social and biomedical science questionnaires.
Summary of work undertaken
This work package extended the scope of the research carried out in the RCNIC project in order to:
- Investigate in more depth how the size and quality of the training data affect the performance of the ML models developed.
- Assess how well the trained ML models automatically tag question texts with the 16 top-level concept topics from existing thesauri, such as the European Language Social Science Thesaurus (ELSST), in ‘inference mode’, i.e. on new, unseen questionnaires that were not part of the training or validation sets.
- Investigate new ML models, such as hierarchical approaches, for tagging question texts (and response domains) with the 120 second-level topics from ELSST.