| Lecturer : Karën Fort |
| Hours and credits : 10 lecture hours, 10 tutorial hours, 2.5 ECTS |
| This course will be taught in : English |
The objective of this course is to make students aware of the key role played by annotated textual corpora in contemporary Natural Language Processing (NLP). The importance of methodology and evaluation will be highlighted.
Curriculum:
1) Corpus linguistics presentation:
Course 1: Introduction and History
Course 2: Corpora Characteristics and Most Well-Known Corpora
2) Human annotation:
Course 3: Practical Course, Transcribing with Transcriber (prereq.: install Transcriber)
Course 4: Practical Course, Annotating with GATE and Glozz
Course 5: Annotation: Introduction and Methodology
Course 6: Solutions for Annotation
Course 6/7: Practical Course, Crowdsourcing, using Amazon Mechanical Turk (AMT) and Phrase Detectives
3) Evaluation:
Course 7: Principles and Inter-annotator Agreement
Course 8: Practical Course, Computing the Inter-annotator Agreement in an Annotation Campaign
4) Presentations by the students:
Course 9/10: Presentations by the Students: detailed presentation of one corpus (if available) annotated in your own language (20 min. each).
Suggested reading:
McEnery T., Wilson A. (1998) Corpus Linguistics. Edinburgh Textbooks in Empirical Linguistics.
Les linguistiques de corpus (2001). B. Daille, L. Romary (eds). Traitement Automatique des Langues (T.A.L.), n°42/2, Hermès, Paris.
Useful Websites:
More references on my website.
http://bowland-files.lancs.ac.uk/monkey/ihe/linguistics/contents.htm
(site for the book by McEnery & Wilson, 1998)


