Corpus-Based Methods, 7.5 ECTS

Second level

Description

The course deals with corpus-based methods, that is, the large-scale study of written text, or spoken or signed utterances. Contents: Data, methods and evidence in different linguistic traditions. Quantitative properties of language, frequencies, n-grams. Data collection for different types of corpora (including traditional sample corpora, monitor corpora and web corpora) and modalities (text, speech, signing). Representation of corpora in XML. Overview of computational linguistic methods for automatic segmentation and annotation of text, including tokenisation, part-of-speech tagging and syntactic analysis. Searching corpora using regular expressions. Analysis of corpora based on occurrences and co-occurrences. Relationship between corpus material and research questions. Ethics, copyright, licenses.

Area of interest: Language and Linguistics

Languages open doors to other cultures, experiences, business contacts and collaboration between countries. At Stockholm University you can study nearly 30 different languages. You can also delve into more theoretical subjects such as Linguistics and Bilingualism. Language and Linguistics studies can lead to a large variety of professions within teaching, research and industry, the public sector, trade and tourism, and other areas.

Subject

Linguistics