
Description of the Corpus

The Corpus of Written Tatar is a collection of electronic texts in the Tatar language.

Most of the texts in the Corpus of Written Tatar belong to three styles: journalism (≈ 60%), fiction (≈ 35%) and scholarly literature in the humanities (≈ 5%).

The main purpose of the Corpus of Written Tatar is to support research into the Tatar lexicon. The corpus can also be used in language learning and as a source of model texts for various types of documents.

The user interface of the Tatar language corpus makes it possible to perform the following operations:

The searches described above allow the following tasks to be accomplished:

The list of applications of the Corpus of Written Tatar given above is, of course, not exhaustive. Electronic corpus materials are also indispensable in work on automatic speech recognition and machine translation.

At present, the Tatar corpus offers a balanced representation of the language as it is actually used.

New contributions to the Corpus of Written Tatar are gratefully welcomed. If you would like to help, please send us electronic versions of your own books, articles and other documents for inclusion in the corpus.

To protect the authors' copyright, texts are stored in the corpus as individual sentences, so whole texts cannot be extracted from it. Each sentence is linked to the work from which it was taken.
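
The sentence-level storage described above can be illustrated with a minimal Python sketch. It is not the corpus's actual implementation: the names `SentenceRecord`, `split_into_records` and `find_sentences`, the record fields, and the punctuation-based sentence splitter are all assumptions made for illustration only.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SentenceRecord:
    """One stored unit: a single sentence plus a pointer to its source."""
    text: str
    source_title: str  # hypothetical field: title of the work
    source_url: str    # hypothetical field: link to the work


def split_into_records(full_text: str, title: str, url: str) -> list[SentenceRecord]:
    """Split a text into sentence records with a naive punctuation rule.

    Assumption: a sentence ends with '.', '!' or '?' followed by whitespace.
    A real corpus would need a proper sentence segmenter for Tatar.
    """
    parts = re.split(r"(?<=[.!?])\s+", full_text.strip())
    return [SentenceRecord(part, title, url) for part in parts if part]


def find_sentences(records: list[SentenceRecord], query: str) -> list[SentenceRecord]:
    """Return the records whose sentence contains the query string
    (case-insensitive substring match, not a morphological search)."""
    needle = query.casefold()
    return [r for r in records if needle in r.text.casefold()]
```

In this sketch a search returns only isolated sentences, each carrying its source link, so a reader can locate the work a sentence came from without the whole text being retrievable through the search itself.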