Statistics

Size of the Corpus of Written Tatar: over 116 million tokens.
Number of sentences in the database: more than 10 million.

Lemmas:
The frequency list of Tatar language lemmas (Noun) (txt).
The frequency list of Tatar language lemmas (Proper noun) (txt).
The frequency list of Tatar language lemmas (Verb) (txt).
The frequency list of Tatar language lemmas (Auxiliary verb) (txt).
The frequency list of Tatar language lemmas (Adjective) (txt).
The frequency list of Tatar language lemmas (Adverb) (txt).
The frequency list of Tatar language lemmas (Pronoun) (txt).
The frequency list of Tatar language lemmas (Postposition) (txt).
The frequency list of Tatar language lemmas (Postadverb) (txt).
The frequency list of Tatar language lemmas (Numeral) (txt).
The frequency list of Tatar language lemmas (Adverbial conjunction) (txt).
The frequency list of Tatar language lemmas (Coordinating conjunction) (txt).
The frequency list of Tatar language lemmas (Subordinating conjunction) (txt).
The frequency list of Tatar language lemmas (Interjection) (txt).
The frequency list of Tatar language lemmas (Copula) (txt).
The frequency list of Tatar language lemmas (Determiner) (txt).
The frequency list of Tatar language lemmas (Ideophone) (txt).
The frequency list of Tatar language lemmas (Abbreviation) (txt).
The frequency list of Tatar language lemmas (Words not recognized by the version of Apertium's morphological analyzer we used) (txt).

Words:
The most frequent 5000 wordforms of the Tatar language (txt).
The most frequent 200 "2-grams" (wordforms) of the Tatar language.
The most frequent 200 "3-grams" (wordforms) of the Tatar language.
The most frequent 200 "4-grams" (wordforms) of the Tatar language.
The most frequent 200 "5-grams" (wordforms) of the Tatar language.
The most frequent 200 "6-grams" (wordforms) of the Tatar language.
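
For readers unfamiliar with the term, a wordform "n-gram" is a run of n consecutive wordforms. The following is a minimal sketch of how such a ranking could be computed, not the procedure actually used for this corpus. The function name top_word_ngrams, the whitespace tokenization, the lowercasing, and the rule that n-grams do not cross sentence boundaries are all assumptions made for illustration.

```python
from collections import Counter

def top_word_ngrams(sentences, n, k=200):
    """Return the k most frequent wordform n-grams with their counts.

    Assumptions (not confirmed by the corpus documentation):
    tokens are split on whitespace, lowercased, and n-grams
    never span two sentences.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Slide a window of n tokens across the sentence.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts.most_common(k)

# Toy input for illustration only; these sentences are not corpus data.
toy_sentences = ["мин сине яратам", "мин сине күрдем"]
top_pairs = top_word_ngrams(toy_sentences, n=2, k=3)
```

With the toy input, the pair ("мин", "сине") occurs twice and ranks first.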

Letters:
Frequency list of letters in the Tatar language.
The most frequent 200 "2-grams" (letters) of the Tatar language.
The most frequent 200 "3-grams" (letters) of the Tatar language.
The most frequent 200 "4-grams" (letters) of the Tatar language.
The most frequent 200 "5-grams" (letters) of the Tatar language.
The most frequent 200 "6-grams" (letters) of the Tatar language.

Letters (at the beginning of a word):
The most frequent 200 "2-grams" (letters, beginning of a word) of the Tatar language.
The most frequent 200 "3-grams" (letters, beginning of a word) of the Tatar language.
The most frequent 200 "4-grams" (letters, beginning of a word) of the Tatar language.
The most frequent 200 "5-grams" (letters, beginning of a word) of the Tatar language.
The most frequent 200 "6-grams" (letters, beginning of a word) of the Tatar language.

Letters (at the end of a word):
The most frequent 200 "2-grams" (letters, end of a word) of the Tatar language.
The most frequent 200 "3-grams" (letters, end of a word) of the Tatar language.
The most frequent 200 "4-grams" (letters, end of a word) of the Tatar language.
The most frequent 200 "5-grams" (letters, end of a word) of the Tatar language.
The most frequent 200 "6-grams" (letters, end of a word) of the Tatar language.
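
The three groups of letter lists differ only in where inside a word the n-gram is taken from: anywhere, at the start, or at the end. The sketch below illustrates that distinction under stated assumptions; it is not the method used to build the lists. The function name letter_ngrams and its position parameter are hypothetical, and the choice to skip words shorter than n is an assumption.

```python
from collections import Counter

def letter_ngrams(words, n, position="any"):
    """Count letter n-grams in a list of words.

    position: "any" counts every window inside each word,
    "start" counts only the first n letters,
    "end" counts only the last n letters.
    Assumption: words shorter than n contribute nothing.
    """
    counts = Counter()
    for word in words:
        if len(word) < n:
            continue
        if position == "start":
            counts[word[:n]] += 1
        elif position == "end":
            counts[word[-n:]] += 1
        else:
            # Every window of n consecutive letters within the word.
            for i in range(len(word) - n + 1):
                counts[word[i:i + n]] += 1
    return counts

# Toy input for illustration only; not taken from the corpus.
toy_words = ["китап", "китапханә"]
start_counts = letter_ngrams(toy_words, 2, position="start")
end_counts = letter_ngrams(toy_words, 2, position="end")
any_counts = letter_ngrams(toy_words, 2)
```

Calling most_common(200) on any of the returned counters would give a ranking of the same shape as the lists above.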

Phonemes (within a rhythmic group):
Frequency list of phonemes in the Tatar language.
The most frequent 100 "2-grams" (phonemes) of the Tatar language.
The most frequent 100 "3-grams" (phonemes) of the Tatar language.
The most frequent 100 "4-grams" (phonemes) of the Tatar language.
The most frequent 100 "5-grams" (phonemes) of the Tatar language.
The most frequent 100 "6-grams" (phonemes) of the Tatar language.

Phonemes (within a word):
The most frequent 100 "2-grams" (phonemes) of the Tatar language.
The most frequent 100 "3-grams" (phonemes) of the Tatar language.
The most frequent 100 "4-grams" (phonemes) of the Tatar language.
The most frequent 100 "5-grams" (phonemes) of the Tatar language.
The most frequent 100 "6-grams" (phonemes) of the Tatar language.

Miscellaneous:
Frequency list of grammatical forms (txt). Description of the Apertium project's tag system.
Frequency list of grammatical tags. Description of the Apertium project's tag system.