General Information

This website hosts a text corpus of the modern Tatar language comprising over 116 million word occurrences.
The corpus represents the modern written Tatar language in electronic form.
The total number of distinct word forms in the corpus is about 1.5 million.
This collection of Tatar texts is intended for anyone interested in the structure, present condition and prospects of the Tatar language.
The Corpus of Written Tatar is indispensable for everyone who wants to study Tatar using the methods of corpus linguistics.

This project receives no financial support from any scientific fund or organization.
All work on the Corpus of Written Tatar is done by the project participants in their spare time.

Project news

27.02.2017 - The 5th version of the fastmorph corpus search engine has been released. It now consumes about 2.5 times less RAM.

23.01.2017 - A spellchecker for the Tatar language has been launched in the Online SpellCheck section.

09.01.2017 - N-gram-based search has been launched in the Search the Corpus section. 1-, 2-, 3-, 4-, 5- and 6-grams are supported.
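
To illustrate what 1- to 6-grams are, here is a minimal Python sketch that collects and counts contiguous word sequences of each length. It is only an illustration and not the Corpus implementation: the function names `ngrams` and `ngram_counts` and the toy token list are assumptions made for this example.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every contiguous window of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(tokens, max_n=6):
    """Count n-grams for each n from 1 up to max_n (6 by default)."""
    return {n: Counter(ngrams(tokens, n)) for n in range(1, max_n + 1)}

# Toy input; a real corpus would supply tokenized sentences instead.
tokens = "a b c a b".split()
counts = ngram_counts(tokens)
```

In this sketch `counts[2]` holds the bigram frequencies, and sequences longer than the input simply produce an empty counter.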

22.11.2016 - We released the source code of the "fastmorph" corpus search engine under the GNU General Public License v3.0 and published it on GitHub.

18.11.2016 - The 4th version of the fastmorph corpus search engine has been released. List of changes:

17.11.2016 - The Corpus has been reannotated with the most recent version of the Apertium morphological tagger.

12.10.2016 - Frequency lists of Tatar lemmas have been added to the Statistics section.

19.07.2016 - Some improvements in the Complex morphological search engine "fastmorph":

01.07.2016 - The User's Guides in Tatar, Russian and English have been updated.

13.06.2016 - Search by the middle part of a word was added to the fastmorph module. For example, if you type *әме*, words such as ярдәмендә, бәйрәмен, үткәрәмен, өйдәме will be found...
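
The matching behaviour described in this entry can be sketched in Python with the standard `fnmatch` module, where `*` stands for any run of characters. This is a stand-in for illustration only and says nothing about how fastmorph performs the search internally; the helper name `wildcard_search` and the extra non-matching word are assumptions.

```python
from fnmatch import fnmatchcase

def wildcard_search(pattern, wordforms):
    """Return the word forms that match a pattern where * means any characters."""
    # fnmatchcase compares case-sensitively and handles Cyrillic text as-is.
    return [w for w in wordforms if fnmatchcase(w, pattern)]

# The four example words from the news entry plus one word that should not match.
words = ["ярдәмендә", "бәйрәмен", "үткәрәмен", "өйдәме", "татар"]
hits = wildcard_search("*әме*", words)
```

The same helper covers search by the beginning or the end of a word, using patterns such as `яр*` or `*мен`.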

21.04.2016 - Thanks to processor optimizations and multithreading support implemented in the "fastmorph" module, complex morphological search now runs up to five times faster.

03.04.2016 - The features of the complex morphological search system have been significantly extended. More information is available in the Guides, updated to version 3.0 and higher.

29.03.2016 - A graphical mode for entering grammatical features in a search query has been added to the Complex morphological search section.

22.02.2016 - A complex morphological search function has been added to the Corpus of Written Tatar. It lets you combine parameters such as wordform, lemma, grammatical tags, the beginning and end of words, and the distances between them.
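
As a rough illustration of how such parameters can combine, the Python sketch below matches a sequence of per-word constraints against an annotated sentence. Everything in it is hypothetical: the `Token` and `Constraint` classes, the `max_gap` distance parameter, the `find` function and the English toy sentence are assumptions for this example and do not reflect the query syntax or internals of the Corpus search engine.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Token:
    form: str          # word form as written in the text
    lemma: str         # dictionary form
    tags: frozenset    # grammatical tags (hypothetical labels)

@dataclass
class Constraint:
    form: Optional[str] = None
    lemma: Optional[str] = None
    tags: frozenset = frozenset()
    prefix: str = ""   # required beginning of the word form
    suffix: str = ""   # required end of the word form
    max_gap: int = 1   # max distance in tokens from the previous matched word

    def matches(self, tok):
        return ((self.form is None or tok.form == self.form)
                and (self.lemma is None or tok.lemma == self.lemma)
                and self.tags <= tok.tags
                and tok.form.startswith(self.prefix)
                and tok.form.endswith(self.suffix))

def find(sentence, query):
    """Return tuples of token indices where the constraints match in order."""
    results = []

    def extend(path, qi):
        if qi == len(query):
            results.append(tuple(path))
            return
        c = query[qi]
        if path:
            # Later words must follow the previous match within max_gap tokens.
            start = path[-1] + 1
            stop = min(len(sentence), path[-1] + c.max_gap + 1)
        else:
            start, stop = 0, len(sentence)
        for i in range(start, stop):
            if c.matches(sentence[i]):
                extend(path + [i], qi + 1)

    extend([], 0)
    return results

sentence = [
    Token("cats", "cat", frozenset({"n", "pl"})),
    Token("often", "often", frozenset({"adv"})),
    Token("sleep", "sleep", frozenset({"v"})),
]
# A noun with lemma "cat", followed by a verb at most two tokens later.
query = [Constraint(lemma="cat", tags=frozenset({"n"})),
         Constraint(tags=frozenset({"v"}), max_gap=2)]
matches = find(sentence, query)
```

Tightening `max_gap` to 1 in the second constraint would require the verb to follow the noun immediately, so this toy sentence would no longer match.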

21.11.2015 - Support for the Finnish Tatar writing system has been implemented in the "Tatar Text-To-Speech" synthesizer.

20.11.2015 - A Manual in English is now available in the User's Guide section.

06.10.2015 - A new User's Guide section has been created. Users can currently download the Russian version of the Manual for the Corpus of Written Tatar there. English and Tatar versions of the Guide will be available later.

16.08.2015 - The "Tatar Text-To-Speech" system has been added to the Corpus site. It is being developed by the team of the Corpus of Written Tatar. We invite everyone interested to take part in this project!

11.06.2015 - For users without a Tatar keyboard layout on their computers, we added a virtual keyboard to the Search page of the Corpus. After launching it, you can type by clicking with the mouse or by pressing keys on your physical keyboard.

18.04.2015 - Template-based search by the end of the word has been implemented in the Corpus.

29.03.2015 - The limit for viewing right, left and semantic contexts has been increased from 100 to 10,000 units. To view them in table format, click the "Show all" link.

26.03.2015 - The Corpus is now also available at the new address corpus.tatar. You can still use the old address corpus.tatfolk.ru as well.

14.03.2015 - Template-based search by the beginning of the word has been implemented in the Corpus.

12.10.2014 - A system for listening to the displayed sentences has been implemented (click the corresponding button to the left of a sentence).

05.10.2014 - Morphological annotation of the Corpus has been completed. The metalanguage of grammatical labels is based on the tag system for Turkic languages developed by the international Apertium project.

14.08.2014 - New version of the Corpus is released:

16.03.2014 - Changes list:

24.03.2013 - Many improvements have been made:

15.03.2012 - The main work on creating the Corpus of Written Tatar has been completed. The basic versions of the website and the search engine have been developed, and the service has been launched.