Documentation of the two files containing the database
Database with nouns in the plural
The article describes the online resource CoCoCo (cococo.cosyco.ru), which provides information about both lexical and grammatical co-occurrences of words. The pedagogical goal of the resource is to give quick answers to the most difficult questions in learning Russian as a foreign language: how to produce speech that is not just rule-governed but also idiomatic. To achieve this, sophisticated statistical calculations are applied to a large body of language data drawn from three Russian corpora.
This dataset concerns the Russian construction where the same noun occurs in the nominative and instrumental cases, such as durak durakom ‘ultimate fool’. The dataset contains examples excerpted from the Russian National Corpus (ruscorpora.ru).
The book consists of an Introduction and four articles published both in Finland and abroad, written in English or Russian. They present the studies of eight Finnish and Russian idiomatic constructions that appear in the following examples: Ikkuna rikki — Окно сломано, lit.: ‘the window broken’, Äiti täällä — Мама здесь, lit.: ‘mother here’, Kaikki myymälöihin! — Все в магазин, lit.: ‘all to the shops’, Пить так пить! ≈ ‘When I drink, I drink (a lot)!’, etc. The aim of the studies is to reconstruct the origins and to trace the development of the above-mentioned constructions up to their modern usages. To this end, the constructions are investigated both from historical and from comparative perspectives. Finally, the case studies provide a possibility to develop more general bases of development of these 'ungrammatical' items. By attempting to answer the question why such constructions develop even though they destroy the harmonious structure of a language, some principles of...
This book was inadvertently published with an incorrect affiliation for the Author Mikhail Kopotev. The affiliation has now been amended in the book.
The concept of linguistic complexity, understood broadly as a range of basic and elaborate structures available and accessible to learners as evidenced in their production of speech and writing (Ortega, 2003), has featured prominently in second language development research since the inception of the field. The field of heritage language acquisition, however, has only recently begun to engage linguistic complexity as a comprehensive lens for studying heritage language development. The current study contributes to this fledgling area of research by investigating automatically extracted measures of syntactic complexity in the written language of heritage learners of Russian at various developmental levels. The analysis of 12 measures of syntactic complexity allows us to conclude that the majority of automatically extracted indices differentiate proficiency levels of heritage speakers in the study. The study results provide important insights into the nature of heritage language develo...
The MULTEXT-East morphosyntactic lexicons have a simple structure, where each line is a lexical entry with three tab-separated fields: (1) the word-form, the inflected form of the word; (2) the lemma, the base-form of the word; (3) the MSD, the morphosyntactic description of the word-form, i.e., its fine-grained PoS tag, as defined in the MULTEXT-East morphosyntactic specifications. This submission contains the non-commercial MULTEXT-East lexicons, while a separate submission (http://hdl.handle.net/11356/1041) gives those that are freely available.
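The three-field layout described above can be read with a few lines of Python. This is a minimal sketch, assuming plain text with exactly three tab-separated fields per line; the names `LexEntry` and `parse_lexicon` and the sample entries (including their MSD values) are hypothetical illustrations, not taken from the MULTEXT-East distribution.

```python
from typing import Iterable, Iterator, NamedTuple


class LexEntry(NamedTuple):
    """One lexicon line: word-form, lemma, and MSD tag."""
    wordform: str
    lemma: str
    msd: str


def parse_lexicon(lines: Iterable[str]) -> Iterator[LexEntry]:
    """Yield one LexEntry per non-empty line of tab-separated text."""
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {number}: expected 3 fields, got {len(fields)}"
            )
        yield LexEntry(*fields)


# Illustrative entries only; the MSD values are placeholders (assumption).
sample = ["houses\thouse\tNcnp\n", "house\thouse\tNcns\n"]
entries = list(parse_lexicon(sample))
```

Because `parse_lexicon` accepts any iterable of strings, an open file handle can be passed in place of `sample`; rejecting lines with the wrong field count makes formatting problems surface early rather than silently misaligning fields.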
In the CoCoCo project we develop methods to extract multi-word expressions of various kinds (idioms, multi-word lexemes, collocations, and colligations) and to evaluate their linguistic stability in a common, uniform fashion. In this paper we introduce a Web interface, which provides the user with access to these measures, to query Russian-language corpora. Potential users of these tools include language learners, teachers, and linguists.
The “digital” is profoundly changing Russia today. In this introduction, we argue that area studies, as a geographically and geopolitically motivated interdisciplinary research domain, is of particular value to and can provide a framework for describing the variety of responses to digitalization and explaining the mechanisms that assist or obstruct the “domestication” of global trends. Making a case for “Digital Russia Studies”, we sketch the contours of this emerging field. “Digital Russia” studies focuses on the digital transformation of the (geographical) area of study, while digital “Russia Studies” indicates the use of digital sources and methods in studying it. Together, Digital Russia Studies emphasizes how these two research lines are intertwined, interdependent, and mutually reinforcing. An overview of topics and methods covered by the chapters in the volume is provided.
We present our experience in applying distributional semantics (neural word embeddings) to the problem of representing and clustering documents in a bilingual comparable corpus. Our data is a collection of Russian and Ukrainian academic texts, for which topics are their academic fields. In order to build language-independent semantic representations of these documents, we train neural distributional models on monolingual corpora and learn the optimal linear transformation of vectors from one language to another. The resulting vectors are then used to produce ‘semantic fingerprints’ of documents, serving as input to a clustering algorithm. The presented method is compared to several baselines including ‘orthographic translation’ with Levenshtein edit distance and outperforms them by a large margin. We also show that language-independent ‘semantic fingerprints’ are superior to multi-lingual clustering algorithms proposed in the previous work, at the same time requiring les...
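The Levenshtein edit distance mentioned among the baselines can be sketched as follows. The abstract does not say how the baseline was implemented, so this is only an illustration of the distance itself; the helper `closest_word`, which maps a word to its nearest neighbour in a target vocabulary, is an assumption about what an orthographic baseline might look like, and the example words are hypothetical.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def closest_word(word: str, vocabulary: Sequence[str]) -> str:
    """Hypothetical helper: pick the vocabulary item nearest to `word`."""
    return min(vocabulary, key=lambda candidate: levenshtein(word, candidate))


# Ukrainian word mapped onto a small Russian vocabulary (illustrative data).
match = closest_word("історія", ["теория", "история", "физика"])
```

The two-row formulation keeps memory proportional to the shorter string, which matters when every word in one vocabulary is compared against every word in another.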
This chapter opens with a discussion about what a corpus is and proceeds with an introduction of the main types of textual resources: the Web as a corpus, electronic libraries, and linguistic corpora. Among the last-mentioned, two are of particular interest. The first is the Russian National Corpus, a deeply annotated and well-designed Russian-language resource ranging from early Old Russian chronicles up to modern internet communication. The second is Integrum, the largest resource by some margin, which covers most of the newspapers and journals published both abroad and domestically, as well as a significant amount of TV, radio, and internet media. The chapter includes two case studies that demonstrate how word choice reflects shared memory patterns and how changes in language usage reflect political and social mutations.
On the basis of data from the Russian National Corpus, we analyze the meanings and structure of the syntactic construction observed in phrases like durak durakom ‘fool-nom.sg fool-ins.sg’, which we term the ‘NOM∼INS construction’. We argue that the construction constitutes a network of three closely related subcategories, which we refer to as ‘Extreme’, ‘Paragon’ and ‘Discourse Change’. It is furthermore shown that a diachronic change has taken place, whereby Discourse Change has overtaken Extreme as the dominant subcategory. Our analysis provides ample evidence for the main tenets of Construction Grammar, namely that a language is a network of related constructions, that meaning is often not compositional, that there is a continuum from idiomatic to schematic uses of constructions, and that constructions evolve over time.
This paper presents an exploratory study on the use of frequency-based probabilistic word combinations in Heritage Russian. The data used in the study are drawn from three small corpora of narratives, representing the language of Russian heritage speakers from three different dominant-language backgrounds, namely German, Finnish, and American English. The elicited narratives are based on video clips that the participants saw before the recording. Since the current study is based on a relatively small corpus, we conducted a manual corpus-based analysis of the heritage corpora and an automated analysis of the baseline (monolingual) corpus to investigate the differences between the heritage and monolingual language varieties. We hypothesize that heritage speakers deploy fewer probabilistic strategies in language production compared with native speakers and that their active knowledge of and access to ready-to-use multiword units are restricted compared with native speakers. When they c...
There are three types of Russian verbless clauses, which emerged through the ellipsis of the copula and other (full) verbs. This paper provides arguments against the hypothesis that they owe their existence to contact with Uralic languages. It argues that Finnic verbless clauses developed in parallel with or even later than their Russian counterparts, and that the verbless clauses in Samoyedic languages, which preserve ancient Proto-Uralic features and use predicate nominal suffixes, differ too much structurally from those in Russian to represent likely models. It is argued that verbless clauses can naturally emerge when the meaning expressed by a frequent and semantically bleached verb is also included in the meaning of the phrase dependent on it. Other factors (contact-induced change, pragmatic and contextual factors) can support the emergence of – usually highly idiomatic – verbless clause constructions.

This open access handbook presents a multidisciplinary and multifaceted perspective on how the ‘digital’ is simultaneously changing Russia and the research methods scholars use to study Russia. It provides a critical update on how Russian society, politics, economy, and culture are reconfigured in the context of ubiquitous connectivity and accounts for the political and societal responses to digitalization. In addition, it answers practical and methodological questions in handling Russian data and a wide array of digital methods. The volume makes a timely intervention in our understanding of the changing field of Russian Studies and is an essential guide for scholars, advanced undergraduate and graduate students studying Russia today.