This paper describes a pilot analysis of gender differences in the revised transcripts of speeches from the sittings in the Danish Parliament in the period from 2009 to 2017. Information about the number and duration of the speeches, the gender, age, party, and role in the party was automatically extracted from the transcripts and from other data on the Danish Parliament web site. The analysis shows statistically significant differences in the number and duration of the speeches by male and female MPs, and we also found differences in speech frequency with respect to the age of the MPs. Our analysis confirms previous studies on parliamentary data in other countries showing that the role of the MPs in their party influences their participation in the debates. Furthermore, we found that female ministers were speaking more in the period with a female prime minister than they did under a male prime minister. In the future, we will determine the statistical significance of the various pa...
Proceedings of the 3rd Nordic Symposium on Multimodal Communication. Editors: Patrizia Paggio, Elisabeth Ahlsén, Jens Allwood, Kristiina Jokinen, Costanza Navarretta. NEALT Proceedings Series, Vol. 15 (2011), vi+87 pp. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt. Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/22532.
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the corpora are marked as belonging to the COVID-19 period (from November 1st 2019) or as "reference" (before that date). The corpora have extensive metadata, including aspects of the parliament and of the speakers (name, gender, MP status, party affiliation, party coalition/opposition). They are structured into time-stamped terms, sessions and meetings, with speeches being marked by the speaker and their role (e.g. chair, regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions, applause, etc. Note that some corpora have further information, e.g. the year of birth of the speakers, links to their Wikipedia articles, their membership in various committees, etc. The corpora are encoded according to the Parla-CLARIN TEI recommendation (https://clarin-eric.github.io/parla-clarin/), but have been validated against the compatible, but much stricter, ParlaMint schemas. This entry contains the linguistically marked-up version of the corpus, while the text version is available at http://hdl.handle.net/11356/1432. The ParlaMint.ana linguistic annotation includes tokenization, sentence segmentation, lemmatisation, Universal Dependencies part-of-speech tags, morphological features and syntactic dependencies, as well as the 4-class CoNLL-2003 named entities. Some corpora also have further linguistic annotations, such as PoS tagging or named entities according to language-specific schemes, with their corpus TEI headers giving further details on the annotation vocabularies and tools. The compressed files include the ParlaMint.ana XML TEI-encoded linguistically annotated corpus; the derived corpus in CoNLL-U with TSV speech metadata; and the vertical files (with registry file), suitable for use with CQP-based concordancers such as CWB, noSketch Engine or KonText. Also included is the 2.1 release of the data and scripts available at the GitHub repository of the ParlaMint project. As opposed to the previous version 2.0, this version corrects some errors in various corpora and adds information on the upper/lower house for bicameral parliaments. The vertical files have also been changed to make them easier to use in the concordancers.
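As a rough illustration of how the derived CoNLL-U files and the accompanying TSV speech metadata might be combined (this is not the official ParlaMint tooling; the file names and metadata column labels below are assumptions for illustration only), one could count lemmas per speaker gender as follows:

```python
# Minimal sketch, assuming a ParlaMint CoNLL-U file plus its TSV speech metadata.
import csv
from collections import Counter

from conllu import parse_incr  # pip install conllu

# Load per-speech metadata from the TSV accompanying the CoNLL-U file.
meta = {}
with open("ParlaMint-DK_2015-01.tsv", encoding="utf-8", newline="") as f:
    for row in csv.DictReader(f, delimiter="\t"):
        meta[row["ID"]] = row          # assumed columns: ID, Speaker_gender, ...

# Count lemmas per speaker gender from the linguistically annotated corpus.
lemma_counts = {"M": Counter(), "F": Counter()}
with open("ParlaMint-DK_2015-01.conllu", encoding="utf-8") as f:
    speech_id = None
    for sentence in parse_incr(f):
        # 'newdoc id' (where present) marks the start of a new speech.
        speech_id = sentence.metadata.get("newdoc id", speech_id)
        gender = meta.get(speech_id, {}).get("Speaker_gender")
        if gender in lemma_counts:
            for token in sentence:
                lemma_counts[gender][token["lemma"]] += 1

print(lemma_counts["F"].most_common(10))
```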
The MUMIN multimodal coding scheme was created to experiment with annotation of multimodal communication in video clips of interviews taken from Swedish, Finnish and Danish television broadcasting. The coding scheme is also intended to be a general instrument for the study of gestures and facial displays in interpersonal communication, in particular the role played by multimodal expressions for feedback, turn management and sequencing.
This study concerns the use of speech pauses, and especially breath pauses, in a Danish corpus of spontaneous dyadic conversations. Speech pauses which have specific communicative functions are investigated in relation to their occurrences before and after other communicative units, all annotated and classified in the form of dialogue acts. Breath pauses have been addressed in only a few studies even though they are important in communication and therefore should be accounted for when implementing human-machine dialogue systems. Dialogue acts, on the contrary, have been one of the backbones of dialogue systems since they generalize over different expressions of common communicative functions. In the current work, we describe the annotation of dialogue acts in the corpus and present an analysis of pauses using these annotations. To the best of our knowledge, dialogue acts have not previously been used for analyzing the functions of breath pauses. Our analysis shows that breath pauses are the most common type of pause with a communicative function in the Danish conversations. Breath pauses in the corpus have different uses, one of these being to delimit speech segments which are left unfinished and then abandoned by the speaker (retractions in dialogue act terminology); therefore, perceivable breathing can be a useful feature for determining spoken segments which should not be included in the dialogue history of human-machine dialogue systems.
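The kind of pause/dialogue-act adjacency analysis described above could be sketched as follows; the segment representation and the 0.2-second adjacency window are hypothetical choices, not the paper's actual setup:

```python
# Minimal sketch: count dialogue act labels immediately before and after breath pauses.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Segment:
    start: float      # seconds
    end: float
    kind: str         # "breath_pause", "silent_pause" or "dialogue_act"
    label: str = ""   # dialogue act label, e.g. "Answer", "Retraction"

def acts_around_breath_pauses(segments, max_gap=0.2):
    """Count dialogue act labels occurring right before and right after breath pauses."""
    acts = sorted((s for s in segments if s.kind == "dialogue_act"), key=lambda s: s.start)
    before, after = Counter(), Counter()
    for pause in (s for s in segments if s.kind == "breath_pause"):
        prev = [a for a in acts if 0 <= pause.start - a.end <= max_gap]
        nxt = [a for a in acts if 0 <= a.start - pause.end <= max_gap]
        if prev:
            before[prev[-1].label] += 1
        if nxt:
            after[nxt[0].label] += 1
    return before, after
```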
The paper describes the main characteristics of the scientific programme of the fourth conference of Digital Humanities in the Nordic Countries (DHN2019), which took place in Copenhagen in March 2019. DHN2019, like the preceding DHN conferences, aimed to connect researchers and practitioners addressing all topics that generally belong under the Digital Humanities field. The DHN conferences particularly address researchers from the Nordic countries, including the Baltic region, but are also open to researchers from all over the world. Thus, DHN2019 attracted participants from 27 countries. The call for papers of DHN2019 followed the strategy proposed by the organizers of DHN2018, who attempted to encompass two conference traditions: one from the humanities, accepting abstracts as submissions, and one from computer science, accepting full papers of varying length. The latter type of submission was the most popular in 2019 and the present proceedings collect these papers. With respect t...
This paper investigates the relation between the form and function of hand gestures in audio and video recordings of American and English political discourse of different types. Gestures have an important function in face-to-face communication, contributing to the successful delivery of the message by reinforcing what is expressed by speech or by adding new information to what is uttered. The relation between form and function of gestures has been described by some of the pioneers of gestural studies. However, since gestures are multifunctional and must be interpreted in context, it is important to investigate to what extent the form of gestures can be used to interpret their function automatically. Identifying the relation between form and function of gestures is also important for generating appropriate gestures in various communicative situations, and this knowledge is vital for the integration of machine-human communicative and cognitive functions. In this paper we show that ...
This paper addresses the semi-automatic subject area annotation of the Danish Parliament Corpus 2009-2017 in order to construct a gold standard corpus for automatic classification. The corpus consists of the transcriptions of the speeches in the Danish parliamentary meetings. In our annotation work, we mainly use subject categories proposed by Danish scholars in political science. The relevant subject areas of the speeches have been manually annotated using the titles of the agenda items for the parliamentary meetings, and the subject areas have then been assigned to the corresponding speeches. Some subjects co-occur in the agendas, since they are often debated at the same time. The fact that the same speech can belong to more than one subject area is further analysed. Currently, more than 29,000 speeches have been classified using the titles of the agenda items. Different evaluation strategies have been applied. We also describe automatic classification experiments on a subset of the cor...
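Since a speech can belong to more than one subject area, the classification experiments lend themselves to a multi-label setup. A minimal sketch, assuming TF-IDF features and a one-vs-rest linear classifier (the paper's actual features, labels and learner may differ, and the texts below are only placeholders), is:

```python
# Minimal multi-label subject-area classification sketch (scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Placeholder speeches and subject labels; real data would come from the corpus.
speeches = [
    "debate on hospital waiting lists and patient rights",
    "debate on the national budget and European cooperation",
]
subjects = [["Health"], ["Economy", "EU"]]   # one or more subject areas per speech

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(subjects)

clf = make_pipeline(
    TfidfVectorizer(lowercase=True, max_features=50000),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(speeches, y)
print(mlb.inverse_transform(clf.predict(["debate on hospitals and the budget"])))
```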
This paper addresses differences in the word use of two left-wing and two right-wing Danish parties, and how these differences, which reflect some of the basic stances of the parties, can be used to automatically identify the party of politicians from their speeches. In the first study, the most frequent and characteristic lemmas in the manifestos of the political parties are analysed. The analysis shows that the most frequently occurring lemmas in the manifestos reflect either the ideology or the position of the parties towards specific subjects, confirming for Danish the findings of preceding studies of English and German manifestos. Subsequently, we scaled up our analysis by applying machine learning to different language models built on the transcribed speeches by members of the same parties in the Parliament (Hansards) in order to determine to what extent it is possible to predict the party of the politicians from the speeches. The speeches used are a subset of the Danish Parliament corpus 2009–2017....
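One common way to operationalise "characteristic" lemmas is log-likelihood keyness, comparing lemma frequencies in one party's manifesto against the remaining manifestos; the sketch below illustrates this measure, which is an assumption for illustration rather than the measure actually used in the paper:

```python
# Minimal log-likelihood keyness sketch for ranking characteristic lemmas.
import math
from collections import Counter

def keyness(target: Counter, reference: Counter):
    """Rank lemmas by log-likelihood keyness in the target corpus vs. the reference."""
    n1, n2 = sum(target.values()), sum(reference.values())
    scores = {}
    for lemma, a in target.items():
        b = reference.get(lemma, 0)
        e1 = n1 * (a + b) / (n1 + n2)          # expected frequency in target
        e2 = n2 * (a + b) / (n1 + n2)          # expected frequency in reference
        ll = 2 * (a * math.log(a / e1) + (b * math.log(b / e2) if b else 0))
        scores[lemma] = ll
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Usage: keyness(Counter(lemmas_party_A), Counter(lemmas_other_parties))[:20]
```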
This paper presents an investigation of mirroring facial expressions and the emotions which they convey in dyadic naturally occurring first encounters. Mirroring facial expressions are a common phenomenon in face-to-face interactions, and they are due to the mirror neuron system which has been found in both animals and humans. Researchers have proposed that the mirror neuron system is an important component behind many cognitive processes such as action learning and understanding the emotions of others. Preceding studies of the first encounters have shown that overlapping speech and overlapping facial expressions are very frequent. In this study, we want to determine whether the overlapping facial expressions are mirrored or are otherwise correlated in the encounters, and to what extent mirroring facial expressions convey the same emotion. The results of our study show that the majority of smiles and laughs, and one fifth of the occurrences of raised eyebrows are mirrored in the dat...
The paper investigates whether annotations of head movements in a corpus in one language can be reused to predict the feedback functions of head movements in a comparable corpus in another language. The two corpora consist of naturally occurring triadic conversations in Danish and Polish, which were annotated according to the same scheme. The intersection of common annotation features was used in the experiments. A Naive Bayes classifier was trained on the annotations of one corpus and tested on the annotations of the other corpus. Training and test datasets were then reversed and the experiments repeated. The results show that the classifier identifies more feedback behaviours than the majority baseline in both cases, and the improvements are significant. The performance of the classifier decreases significantly compared with the results obtained when training and test data belong to the same corpus. Annotating multimodal data is resource consuming, thus the results are promi...
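A minimal sketch of the cross-corpus setup, assuming the intersected annotation features are categorical attributes of each head movement (the feature names and toy values below are illustrative placeholders, not the actual annotation attributes):

```python
# Minimal cross-corpus Naive Bayes sketch: train on one corpus, test on the other, then swap.
from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics import f1_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

def evaluate(train_feats, train_labels, test_feats, test_labels):
    clf = make_pipeline(DictVectorizer(sparse=False), BernoulliNB())
    clf.fit(train_feats, train_labels)
    return f1_score(test_labels, clf.predict(test_feats), average="macro")

# Each head movement as a dict of shared (intersected) annotation features.
danish = [{"movement": "Nod", "repeated": "yes"}, {"movement": "Shake", "repeated": "no"}]
danish_y = ["feedback", "other"]
polish = [{"movement": "Nod", "repeated": "no"}, {"movement": "Tilt", "repeated": "no"}]
polish_y = ["feedback", "other"]

print("DA -> PL:", evaluate(danish, danish_y, polish, polish_y))
print("PL -> DA:", evaluate(polish, polish_y, danish, danish_y))
```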
In this article we analyse verbal and non-verbal feedback in two multimodal corpora which reflect rather different communication situations. One corpus consists of eight video-recorded so-called 'map task' conversations, in which one of the conversation partners explains a route on a map to the other. The two participants sit in different rooms, unable to see each other, and communicate through headphones. The other corpus is a collection of twelve video recordings of free face-to-face conversations in which the participants, who did not know each other in advance, were recorded while standing opposite each other in a studio and talking together. The language is Danish in both corpora. We describe how verbal and non-verbal feedback is annotated and go through the distribution of different types of feedback expressions in the two corpora. Our analysis shows interesting differences in both the amount and the type (mono- versus multimodal) of feedback, which are due partly to the different physical setups and partly to the nature of the interaction, which is functional in the 'map task' di...
This paper deals with speech pauses marking clause boundaries and the gestures which co-occur with them in an audio- and video-recorded corpus of first encounters. The paper also investigates whether information about gestures co-occurring with speech contributes to the automatic prediction of the clause boundary pauses. Since one clause corresponds to one or more semantic units (dialogue acts), the pauses investigated mark the start and end of large semantic units. The results of my study indicate that pauses that mark clause boundaries co-occur with all gesture types in the analyzed corpus. Finally, my automatic prediction experiments show that information about speech tokens preceding the pauses can predict their function as clause boundary markers with high precision and recall, while information about the gestures co-occurring with speech (head movements, facial expressions, and body postures) does not contribute to the prediction.
... Costanza Navarretta, Lina Henriksen. Center for Sprogteknologi, University of Copenhagen, Njalsgade 80, 2300 Copenhagen S, Denmark.
NEALT PROCEEDINGS SERIES VOL. 15 Proceedings of the 3rd Nordic Symposium on Multimodal Communication
Humans communicate face-to-face through at least two modalities: the auditive modality, speech, and the visual modality, gestures, which comprise e.g. gaze movements, facial expressions, head movements, and hand gestures. The relation between speech and gesture is complex and partly depends on factors such as the culture, the communicative situation, the interlocutors and their relation. Investigating these factors in real data is vital for studying multimodal communication and for building models for implementing natural multimodal communicative interfaces able to interact naturally with individuals of different ages, cultures, and needs. In this paper, we discuss to what extent big data "in the wild", which are growing explosively on the internet, are useful for this purpose, also in light of legal aspects of the use of personal data, including multimodal data downloaded from social media.
Overlapping speech and gestures are common in face-to-face conversations and have been interpreted as a sign of synchronization between conversation participants. A number of gestures are even mirrored or mimicked. Therefore, we hypothesize that the gestures of a subject can contribute to the prediction of gestures of the same type by the other subject. In this work, we also want to determine whether the speech segments to which these gestures are related contribute to the prediction. The results of our pilot experiments show that a Naive Bayes classifier trained on the duration and shape features of head movements and facial expressions contributes to the identification of the presence and shape of head movements and facial expressions, respectively. Speech only contributes to the prediction in the case of facial expressions. The obtained results show that the gestures of the interlocutors are one of the numerous factors to be accounted for when modeling gesture production in conversational interactions, and this is relevant to the development of socio-cognitive ICT.
In connection with the Nordic Council of Ministers' grant to launch a Nordic language technology research programme, it was stated that it was important for the research programme to present its results and otherwise draw attention to itself as a useful contribution to Nordic cooperation, both in professional circles and to a broader audience of interested parties. The present yearbook covers the language technology programme's activities in the last part of 2004 and the first part of 2005; it is an attempt to ...
This paper is about the relation between pronominal types, syntactic types of the antecedent, semantic types of the referent, and anaphoric distance in the Danish part of the DAD corpus, comprising written and spoken data. These aspects are important for understanding the use of abstract anaphora and for processing them automatically, and some of them have been investigated previously (see i.a. Webber (1988); Gundel et al. (2003); Navarretta (2010)). Differing from preceding studies, we extend the analysis of the syntactic types of the antecedent to include a fine-grained classification of clausal types and also investigate the anaphoric distance. The most common antecedent types in the data are subordinate clauses and simple main clauses, and most abstract anaphora occurred in the clause which followed the antecedent or in the clause in which the antecedent occurred. There is no clear dependence between the type of antecedent clause and the type of referent.
This paper describes how Human Language Technologies and linguistic resources are used to support the construction of components of a knowledge organisation system. In particular, we focus on methodologies and resources for building a corpus-based domain ontology and extracting relevant metadata information for text chunks from domain-specific corpora.
Filled pauses, which are pauses accompanied by so-called fillers, are very frequent in spoken language. Fillers have multiple non-exclusive functions which are related both to the management of communication (Allwood et al., 1992; Maclay and Osgood, 1959; Duncan and Fiske, 1977) and to cognitive processes of planning the discourse and retrieving words (Rochester, 1973; Krauss et al., 2000). Researchers have found that there is an inverse frequency relation between hand gestures and filled pauses (Christenfeld et al., 1991; Rauscher et al., 1996) and that many hold gestures co-occur with filled pauses (Esposito et al., 2001; McNeill, 2014). Fillers are an integral part of the language and have language-specific characteristics (de Leeuw, 2007). Clark and Fox Tree (2002) propose to consider fillers as words since they are used in different contexts. In this study we want to determine a) which are the most common fillers in a Danish corpus of first encounters, b) whether fillers co-occur w...
In this paper we show that some of the syntactic patterns in an NLP lexicon can be used to identify semantically "similar" adjectives and verbs. We define semantic similarity on the basis of parameters used in the literature to classify adjectives and verbs semantically. The semantic clusters obtained from the syntactic encodings in the lexicon are evaluated by comparing them with semantic groups in existing taxonomies. The relation between adjectival syntactic patterns and their meaning is particularly interesting, because it has not been explored in the literature as much as is the case for the relation between verbal complements and arguments. The identification of semantic groups on the basis of the syntactic encodings in the considered NLP lexicon can also be extended to other word classes and, maybe, to other languages for which the same type of lexicon exists. The idea that the syntactic behaviour of words is connected with their meaning has been...
This paper describes a pilot study investigating the semiotic types of hand gestures in video-recorded speeches and their automatic classification. Gestures, which also comprise e.g. head movements and body posture, contribute to the successful delivery of the message by reinforcing what is expressed by speech or by adding new information to what is uttered. The automatic classification of the semiotic type of gestures from their shape description can contribute to their interpretation in human-human communication and in advanced multimodal interactive systems. We annotated and analysed hand gestures produced by Barack Obama during two speeches at the Annual White House Correspondents' Dinners and found differences in the contexts in which various hand gesture types were used. Then, we trained machine learning algorithms to classify the semiotic type of the hand gestures. The F-score obtained by the best performing algorithm on the classification of four semiotic types is 0.59. S...
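The classification step could be sketched as follows, assuming categorical shape attributes per gesture and hypothetical semiotic labels; the actual annotation attributes, label set and learning algorithms used in the paper may differ:

```python
# Illustrative sketch: classify the semiotic type of a hand gesture from shape features.
from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics import classification_report
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical shape descriptions and semiotic labels (placeholders only).
gestures = [
    {"handedness": "both", "trajectory": "up_down", "hand_shape": "open"},
    {"handedness": "single", "trajectory": "point", "hand_shape": "index_out"},
]
semiotic_type = ["iconic", "deictic"]

clf = make_pipeline(DictVectorizer(), DecisionTreeClassifier(random_state=0))
clf.fit(gestures, semiotic_type)
print(classification_report(semiotic_type, clf.predict(gestures)))
```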
This paper presents a pilot comparative study of feedback head movements and vocal expressions in Danish and Polish naturally occurring video-recorded conversations. The conversations involve three well-acquainted participants who talk freely in their homes while eating and drinking. The analysis of the data indicates that the participants use the same types of head and spoken feedback in the two languages. However, the Polish participants more often express feedback multimodally, that is, through the two modalities, and they use more repeated multimodal feedback expressions than the Danish participants. Moreover, we found a stronger relation between repeated head movements and repeated speech tokens in the Polish data than in the Danish data. Our data also confirm that there is a correlation between familiarity and feedback frequency and between familiarity and repetitiveness of feedback expressions, as suggested in preceding studies (Boholm and Allwood 2010, Navarretta and Paggio 2012).
In many approaches to resolving intersentential pronominal anaphora, the degree of salience of entities determines their accessibility in the addressee's discourse model. Degree of salience is identified with the degree of givenness of entities in the addressee's cognitive model, so that given/known entities have the highest degree of salience. The most given entities are chosen as antecedents of pronouns. A different view is proposed by Hajičová and Vrbová (1982), who suggest that entities in the focal part of an utterance in Information Structure (IS) terms have the highest degree of salience. These entities often correspond to new information. Analysing Danish discourse, we have found that these apparently contrasting interpretations of salience are both valid, but in different contexts. We propose a model of accessibility where salience is by default connected with givenness, but where this default can be explicitly overruled in contexts where the speaker explicitly marks entities as salient by IS-related devices. These explicit...
This paper deals with the uses of the annotations of third person singular neuter pronouns in the DAD parallel and comparable corpora of Danish and Italian texts and spoken data. The annotations contain information about the functions of these pronouns and their uses as abstract anaphora. Abstract anaphora have constructions such as verbal phrases, clauses and discourse segments as antecedents and refer to abstract objects comprising events, situations and propositions. The analysis of the annotated data shows the language specific characteristics of abstract anaphora in the two languages compared with the uses of abstract anaphora in English. Finally, the paper presents machine learning experiments run on the annotated data in order to identify the functions of third person singular neuter personal pronouns and neuter demonstrative pronouns. The results of these experiments vary from corpus to corpus. However, they are all comparable with the results obtained in similar tasks in ot...
In this paper, we aim to predict audience response from simple spoken sequences, speech pauses and co-speech gestures in annotated video- and audio-recorded speeches by Barack Obama at the Annual White House Correspondents' Association Dinner in 2011 and 2016. At these dinners, the American president mocks himself, his collaborators, political adversaries and the press corps, making the audience react with cheers, laughter and/or applause. The results of the prediction experiment demonstrate that information about spoken sequences, pauses and co-speech gestures by Obama can be used to predict the immediate audience response. This confirms and shows an application of numerous studies that address the importance of speech pauses and gestures in delivering the discourse message in a successful way. The fact that machine learning algorithms can use information about pauses and gestures to build models of audience reaction is also relevant for the construction of intelligent and cognitiv...
In this article, we compare feedback-related multimodal behaviours in two different types of interactions: first encounters between two participants who do not know each other in advance, and naturally occurring conversations between two and three participants recorded at their homes. All participants are Danish native speakers. The interactions are transcribed using the same methodology, and the multimodal behaviours are annotated according to the same annotation scheme. In the study we focus on the most frequently occurring feedback expressions in the interactions and on feedback-related head movements and facial expressions. The analysis of the corpora, while confirming general facts about feedback-related head movements and facial expressions previously reported in the literature, also shows that the physical setting, the number of participants, the topics discussed, and the degree of familiarity influence the use of gesture types and the frequency of feedback-related expressions and ...
Acknowledgements: We would like to thank Bente Maegaard, Anna Braasch and Henning Spang-Hanssen for fruitful and interesting discussions during the development of this manual, and for their comments on the text. In addition we would like to thank the experts from ELRA Panel for Validation of Written Language Resources (EPV-WLR) who gave valuable feedback on earlier versions of this document.
The STO (SprogTeknologisk Ordbase) lexicon is a comprehensive computational lexicon of Danish developed for NLP/HLT applications. The morphological layer of the lexicon, presented here in CSV format, contains a vocabulary of 88,067 entries. STO was created in 2001-2004 within the framework of a national collaborative project initiated by the Center for Language Technology (CST).
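A minimal sketch of loading the morphological CSV layer and grouping word forms into paradigms; the file name, separator and column labels below are assumptions and should be checked against the header of the actual distribution:

```python
# Minimal sketch for loading the STO morphological CSV export with pandas.
import pandas as pd

sto = pd.read_csv("sto_morphology.csv", sep=";", encoding="utf-8")

# Assumed columns: "lemma", "wordform", "pos" (plus morphosyntactic features).
paradigms = (
    sto.groupby(["lemma", "pos"])["wordform"]
       .apply(sorted)
       .to_dict()
)
print(len(paradigms), "lemma/POS paradigms loaded")
```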
This paper presents an approach to automatic head movement detection and classification in data from a corpus of video-recorded face-to-face conversations in Danish involving 12 different speakers. A number of classifiers were trained with different combinations of visual, acoustic and word features and tested in a leave-one-out cross validation scenario. The visual movement features were extracted from the raw video data using OpenPose, and the acoustic ones using Praat. The best results were obtained by a Multilayer Perceptron classifier, which reached an average 0.68 F1 score across the 12 speakers for head movement detection, and 0.40 for head movement classification given four different classes. In both cases, the classifier outperformed a simple most frequent class baseline as well as a more advanced baseline only relying on velocity features.
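The evaluation protocol described above, leave-one-speaker-out cross-validation with a Multilayer Perceptron, could be sketched as below; feature extraction with OpenPose and Praat is omitted, and the network size and other hyperparameters are placeholders rather than the paper's actual settings:

```python
# Minimal leave-one-speaker-out sketch for frame-level head-movement detection.
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def leave_one_speaker_out(X, y, speakers):
    """Return mean and per-speaker macro F1 scores; X, y, speakers are numpy arrays."""
    scores = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=speakers):
        clf = make_pipeline(StandardScaler(),
                            MLPClassifier(hidden_layer_sizes=(64,), max_iter=500))
        clf.fit(X[train_idx], y[train_idx])
        scores.append(f1_score(y[test_idx], clf.predict(X[test_idx]), average="macro"))
    return float(np.mean(scores)), scores
```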
The paper compares how feedback is expressed via speech and head movements in comparable corpora of first encounters in three Nordic languages: Danish, Finnish and Swedish. The three corpora have been collected following common guidelines, and they have been annotated according to the same scheme in the NOMCO project. The results of the comparison show that in this data the most frequent feedback-related head movement is the Nod in all three languages. Two types of Nods were distinguished in all corpora: Down-nods and Up-nods; the participants from the three countries use Down- and Up-nods with different frequencies. In particular, Danes use Down-nods more frequently than Finns and Swedes, while Swedes use Up-nods more frequently than Finns and Danes. Finally, Finns use single Nods more often than repeated Nods, differing from the Swedish and Danish participants. The differences in the frequency of both Down-nods and Up-nods in the Danish, Finnish and Swedish interactions are interesting ...
In this paper we describe Danish and Italian parallel and comparable texts annotated with (co)referential chains and information about discourse topic shifts, and present an evaluation of the annotation. We also discuss general differences in the way referring expressions are used in Danish and Italian and present an analysis of the relation between the use of types of referring expressions and discourse topic shifts in part of the data, using the Centering framework.
