
    Jordi Turmo

    Noun phrase (NP) coreference resolution is a problem involved in many Natural Language areas, such as Dialog, Information Extraction and Retrieval, Summarization and Question Answering, among others. Especially important issues regarding this problem are the ...
    Abstract: The goal of the project is to explore integrated environments allowing the cost-effective deployment of vertical information access portals for specific domains. The project started in January 2010, and will last three years. Keywords: Natural Language Processing, Syntactic Analysis, Semantic Interpretation, Knowledge Acquisition, Information Extraction, Information Retrieval
    Similarity, which plays a key role in fields like cognitive science, psycholinguistics and natural language processing, is a broad and multifaceted concept. In this work we analyse how two approaches that belong to different perspectives, the corpus view and the psycholinguistic view, articulate similarity between verb senses in Spanish. Specifically, we compare the similarity between verb senses based on their argument structure, which is captured through semantic roles, with their similarity defined by word associations. We address the question of whether verb argument structure, which reflects the expression of the events, and word associations, which are related to the speakers’ organization of the mental lexicon, shape similarity between verbs in a congruent manner, a topic which has not been explored previously. While we find significant correlations between verb sense similarities obtained from these two approaches, our findings also highlight some discrepancies between them and the importance of the degree of abstraction of the corpus annotation and psycholinguistic representations.
    This paper describes the experience of QAST 2008, the second time a pilot track of CLEF has been held aiming to evaluate the task of Question Answering in Speech Transcripts. Five sites submitted results for at least one of the five scenarios (lectures in English, meetings in English, broadcast news in French and European Parliament debates in English and Spanish).
    In this work we focus on the automatic acquisition of verbal classifications for Spanish. To do so, we perform a series of experiments with 20 verbal senses from the Sensem corpus. We use different kinds of features that cover diverse linguistic information, together with an agglomerative hierarchical clustering method, to generate a number of classifications. We compare each of these automatic classifications with a semi-automatically created gold standard, which is built on the basis of linguistic constructions proposed by theoretical linguistics. This comparison allows us to investigate which features are best suited to automatically building a verb classification coherent with the theory of linguistic constructions, and what the similarities and differences are between an automatic verbal classification and one based on that theory.
    The most widespread way of acquiring information for knowledge-based systems is manual, frequently by means of a dialog between the system and the human expert (sometimes with the intervention of a knowledge engineer). However, the high cost of this approach, together ...
    The biases of individual algorithms for non-parametric document clustering can lead to non-optimal solutions. Ensemble clustering methods may overcome this limitation, but have not been applied to document collections. This paper presents a comparison of strategies for non-parametric document ensemble clustering.
    This paper presents our experiments in question answering for speech corpora. These experiments focus on improving the answer extraction step of the QA process. We present two approaches to answer extraction in question answering for speech corpora that apply machine ...
    In this article, we present a factoid question-answering system, Sibyl, specifically tailored for question answering (QA) on spoken-word documents. This work explores, for the first time, which techniques can be robustly adapted from the usual QA on written documents to the more difficult spoken document scenario. More specifically, we study new information retrieval (IR) techniques designed for speech, and utilize
    The question answering (QA) task consists of providing short, relevant answers to natural language questions. Most QA research has focused on extracting information from text sources, providing the shortest relevant text in response to a question. For example, the correct answer to the question, “How many groups participate in the CHIL project?” is “15”, whereas the response to “Who are the partners in CHIL?” is a list of them. This simple example illustrates the two main advantages of QA over current search engines: first, the input is a natural-language question rather than a keyword query; and second, the answer provides the desired information content and not simply a potentially large set of documents or URLs that the user must plow through.
    ABSTRACT Cluster analysis lies at the core of most unsupervised learning tasks. However, the majority of clustering algorithms depend on the all-in assumption, in which all objects belong to some cluster, and perform poorly on minority clustering tasks, in which a small fraction of signal data stands against a majority of noise. The approaches proposed so far for minority clustering are supervised: they require the number and distribution of the foreground and background clusters. In supervised learning and all-in clustering, combination methods have been successfully applied to obtain distribution-free learners, even from the output of weak individual algorithms. In this work, we propose a novel ensemble minority clustering algorithm, Ewocs, suitable for weak clustering combination. Its properties have been theoretically proved under a loose set of constraints. We also propose a number of weak clustering algorithms, and an unsupervised procedure to determine the scaling parameters for Gaussian kernels used within the task. We have implemented a number of approaches built from the proposed components, and evaluated them on a collection of datasets. The results show how approaches based on Ewocs are competitive with respect to—and even outperform—other minority clustering approaches in the state of the art.
    Automatically extracting Translation Links using a wide coverage semantic taxonomy. German Rigau, Horacio Rodríguez and Jordi Turmo. Departament de Llenguatges i Sistemes Informàtics. Universitat Politècnica de Catalunya. ...
    The growing availability of online text has led to an increase in the use of automatic knowledge acquisition approaches from textual data, as in Information Extraction (IE). Some IE systems use knowledge learned by single-concept learning systems, in the form of sets of IE rules. Most such systems need sets of both positive and negative examples. However, the manual selection of positive examples can be a very hard task for experts, while automatic methods for selecting negative examples can generate extremely large example sets, even though only a small subset of them is relevant for learning. This paper briefly describes a more portable multi-concept learning system and presents a methodology to select a relevant set of training examples.
    This paper describes the participation of the Technical University of Catalonia in the CLEF 2009 Question Answering on Speech Transcripts track. We have participated in the English and Spanish scenarios of QAST. For both manual and automatic transcripts we have used a robust factual Question Answering system that uses minimal syntactic information. We have also developed a NERC designed to handle automatic transcripts. We perform a detailed analysis of our results and draw conclusions relating QA performance to word error rate and the difference between written and spoken questions.
    This paper describes on-going work on the development of two complementary resources: WordMed® and Scriptum®. The former is a lexico-conceptual knowledge base (KB) comprising information from four medical sub-domains (diagnostics, procedures, tumors and ...