
    Alicia Ageno

    The main goal of this work is to present a qualitative and quantitative analysis of the disagreements among annotators during the syntactic labeling of the Cast3LB corpus. To this end, a test corpus of one thousand sentences was defined and annotated in parallel by five annotators. Successive evaluations of the results led to corresponding improvements of the annotation guidelines until their final version was reached. In the last phase, the disagreements among annotators are analyzed qualitatively and classified.
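The abstract does not state which agreement measure was used. As a minimal sketch, assuming each annotator's analysis of a sentence is represented as a set of labeled constituents (label, start, end), one common way to quantify agreement is labeled-bracket F1 averaged over all annotator pairs; the constituents that not every annotator produced are then the candidates for a qualitative classification of discrepancies. The tuple representation and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical representation: a constituent is (label, start, end) over token offsets.
Constituent = tuple[str, int, int]


def bracket_f1(a: set[Constituent], b: set[Constituent]) -> float:
    """Labeled-bracket F1 between two annotations of the same sentence (symmetric)."""
    if not a and not b:
        return 1.0  # two empty analyses agree trivially
    matched = len(a & b)
    # F1 = 2PR / (P + R) simplifies to 2 * matched / (|a| + |b|)
    return 2 * matched / (len(a) + len(b))


def mean_pairwise_f1(annotations: dict[str, set[Constituent]]) -> float:
    """Average bracket F1 over every pair of annotators (needs at least two)."""
    pairs = list(combinations(sorted(annotations), 2))
    return sum(bracket_f1(annotations[x], annotations[y]) for x, y in pairs) / len(pairs)


def disputed(annotations: dict[str, set[Constituent]]) -> set[Constituent]:
    """Constituents proposed by at least one annotator but not by all of them."""
    sets = list(annotations.values())
    return set.union(*sets) - set.intersection(*sets)


if __name__ == "__main__":
    # Toy example: three hypothetical annotators of one five-token sentence.
    annotations = {
        "A1": {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)},
        "A2": {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)},
        "A3": {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 4)},
    }
    print(round(mean_pairwise_f1(annotations), 3))  # 0.778
    print(sorted(disputed(annotations)))  # [('VP', 2, 4), ('VP', 2, 5)]
```

Here the only discrepancy is the extent of the VP, which is exactly the kind of disagreement a guideline revision would try to resolve.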
    Background: The exponential growth of digital healthcare data is fueling the development of Knowledge Discovery in Databases (KDD). Extracting temporal relationships between medical events is essential to reveal hidden patterns that can help physicians find optimal treatments, diagnose illnesses and detect adverse drug reactions, among other tasks. This paper presents an approach for extracting patient evolution patterns from electronic health records written in Catalan and/or Spanish. Methods: We propose a robust formulation for extracting Temporal Association Rules (TARs) that goes beyond simple rule extraction by considering the sequence of multiple visits. Our highly configurable algorithm leverages this formulation to extract TARs from sequences of medical instances. It can generate rules with the desired format, content and temporal factors while accounting for different levels of abstraction of medical instances. To demonstrate the effectiveness of our methodolo...
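To make the notion of a temporal association rule concrete, here is a deliberately simplified sketch, not the paper's formulation: each patient is a chronological list of visits, each visit a set of medical items, and a rule A => B holds for a patient if A appears in some visit and B appears in one of the next max_gap visits. Support is the fraction of patients for which the rule holds; confidence divides that count by the number of patients who have A at all. All names, thresholds and the toy data are hypothetical.

```python
from itertools import permutations

Patient = list[set[str]]  # chronological visits, each a set of medical items


def rule_holds(visits: Patient, a: str, b: str, max_gap: int) -> bool:
    """True if a visit containing `a` is followed, within `max_gap` visits, by one containing `b`."""
    for i, visit in enumerate(visits):
        if a in visit and any(b in later for later in visits[i + 1 : i + 1 + max_gap]):
            return True
    return False


def temporal_rules(patients: list[Patient], max_gap=1, min_support=0.5, min_confidence=0.6):
    """Return (a, b, support, confidence) for every rule a => b passing both thresholds."""
    items = sorted({item for visits in patients for visit in visits for item in visit})
    n = len(patients)
    rules = []
    for a, b in permutations(items, 2):  # ordered pairs with a != b
        # Every item comes from the data, so with_a is always at least 1.
        with_a = sum(any(a in v for v in visits) for visits in patients)
        with_rule = sum(rule_holds(visits, a, b, max_gap) for visits in patients)
        support, confidence = with_rule / n, with_rule / with_a
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


if __name__ == "__main__":
    patients = [
        [{"diabetes"}, {"metformin"}, {"neuropathy"}],
        [{"diabetes"}, {"metformin"}],
        [{"hypertension"}, {"enalapril"}],
    ]
    # With the default thresholds only diabetes => metformin is reported.
    for rule in temporal_rules(patients):
        print(rule)
```

Raising max_gap lets rules span more distant visits, which is one simple way of expressing the "temporal factors" mentioned above; the abstraction levels of medical instances are not modeled in this sketch.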
    The growing availability of online textual sources and the large number of potential applications of knowledge acquisition from textual data have led to an increase in Information Extraction (IE) research. Examples of these applications include the generation of databases from documents and the acquisition of knowledge useful for emerging technologies such as question answering, information integration and other text mining tasks. However, one of the main drawbacks of IE is its intrinsic domain dependence. To reduce the high cost of manually adapting IE applications to new domains, the research community has experimented with different Machine Learning (ML) techniques. This survey describes and compares the main approaches to IE and the different ML techniques used to achieve Adaptive IE technology.
    This paper describes GeoTALP-IR, a Geographical Information Retrieval (GIR) system, and evaluates it in the context of our participation in the CLEF 2005 GeoCLEF Monolingual English task. The GIR system is based on Lucene and uses a modified version of the Passage Retrieval module of the TALP Question Answering (QA) system presented at the CLEF 2004 and TREC 2004 QA evaluation tasks. We designed a Keyword Selection algorithm based on a Linguistic and Geographical Analysis of the topics. A Geographical Thesaurus (GT) was built from a set of publicly available Geographical Gazetteers and a Geographical Ontology. Our experiments show that using the Geographical Thesaurus for Geographical Indexing and Retrieval improved the performance of our GIR system.
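The abstract does not detail how the Geographical Thesaurus is used at retrieval time. One plausible reading, sketched here under that assumption, is query expansion: a place name found in a topic is expanded with the places it contains, with weights that decay with depth in the place hierarchy. The toy thesaurus, the decay factor and the function names are all hypothetical and are not taken from GeoTALP-IR.

```python
# Hypothetical toy thesaurus: each place maps to the places it directly contains.
THESAURUS: dict[str, list[str]] = {
    "europe": ["spain", "france"],
    "spain": ["catalonia", "madrid"],
    "catalonia": ["barcelona", "girona"],
}


def expand_place(place: str, max_depth: int = 2, decay: float = 0.5) -> dict[str, float]:
    """Breadth-first expansion of `place` to narrower places; level d gets weight decay**d."""
    weights = {place: 1.0}
    frontier = [place]
    for depth in range(1, max_depth + 1):
        frontier = [child for p in frontier for child in THESAURUS.get(p, [])]
        for child in frontier:
            weights.setdefault(child, decay ** depth)  # keep the shallowest (largest) weight
    return weights


def weighted_query(keywords: list[str], places: list[str], **kwargs) -> dict[str, float]:
    """Combine selected keywords (weight 1.0) with expanded place names."""
    query = {kw: 1.0 for kw in keywords}
    for place in places:
        for term, w in expand_place(place, **kwargs).items():
            query[term] = max(w, query.get(term, 0.0))
    return query


if __name__ == "__main__":
    print(weighted_query(["wine", "exports"], ["catalonia"]))
```

The resulting term weights could then be handed to whatever retrieval engine is in use; how they map onto Lucene queries is outside the scope of this sketch.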
    M. Antonia Martí, Universitat de Barcelona, Centre de Llenguatge i Computació (CLiC), Gran Via 585, 08007 Barcelona. Lluís Màrquez, Universitat Politècnica de Catalunya, Grup de Processament del Llenguatge Natural (GPLN), Jordi Girona 1-3, 08034 Barcelona. Alicia Ageno, Universitat Politècnica de Catalunya, Grup de Processament del Llenguatge Natural (GPLN), Jordi Girona 1-3, 08034 Barcelona.
    In this paper, we present the results of the 3LB project, which consisted of the development of three corpora (one for Catalan, one for Spanish and one for Basque) with syntactic and semantic annotation. We describe the criteria followed for each annotation level, the tools developed for each tagging task and the results of the annotation evaluation.
    This paper describes the methodology used by the TALP-UPC team for the SEPLN 2013 shared task on tweet normalization (Tweet-Norm). The system uses a set of modules, each of which proposes candidate corrections for every out-of-vocabulary word. The final correction is chosen by weighted voting according to each module's accuracy.
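Weighted voting of this kind can be sketched in a few lines, assuming each module returns at most one candidate per out-of-vocabulary word and that each module's accuracy has been measured beforehand (for example on development data). The module names and accuracy values in the example are hypothetical, and the tie-breaking rule is an assumption rather than something the abstract specifies.

```python
from collections import defaultdict
from typing import Optional


def weighted_vote(proposals: dict[str, Optional[str]], accuracy: dict[str, float]) -> Optional[str]:
    """Pick the candidate whose supporting modules have the largest summed accuracy.

    proposals maps module name -> proposed correction (None if the module abstains).
    accuracy maps module name -> that module's previously measured accuracy.
    """
    scores: dict[str, float] = defaultdict(float)
    for module, candidate in proposals.items():
        if candidate is not None:
            scores[candidate] += accuracy.get(module, 0.0)
    if not scores:
        return None  # no module proposed a correction
    # Sorting first makes tie-breaking deterministic (alphabetical among tied candidates).
    return max(sorted(scores), key=scores.get)


if __name__ == "__main__":
    proposals = {"edit_distance": "que", "abbreviations": "que", "phonetic": "qué"}
    accuracy = {"edit_distance": 0.6, "abbreviations": 0.8, "phonetic": 0.7}
    print(weighted_vote(proposals, accuracy))  # que (1.4 vs 0.7)
```

Weighting by accuracy means that a single reliable module can outvote several weaker ones that agree with each other, which a plain majority vote would not allow.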
    The goal of this project is to analyze, experiment with and develop intelligent, interactive and multilingual Text Mining technologies as a key element of the next generation of search engines: systems capable of finding "the need behind the query". These technologies will provide specialized services and interfaces according to the search domain and the type of information needed. Moreover, they will integrate searches over document collections (websites), multimedia (images, audio, video), semi-structured texts and restricted domains.
    The goal of the project is to analyze, experiment with and develop intelligent, interactive and multilingual Text Mining technologies as a key element of the next generation of search engines: systems capable of finding "the need behind the query". This new generation will provide specialized services and interfaces according to the search domain and the type of information needed. Moreover, it will integrate textual search (websites) and multimedia search (images, audio, video), and it will be able to find and organize information rather than generate ranked lists of websites.
    This document describes the work performed by the Universitat Politècnica de Catalunya (UPC) in its first participation in TAC-KBP 2012, in both the Entity Linking and the Slot Filling tasks.
    M. Civit, A. Ageno, B. Navarro, N. Bufí, M.A. Martí. CLiC, Centre de Llenguatge i Computació, Adolf Florensa s/n (Torre Florensa), 08028 Barcelona, {civit, nuria}@clic.fil.ub.es; TALP Research Centre (UPC), Jordi Girona 3, 08034 Barcelona; Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Campus de San Vicente del Raspeig, Apartado 99, 03080 Alicante.
    In this paper we present a hybrid approach for the acquisition of syntactico-semantic patterns from raw text. Our approach co-trains a decision list learner, whose feature space covers the set of all syntactico-semantic patterns, with an Expectation Maximization clustering algorithm that uses the text words as attributes. We show that the combination of the two methods always outperforms the decision list learner alone. Furthermore, using a modular architecture, we investigate several algorithms for pattern ranking, the most important component of the decision list learner.
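The decision list component can be illustrated on its own; the co-training with EM clustering is not shown. This sketch assumes one common ranking criterion, the smoothed log ratio between a pattern's majority-label count and its remaining counts; the paper compares several ranking algorithms and this is not claimed to be one of them. Pattern strings, labels and the smoothing constant alpha are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_decision_list(examples, alpha=0.1):
    """examples: iterable of (set of patterns, label).

    Returns rules (score, pattern, label) sorted by decreasing score, where label is
    the pattern's majority label and score = log((majority + alpha) / (rest + alpha)).
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for patterns, label in examples:
        for pattern in patterns:
            counts[pattern][label] += 1
    rules = []
    for pattern, c in counts.items():
        label, majority = c.most_common(1)[0]
        rest = sum(c.values()) - majority
        rules.append((math.log((majority + alpha) / (rest + alpha)), pattern, label))
    rules.sort(key=lambda r: (-r[0], r[1]))  # best score first; ties broken by pattern name
    return rules


def classify(rules, patterns, default=None):
    """Apply the first (highest-ranked) rule whose pattern occurs in `patterns`."""
    for _score, pattern, label in rules:
        if pattern in patterns:
            return label
    return default


if __name__ == "__main__":
    examples = [
        ({"obj:food", "subj:person"}, "ingest"),
        ({"obj:food"}, "ingest"),
        ({"obj:idea", "subj:person"}, "communicate"),
        ({"obj:idea"}, "communicate"),
    ]
    rules = train_decision_list(examples)
    print(classify(rules, {"subj:person", "obj:idea"}))  # communicate
```

Because classification stops at the first matching rule, the ranking function alone decides which pattern wins when several apply, which is why it is singled out as the most important component.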
    Two methods for stochastically modelling bidirectionality in chart parsing are presented. A probabilistic island-driven parser that uses these models (either in isolation or in combination) has been built and tested on wide-coverage corpora. The best results are achieved by the hybrid approaches that combine both methods.
    CESS-consulta is an interface developed within the CESS-ECE project that allows users to query the information contained in the project's morphosyntactically and semantically annotated corpora. The corpora contain information on constituent structure, the syntactic functions of the constituents and the associated thematic roles. The corpora currently available contain 100,000 words: CESS-CAT for Catalan, CESS-ESP for Spanish and CESS-EUS for Basque. ...
    Knowledge acquisition is a major problem in the development of real knowledge-based systems. This problem has been dealt with in a variety of ways. One of the most promising paradigms relies on already existing sources, from which knowledge is extracted semi-automatically and then used in knowledge-based applications. The Acquilex Project, within which we are working, follows this paradigm. The basic aim of Acquilex is to develop techniques and methods for using Machine ...
    S. Acebo, Alicia Ageno, Salvador Climent, Javier Farreres, Lluís Padró, Francesc Ribas, Horacio Rodríguez and O. Soler. MACO: Morphological Analyzer Corpus-Oriented. Dept. LSI, Universitat Politècnica de Catalunya, 1994. ...
