The WNSimRep v1 dataset is provided as supplementary material of the paper by Lastra-Díaz, J. J., & García-Serrano, A. (2016). HESML: a scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset. Information Systems. In the aforementioned work, we introduce a scalable Java software library of ontology-based semantic similarity measures and IC models, called HESML, and a set of reproducible experiments on word similarity. The WNSimRep v1 dataset is detailed in the enclosed file called "appendixB_WNSimRep_dataset_LastraGarcia_v1.pdf". This work introduces a framework whose aim is to allow the exact replication of most intrinsic Information Content (IC) models and ontology-based similarity measures reported in the literature by using the publicly available accompanying dataset, called the WNSimRep v1 dataset. This work has been carried out in the context of a large evaluation campaign of ontology-based semantic similarity measures and IC models on WordNet based on HESML. Our work is motivated by several reproducibility problems identified in a series of recent experimental surveys carried out by the authors, together with the lack of a framework and gold standard to assist in the replication of ontology-based similarity measures and IC models.
To bridge this gap, we introduce herein a replication framework defined by three different types of data file: (a) node-based data files which contain an explicit representation of the WordNet taxonomy together with a specific IC model and a collection of node-based taxonomical features, (b) edge-based data files which contain a family of edge-valued IC models based on the conditional probability between child and parent concepts, and (c) synset-pair-based data files which contain the synset pairs of the Rubenstein-Goodenough word similarity benchmark, together with a collection of taxonomical features based on synset pairs and all the ontology-based similarity measures evaluated on them. The fr [...]
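As a rough illustration of what an edge-valued IC model based on the conditional probability between child and parent concepts can look like, the sketch below weights each taxonomy edge by -log p(child | parent). It is only a minimal sketch under stated assumptions: the toy taxonomy, the frequency counts, and every function name are hypothetical, the taxonomy is a tree, and p(child | parent) is taken to be p(child) / p(parent). It is not the WNSimRep file format and not HESML code.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not WordNet.
PARENT = {"animal": "entity", "plant": "entity", "dog": "animal", "cat": "animal"}

# Made-up frequency counts; each count already includes the concept's descendants.
COUNT = {"entity": 100, "animal": 60, "plant": 40, "dog": 45, "cat": 15}


def probability(concept):
    """Node probability relative to the root concept."""
    return COUNT[concept] / COUNT["entity"]


def information_content(concept):
    """Node-based IC: IC(c) = -log p(c)."""
    return -math.log(probability(concept))


def edge_weight(child):
    """Edge-valued IC: -log p(child | parent), assuming p(child | parent) = p(child) / p(parent)."""
    parent = PARENT[child]
    return -math.log(probability(child) / probability(parent))


def weighted_depth(concept):
    """Sum of the edge weights on the path from the concept up to the root."""
    total = 0.0
    while concept in PARENT:
        total += edge_weight(concept)
        concept = PARENT[concept]
    return total
```

Under these assumptions the edge weights along a path to the root add up to the node-based IC of the concept, so `weighted_depth("dog")` and `information_content("dog")` agree up to floating-point error.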
This dataset introduces a companion reproducibility Java console program, called HESML_vs_SML_test.jar, of the work introduced by Lastra-Díaz and García-Serrano [1]. This latter work introduces the Half-Edge Semantic Measures Library (HESML), and carries out an experimental comparison of the HESML V1R2, Semantic Measures Library (SML) 0.9 [2] and WNetSS [4] semantic measures libraries. The HESML_vs_SML_test.jar program runs the set of performance and scalability benchmarks detailed in [1] and generates the figures and tables of results reported in the aforementioned work, which are also enclosed as complementary files of this dataset (see files below). Licensing note: The 'HESML_vs_SML_test.jar' program is based on the HESML V1R2 [3], SML 0.9 [2] and WNetSS [4] semantic measures libraries, and it includes these libraries in its distribution, as well as WordNet 3.0 [6] and the SimLex665 [5] dataset. Thus, if you use this dataset, you should also cite the works related to these resources. References: [1] Lastra-Díaz, J. J., and García-Serrano, A. (2016). HESML: a scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset. To appear in Information Systems Journal. [2] Harispe, S., Ranwez, S., Janaqi, S., and Montmain, J. (2014). The Semantic Measures Library: Assessing Semantic Similarity from Knowledge Representation Analysis. In E. Métais, M. Roche, & M. Teisseire (Eds.), Proc. of the 19th International Conference on Applications of Natural Language to Information Systems (NLDB 2014) (Vol. 8455, pp. 254–257). Montpellier, France: Springer. http://dx.doi.org/10.1007/978-3-319-07983-7_37 [3] Lastra-Díaz, J. J., & García-Serrano, A. (2016). HESML V1R2 Java software library of ontology-based semantic similarity measures and information content models. Mendeley Data, v2. https://doi.org/10.17632/t87s78dg78.2 [4] Ben Aouicha, M., Taieb, M. A. H., and Ben Hamadou, A. (2016).
SISR: System for integrating semantic relatedness and similarity meas [...]
HESML V1R2 is the second release of the Half-Edge Semantic Measures Library (HESML) [1], which is a new, scalable and efficient Java software library of ontology-based semantic similarity measures and Information Content (IC) models based on WordNet. HESML V1R2 implements most ontology-based semantic similarity measures and Information Content (IC) models based on WordNet reported in the literature. In addition, it provides an XML-based input file format for specifying the execution of reproducible experiments on WordNet-based similarity without any software coding. The V1R2 release significantly improves the performance of HESML V1R1. HESML is introduced and detailed in a companion reproducibility paper [1] of the methods and experiments introduced in [2,3,4]. The main features of HESML are as follows: (1) it is based on an efficient and linearly scalable representation for taxonomies called PosetHERep introduced in [1], (2) its performance scales linearly with the size of the taxonomy, and (3) it does not use any caching strategy of vertex sets. HESML V1R2 is freely distributed for any non-commercial purpose under a CC BY-NC-SA 4.0 license, subject to citing the main HESML paper [1] as an attribution requirement. On the other hand, the commercial use of the similarity measures introduced in [2], as well as part of the intrinsic IC models introduced in [3] and [4], is protected by a patent application [5]. In addition, any user of HESML must fulfill other licensing terms described in [1] related to other resources distributed with the library, such as WordNet and a dataset of corpus-based IC models, among others. References: [1] Lastra-Díaz, J. J., & García-Serrano, A. (2016). HESML: a scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset. To appear in Information Systems Journal. [2] Lastra-Díaz, J. J., & García-Serrano, A. (2015).
A novel family of IC-based similarity measures with a detailed experimental [...]
HESML V1R1 is a new Java software library called Half-Edge Semantic Measures Library (HESML), which implements most ontology-based semantic similarity measures and Information Content (IC) models based on WordNet reported in the literature. HESML is introduced and detailed in the paper by Lastra-Díaz, J. J., & García-Serrano, A. (2016). HESML: a scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset. Information Systems. HESML is motivated by several drawbacks in the current state-of-the-art software libraries, as well as the evaluation of the new methods introduced by the authors, together with the replication and evaluation of most previously reported methods. HESML is based on a new and efficient poset representation, called PosetHERep, which is an adaptation of the half-edge data structure commonly used to represent discrete manifolds and planar graphs in computational geometry. HESML proposes a memory-efficient representation for taxonomies which linearly scales with the taxonomy size and provides an efficient implementation of a large set of topological queries and graph-based algorithms. Likewise, HESML provides an open framework to aid research into the area by providing a simpler and more efficient software architecture than the current software libraries.
HESML V1R4 is the fourth release of the Half-Edge Semantic Measures Library (HESML) detailed in [1], which is a new, linearly scalable and efficient Java software library of ontology-based semantic similarity measures and Information Content (IC) models based on WordNet. HESML V1R4 implements most ontology-based semantic similarity measures and Information Content (IC) models based on WordNet reported in the literature, as well as the evaluation of three pre-trained word embedding models. It also provides an XML-based input file format for specifying the execution of reproducible experiments on WordNet-based similarity without any software coding. HESML V1R4 introduces the following novelties: (1) a software implementation for the evaluation of three pre-trained word embedding file formats, which support most of the state-of-the-art models reported in the literature; (2) a software implementation of an intrinsic IC model and two new IC-based semantic similarity measures introduced by Cai et al. (2017); (3) a software implementation of a fast approximation of the Wu & Palmer (1994) measure commonly used in the literature; (4) the integration of a very large set of word similarity benchmarks; and finally (5) the correction of an error in our software implementation of the Leacock & Chodorow (1998) measure in previous HESML versions. The HESML library is freely distributed for any non-commercial purpose under a CC BY-NC-SA 4.0 license, subject to citing the main HESML paper [1] as an attribution requirement. On the other hand, the commercial use of the similarity measures introduced in [2], as well as part of the intrinsic IC models introduced in [3] and [4], is protected by a patent application [5]. In addition, any user of HESML must fulfill other licensing terms described in [1] related to other resources distributed with the library. References: [1] Lastra-Díaz, J. J., García-Serrano, A., Batet, M., Fernández, M., & Chirigati, F. (2017).
HESML: a scalable ontology-based semantic similarity measures libra [...]
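For readers unfamiliar with the two path-based measures named in the V1R4 release notes, the sketch below evaluates the textbook Wu & Palmer and Leacock & Chodorow formulas on a tiny tree. It is a minimal sketch under stated assumptions, not HESML's implementation and not the fast Wu & Palmer approximation mentioned above: the taxonomy and all function names are hypothetical, depths and path lengths are counted in nodes (the root has depth 1), and each concept has a single parent.

```python
import math

# Hypothetical single-parent toy taxonomy (child -> parent); not WordNet.
PARENT = {"animal": "entity", "plant": "entity", "dog": "animal", "cat": "animal", "oak": "plant"}


def path_to_root(concept):
    """Concepts from the given one up to the root, inclusive."""
    path = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        path.append(concept)
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def lowest_common_subsumer(a, b):
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_a:
            return node
    raise ValueError("concepts do not share a root")


def wu_palmer(a, b):
    """sim = 2 * depth(LCS) / (depth(a) + depth(b))."""
    return 2 * depth(lowest_common_subsumer(a, b)) / (depth(a) + depth(b))


def leacock_chodorow(a, b):
    """sim = -log(len(a, b) / (2 * D)), with len in nodes and D the maximum depth."""
    lcs = lowest_common_subsumer(a, b)
    edges = (depth(a) - depth(lcs)) + (depth(b) - depth(lcs))
    max_depth = max(depth(c) for c in set(PARENT) | set(PARENT.values()))
    return -math.log((edges + 1) / (2 * max_depth))
```

In this toy tree, `wu_palmer("dog", "cat")` is 2 * 2 / (3 + 3) = 2/3, while `wu_palmer("dog", "oak")` drops to 1/3 because the only shared ancestor is the root. Published implementations differ in whether depth and length are counted in nodes or edges, which is one reason exact replication of reported values is hard.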
HESML V1R3 is the third release of the Half-Edge Semantic Measures Library (HESML) detailed in [1], which is a new, scalable and efficient Java software library of ontology-based semantic similarity measures and Information Content (IC) models based on WordNet. HESML V1R3 implements most ontology-based semantic similarity measures and Information Content (IC) models based on WordNet reported in the literature. It also provides an XML-based input file format for specifying the execution of reproducible experiments on WordNet-based similarity without any software coding. The main features of HESML are as follows: (1) it is based on an efficient and linearly scalable representation for taxonomies called PosetHERep introduced in [1], (2) its performance scales linearly with the size of the taxonomy, and (3) it does not use any caching strategy of vertex sets. HESML V1R3 introduces two minor novelties: the vertex ID has been changed from the Integer to the Long type in order to support a larger number of vertices, and the release includes five new similarity measures introduced by Hao et al. (2011), Liu et al. (2007), Pekar & Staab (2002) and Stojanovic et al. (2001). The HESML library is freely distributed for any non-commercial purpose under a CC BY-NC-SA 4.0 license, subject to citing the main HESML paper [1] as an attribution requirement. On the other hand, the commercial use of the similarity measures introduced in [2], as well as part of the intrinsic IC models introduced in [3] and [4], is protected by a patent application [5]. In addition, any user of HESML must fulfill other licensing terms described in [1] related to other resources distributed with the library, such as WordNet and a dataset of corpus-based IC models, among others. References: [1] Lastra-Díaz, J. J., García-Serrano, A., Batet, M., Fernández, M., & Chirigati, F. (2017).
HESML: a scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset. Information Systems, [...]
The design of a computer support environment for cooperation must be based on the set of agreed organization procedures defined in a previous conceptual modelling phase (chapter 4).
There are some similarities between developing distance education online courses and developing Massive Open Online Courses (MOOCs) on the basis of eLearning instructional design. However, converting an online course into a MOOC is not as simple as directly migrating eLearning materials and assessment resources to a MOOC platform. In online learning, learners should be continually influenced by information, social interaction, and learning experiences, providing them with the knowledge to come up with new ideas to develop within an engaging course. In this chapter, the process of MOOCification of a distance education online course on “Design for All for an Inclusive and Accessible Society” is explained and contextualized. The refactoring process has been based upon the quality model used for MOOCs at UNED Abierta and the instructional design based on Gagné's events of instruction. The eLearning activities were completely refactored, along with the content itself, the interac...
This dataset is provided as supplementary material of the paper by Lastra-Díaz, J. J., & García-Serrano, A. (2016). HESML: a scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset. Information Systems. This dataset contains a ReproZip reproducible experiment file, called "HESMLv1r1_reproducible_exps.rpz", which allows the experimental surveys on word similarity on WordNet introduced in the three papers below to be reproduced exactly. [1] Lastra-Díaz, J. J., & García-Serrano, A. (2015). A novel family of IC-based similarity measures with a detailed experimental survey on WordNet. Engineering Applications of Artificial Intelligence Journal, 46, 140–153. http://dx.doi.org/10.1016/j.engappai.2015.09.006 [2] Lastra-Díaz, J. J., & García-Serrano, A. (2015). A new family of information content models with an experimental survey on WordNet. Knowledge-Based Systems, 89, 509–526. http://dx.doi.org/10.1016/j.knosys.2015.08.019 [3] Lastra-Díaz, J. J., & García-Serrano, A. (2016). A refinement of the well-founded Information Content models with a very detailed experimental survey on WordNet (No. TR-2016-01). NLP and IR Research Group. ETSI Informática. Universidad Nacional de Educación a Distancia (UNED). http://e-spacio.uned.es/fez/view/bibliuned:DptoLSI-ETSI-Informes-Jlastra-refinement
This paper introduces a novel family of ontology-based similarity measures based on the Information Content (IC) theory, a detailed state of the art, a large experimental survey into ontology-based...
Measuring semantic similarity between sentences is an important task in the fields of Natural Language Processing (NLP), Information Retrieval (IR), and biomedical text mining, among others. HESML is a self-contained experimentation platform on word and sentence similarity and relatedness which is especially well suited to running large experimental surveys, because it supports the execution of automatic reproducible experiment files based on an XML-based file format. The HESML library has been developed in Java 8 and NetBeans 8. This dataset introduces HESML V2R1, implementing the protocol in [1], which is the sixth release of the Half-Edge Semantic Measures Library (HESML), and is based on HESML V1R5 [2]. HESML V2R1 is a linearly scalable and efficient Java software library of word and sentence semantic similarity measures. This latest release of HESML allows the evaluation and comparison of most of the sentence similarity methods for the biomedical domain, as well as the study of the impact of different pre-processing configurations on the performance of the sentence similarity methods. [1] Lara-Clares A, Lastra-Díaz JJ, Garcia-Serrano A. Protocol for a reproducible experimental survey on biomedical sentence similarity. PLoS One. 2021;16: e0248663. doi:10.1371/journal.pone.0248663 [2] Juan J. Lastra-Díaz; Alicia Lara-Clares; Ana Garcia-Serrano, 2021, "HESML V1R5 Java software library of ontology-based semantic similarity measures and information content models", https://doi.org/10.21950/1RRAWJ, e-cienciaDatos, V2
This dataset introduces HESML V1R5, which is the fifth release of the Half-Edge Semantic Measures Library (HESML) detailed in [13]. HESML V1R5 is a linearly scalable and efficient Java software library of ontology-based semantic similarity measures and Information Content (IC) models for ontologies like WordNet, SNOMED-CT, MeSH, GO and any other ontology based on the OBO file format. HESML V1R5 implements most ontology-based semantic similarity measures and Information Content (IC) models reported in the literature, as well as the evaluation of three pre-trained word embedding models. It also provides an XML-based input file format for specifying the execution of reproducible word/concept similarity experiments based on WordNet, SNOMED-CT, MeSH, or GO without software coding. HESML V1R5 introduces the following novelties: (1) the parsing and in-memory representation of SNOMED-CT, MeSH and any other ontology based on the OBO file format, such as the Gene Ontology (GO); (2) a new collection of efficient path-based similarity measures, obtained by reformulating previous path-based measures on top of the new Ancestors-based Shortest-Path Length (AncSPL) algorithm; and (3) a collection of groupwise similarity measures. The HESML library is freely distributed for any non-commercial purpose under a CC BY-NC-SA 4.0 license, subject to citing the two main HESML papers as an attribution requirement. However, the HESML distribution also includes other datasets, databases or data files whose use requires an attribution acknowledgement by any user of HESML. Thus, we urge HESML users to comply with the licensing terms of the other resources distributed with the library, as detailed in its companion release notes.
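The abstract does not spell out how AncSPL works, so the sketch below only illustrates the general idea its name suggests: search for the shortest path between two concepts inside the subgraph induced by their ancestors rather than in the whole taxonomy. It is a minimal sketch under stated assumptions, not the published algorithm or HESML code: the toy taxonomy with multiple inheritance and all function names are hypothetical, edges are unweighted, and a plain breadth-first search stands in for whatever search procedure the library actually uses.

```python
from collections import deque

# Hypothetical toy taxonomy with multiple inheritance (concept -> list of parents).
PARENTS = {
    "entity": [],
    "animal": ["entity"],
    "pet": ["entity"],
    "dog": ["animal", "pet"],
    "cat": ["animal", "pet"],
    "wolf": ["animal"],
}


def ancestors(concept):
    """Return the concept together with all of its ancestors."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ancestor_restricted_path_length(a, b):
    """Shortest path length (in edges) using only ancestors of a or b."""
    allowed = ancestors(a) | ancestors(b)
    # Build an undirected adjacency map over the allowed concepts only.
    neighbours = {node: set() for node in allowed}
    for child in allowed:
        for parent in PARENTS[child]:  # every parent of an allowed node is also allowed
            neighbours[child].add(parent)
            neighbours[parent].add(child)
    queue = deque([(a, 0)])
    visited = {a}
    while queue:
        node, distance = queue.popleft()
        if node == b:
            return distance
        for nxt in neighbours[node]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, distance + 1))
    return None  # only reachable if the two concepts share no ancestor
```

Restricting the search this way keeps the explored subgraph small, since the number of ancestors of a concept is typically far smaller than the taxonomy. In general a path through a shared descendant could be shorter than any path through ancestors, so a search of this kind should be read as an ancestor-constrained path length rather than as the exact shortest path in the full graph.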
This protocol introduces a set of reproducibility resources with the aim of allowing the exact replication of the experiments introduced by our main paper [1], which presents the largest, and the first reproducible, experimental survey on biomedical sentence similarity. HESML V2R1 [2] is the sixth release of our Half-Edge Semantic Measures Library (HESML), which is a linearly scalable and efficient Java software library of ontology-based semantic similarity measures and Information Content (IC) models for ontologies like WordNet, SNOMED-CT, MeSH and GO. This protocol sets up a self-contained reproducibility platform which contains the Java source code and binaries of our main benchmark program, as well as a Docker image which allows the exact replication of our experiments on any software platform supported by Docker, such as all Linux-based operating systems, Windows or macOS. All the necessary resources for executing the experiments are published in the permanent repository ...
This dataset introduces a set of reproducibility resources with the aim of allowing the exact replication of the experiments introduced by our companion paper, which compares the performance of the three UMLS-based semantic similarity libraries reported in the literature: (1) UMLS::Similarity [20], (2) the Semantic Measures Library (SML) [3], and (3) the latest version of our Half-Edge Semantic Measures Library (HESML), introduced in our aforementioned companion paper. HESML V1R5 is the fifth release of our Half-Edge Semantic Measures Library (HESML) detailed in [15], which is a linearly scalable and efficient Java software library of ontology-based semantic similarity measures and Information Content (IC) models for ontologies like WordNet, SNOMED-CT, MeSH and GO. This dataset sets up a self-contained reproducibility platform which contains the Java source code and binaries of our main benchmark program, as well as a Docker image which allows the exact replication of our experiments i...
Semantic Textual Similarity (also known as Semantic Short-text Similarity) is a research problem that aims to calculate the similarity among text units (phrases, sentences, paragraphs or texts), focusing on their semantic content. The importance of Semantic Similarity in Natural Language Processing has increased in recent years due to its relevance in many tasks and applications, such as Automatic Summarization, Machine Translation, Question Answering or Semantic Indexing. UB-NER is a self-contained Java software library for benchmarking state-of-the-art STS measures in the biomedical domain. It allows users to define and execute a set of experiments combining different measures and preprocessing methods. This dataset contains the reproducibility framework and dependencies, whose aim is to allow the exact replication of the unsupervised named entity recognition experiments in the biomedical domain, as detailed in the "ReproductionProtocol.pdf" file.
Background Ontology-based semantic similarity measures based on SNOMED-CT, MeSH, and Gene Ontology are being extensively used in many applications in biomedical text mining and genomics respectively, which has encouraged the development of semantic measures libraries based on the aforementioned ontologies. However, current state-of-the-art semantic measures libraries have some performance and scalability drawbacks derived from their ontology representations based on relational databases, or naive in-memory graph representations. Likewise, a recent reproducible survey on word similarity shows that one hybrid IC-based measure which integrates a shortest-path computation sets the state of the art in the family of ontology-based semantic measures. However, the lack of an efficient shortest-path algorithm for their real-time computation prevents both their practical use in any application and the use of any other path-based semantic similarity measure. Results To bridge the two aforement...
Measuring semantic similarity between sentences is a significant task in the fields of Natural Language Processing (NLP), Information Retrieval (IR), and biomedical text mining. For this reason, the proposal of sentence similarity methods for the biomedical domain has attracted a lot of attention in recent years. However, most sentence similarity methods and experimental results reported in the biomedical domain cannot be reproduced for multiple reasons, including the copying of previous results without confirmation, the lack of source code and data to replicate both methods and experiments, and the lack of a detailed definition of the experimental setup. As a consequence of this reproducibility gap, the state of the problem cannot be elucidated, nor can new lines of research be soundly set. On the other hand, there are other significant gaps in the literature on biomedical sentence similarity as follows: (1) the evaluation of several unexplored sentence similarity me...
The Web of Data (WOD) contains a large amount of formalized and interconnected data, offering valuable help for experimental tasks requiring an accurate data representation. However, the practical application of such data is often limited by the complexity of extracting the necessary information, mainly because of the lack of a proper structure and organization in the WOD resources. The (re)organization of the knowledge contained in these resources might facilitate the identification of the necessary information and, consequently, limit the problems arising in their practical application. In this context, this paper proposes the application of Formal Concept Analysis (FCA) to create a concept-based abstraction that better organizes the knowledge contained in the WOD resources. To test to what extent this enhanced organization improves the data representation process, the obtained FCA models are tested in a practical application that represents a set of Twitter contents in a specific task: the Topic Detection task at RepLab 2013. The results demonstrate that the better data representation obtained through FCA improves the operation of the topic detection process, outperforming state-of-the-art results.
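To make the notion of a concept-based abstraction more concrete, the sketch below derives the formal concepts of a tiny binary context, where a formal concept is a pair (extent, intent) such that the intent is exactly the set of attributes shared by the extent and the extent is exactly the set of objects having every attribute in the intent. It is a minimal sketch under stated assumptions: the tweets, the terms and the function names are hypothetical, and the brute-force enumeration over object subsets is only practical for toy contexts; it is not the procedure used in the paper.

```python
from itertools import combinations

# Hypothetical formal context: objects (tweets) mapped to their attributes (terms).
CONTEXT = {
    "tweet1": {"bank", "loan"},
    "tweet2": {"bank", "strike"},
    "tweet3": {"bank", "loan", "rates"},
}
ALL_ATTRIBUTES = frozenset().union(*CONTEXT.values())


def common_attributes(objects):
    """Attributes shared by every given object (all attributes for the empty set)."""
    shared = set(ALL_ATTRIBUTES)
    for obj in objects:
        shared &= CONTEXT[obj]
    return frozenset(shared)


def objects_having(attributes):
    """Objects that carry every given attribute."""
    return frozenset(obj for obj, attrs in CONTEXT.items() if attributes <= attrs)


def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    names = sorted(CONTEXT)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_attributes(subset)
            extent = objects_having(intent)
            concepts.add((extent, intent))
    return concepts
```

For this toy context the enumeration yields five concepts, for example the pair ({tweet1, tweet3}, {bank, loan}), which groups the two tweets that share both terms. Ordering the concepts by extent inclusion gives the concept lattice, whose nodes can then be read as candidate topics.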
This summary, which aims to be concise and substantive, results from the contributions made in a learning activity within the course “Semantica y pragmatica en la web” (Semantics and Pragmatics on the Web) of the master's degree in Language Technologies at UNED, in the 2019-20 academic year, based on the content available online from the Iberian Languages Evaluation Forum IberLEF2019 (organized within the framework of SEPLN2019). IberLEF2019 is an evaluation forum that poses challenges or competitive text-processing tasks for the languages of the Iberian Peninsula (Spanish, Portuguese, Catalan, Basque and Galician). This evaluation forum is organized as a competition among participating systems that take on the same challenge, that is, carrying out a task or solving a problem with the same data and in the same scenario. The organizers of the challenge must provide a dataset or corpus, define the challenge or task to be solved, specify the measures for evaluating the results for...
Unsupervised Named Entity Recognition (NER) approaches do not depend on labelled data to function properly but rather on a source of knowledge, in which promising candidates can be looked up to find the corresponding concept. In the biomedical domain, such a knowledge source already exists, namely the Unified Medical Language System (UMLS). In this paper, three different unsupervised NER models using UMLS, namely MetaMap, cTAKES and MetaMapLite, are evaluated and compared on the basis of the results published by Demner-Fushman, Rogers and Aronson (2017) and Reategui and Ratte (2018). The Unsupervised Biomedical Named Entity Recognition framework (UB-NER) is developed, and the results of the experiments with the three models, five datasets and two NER tasks are presented.
This dataset is a companion reproducibility package of the related paper submitted for publication, whose aim is to allow the exact replication of a very large experimental survey on word similarity between the families of ontology-based semantic similarity measures and word embedding models, as detailed in the ‘appendix-reproducible-experiments.pdf’ file. Our experiments are based on the evaluation of all methods with the HESML V1R4 semantic measures library and the recording of these experiments with ReproZip. HESML is a self-contained Java software library of semantic measures based on WordNet whose latest version, called HESML V1R4, also supports the evaluation of pre-trained word embedding files. HESML is a self-contained experimentation platform on word similarity which is especially well suited to running large experimental surveys, because it supports the execution of automatic reproducible experiment files on word similarity based on an XML-based file format called (*.exp). On the other han...
This book constitutes the refereed post-proceedings of the 9th International Conference on Adaptive Multimedia Retrieval, AMR 2011, held in Barcelona, Spain, in July 2011. The 9 revised full papers and the invited contribution presented were carefully reviewed and selected from numerous submissions. The papers cover topics ranging from theoretical work to practical implementations and their evaluation, most of them dealing with audio or music media. They are organized in topical sections on evaluation and user studies, audio and music, image retrieval, and similarity and music.
Time is an element of capital importance in any information space, and Twitter is no exception. The exploitation of temporal information in information retrieval and organization tasks has a long tradition. However, content-based approaches of this kind have not been widely explored for the Twitter domain, and consequently corpora of tweets annotated with temporal information are scarce. This article proposes a model for annotating temporal information in the Twitter domain, based on Formal Concept Analysis, in which the attributes of the context are the temporal expressions, events, and event types present in the tweets. A calendar especially suited to the commemoration of anniversaries and notable dates on Twitter is defined: the Collective-Imaginary Calendar (Calendario Imaginario-Colectivo). The study corpus was extracted from the RepLab2013 collection. A complete analysis of it is included from a perspec...
This article analyzes two quite different research projects guided by artificial intelligence within the field of Digital Humanities (HD, Humanidades Digitales). The first is a well-known and successful investigation by two linguists who solve a case of authorship attribution by building a digital corpus of 150 works by 40 Italian novelists. The second is the research carried out on the digital corpus DIMH (El Dibujante Ingeniero al servicio de la Monarquía Hispánica. Siglos XVI-XVIII), an evolution of the Colección de mapas, planos y dibujos del Archivo General de Simancas (siglos XVI-XVIII), whose aim was to develop tools supporting semantic annotation, information search, extraction of hidden relations in the texts, and visualization of the results, in order to facilitate historians' research. Through these two examples, this article seeks to show the methods, processes, and possibilities of success in complex problems ...
Abstract. This work presents an evaluation of different models of multilingual indexing and result ranking using the BM25/BM25F models for a text-based image retrieval problem. The evaluation is carried out on the collection available under license to the participants of the international ImageClef2010 competition. The results obtained show that, despite its high computational cost, the best approach is to translate the documents into all the languages ...
Abstract. This article presents user modelling based on the content that users have previously consulted. To generate this model, an approach based on divergences between terms, rather than similarities, is proposed. The goal is a model that captures users' specific activity, and not only their more generic activity. This model will serve as a basis for, for example, the development of a recommender system, avoiding the over-specialization problem, thanks to the ...
The development of generic systems for automatic language processing is limited by the impossibility of having available all the knowledge required for any application domain. Therefore, the solution proposed in this work is based on the development of a modular and multiform system that allows the incorporation of the different types of linguistic and extralinguistic knowledge. To this end, a general proposal for structuring knowledge for text interpretation has been developed, ...
Abstract: Nowadays multi-agent systems (MAS) provide a promising approach to the development of systems in complex domains with high distribution and high scalability, but there is still no consensus on the concepts involved in this field. Moreover, the new challenges in developing MAS call for a new approach to engineering these systems. Elektra is a methodology for developing multi-agent systems based on our experience in implementing MAS. Elektra is supported by a clear and detailed MAS meta- ...
Unsupervised Named Entity Recognition (NER) approaches do not depend on labelled data to function properly but rather on a source of knowledge, in which promising candidates can be looked up to find the corresponding concept. In the biomedical domain, a knowledge source like this already exists, namely the Unified Medical Language System (UMLS). In this paper, three different unsupervised NER models using UMLS, namely MetaMap, cTakes and MetaMapLite, are evaluated and compared, starting from the results published by Demner-Fushman, Rogers and Aronson (2017) and Reategui and Ratte (2018). The Unsupervised Biomedical Named Entity Recognition framework (UB-NER) is developed, with which the results of the experiments on the three models, five datasets and two NER tasks are presented.
Abstract. An overview of Distributed Artificial Intelligence and Multi-Agent Systems is presented, describing the related concepts and the goals of the different lines of work: Distributed Problem Solving, Multi-Agent Systems, and Autonomous Agents.


This book contains the abstracts of the proposals submitted to the II Congreso Internacional de Humanidades Digitales Hispánicas, held on 5-7 October 2015. www.hdh2015.linhd.es
ABSTRACT In this paper we present the second participation of the NLP&IR group at UNED in the MediaEval Genre Tagging Task. This categorization task was carried out applying an Information Retrieval (IR) approach considering the video collection's textual data and query expansion techniques. The results show that the combination of social tags and language models is useful to perform query expansion.
AEPIA promotes the biennial celebration of these conferences to foster the dissemination of Artificial Intelligence in general. These conferences are a good reference for the new paradigms and new technologies of Artificial Intelligence, as well as for the techniques mature enough to be used in the development of both Knowledge Engineering applications and Software Engineering applications in general.
1 Advanced Databases Group, Computer Science Department, Universidad Carlos III de Madrid, Avda. Universidad 30, 28911 Leganés, Madrid, Spain {pmf,jlmferna}@inf.uc3m.es, [email protected] 2 Department of Telematic Engineering, Universidad Carlos III de Madrid, Avda. Universidad 30, 28911 Leganés, Madrid, Spain [email protected] 3 DAEDALUS – Data, Decisions and Language, S.A., Centro de Empresas “La Arboleda”, Ctra. N-III km.