This paper reports an investigation of the impact of temperature-dependent electrical and thermal conductivities (as governed by the classical Wiedemann–Franz–Lorenz law) on the thermal response of nickel–chromium (nichrome) alloy cylinders. The relationship between the conductivities and temperature for nichrome is seen to be quite well characterized by a quadratic dependence. This relationship is incorporated into the heat conduction equation, and the thermal response of an electrically heated, cylindrical specimen cooled by free convection is studied. It is seen that the temperature-dependent response of the cylinder consistently reaches an average steady-state temperature and a time to steady state that are lower than in the temperature-independent case, e.g. by about 30% and 25%, respectively, for a thin disc. It is also demonstrated that if the temperature dependence of the conductivities were ignored when determining these properties from test data, the assumption translates into an under-prediction of temperature by about 25 °C at certain locations in the cylinder; the implications of these errors are discussed.
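Although the paper solves the full conduction equation over the cylinder, the qualitative effect of a temperature-dependent property can be sketched with a much simpler lumped-capacitance model of a Joule-heated, convectively cooled specimen. Every numeric value below is an illustrative assumption, not data from the paper, and the lumped view only captures the electrical side; internal conduction gradients need the full model.

```python
# Lumped-capacitance transient model of a Joule-heated nichrome specimen.
# All property values below are illustrative placeholders, not the paper's data.
rho0, a, b = 1.08e-6, 1e-4, 2e-8   # assumed quadratic resistivity rho(T) = rho0*(1 + a*T + b*T^2)
I, L, A    = 5.0, 0.05, 1e-6       # current (A), conductor length (m), cross-section (m^2)
h, A_s     = 15.0, 5e-4            # free-convection coefficient (W/m^2 K), surface area (m^2)
m, c       = 4e-4, 450.0           # mass (kg), specific heat (J/kg K)
T_inf      = 25.0                  # ambient temperature (C)

def step(T, dt, temp_dependent=True):
    """One explicit Euler step of m*c*dT/dt = I^2*R(T) - h*A_s*(T - T_inf)."""
    rho = rho0 * (1 + a*T + b*T**2) if temp_dependent else rho0
    R = rho * L / A                      # electrical resistance at temperature T
    return T + dt * (I**2 * R - h * A_s * (T - T_inf)) / (m * c)

dt, T_var, T_const = 0.5, T_inf, T_inf
for _ in range(10000):
    T_var, T_const = step(T_var, dt, True), step(T_const, dt, False)
print(f"steady state: {T_var:.1f} C with T-dependent resistivity, "
      f"{T_const:.1f} C with constant properties")
```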
Research on techniques for accessing user-generated content, and specifically user reviews of different products, has recently come into the focus of the information retrieval community. In particular, this paper addresses the problem of extracting the features of a particular product from user comments, taking advantage of a corpus with a semi-structured format: pros, cons and summary. We propose a technique to extract a set of features based on user-generated pros and cons for a particular product. Using this set, we then test a feature similarity function to obtain new features from reviews (both from the pros/cons and from the free-text summary) of the same and other products. Our experimental results support several interesting conclusions.
Automatic text summarisation, especially sentence extraction, has received a great deal of attention from researchers. However, a majority of the work focuses on newswire summarisation, where the goal is to generate headlines or short summaries from a single news article or a cluster of related news articles. One primary reason for this is the fact that most public datasets related to text summarisation consist of newswire articles. Whether it is the traditional Document Understanding Conference (DUC) or Text Analysis Conference (TAC) datasets or the recent CNN/Daily Mail corpus, the focus is mainly on newswire articles. In reality, this forms a rather small part of the numerous possible applications of text summarisation. The focus is now shifting towards other areas like product-review summarisation, domain-specific summarisation and real-time summarisation. Each of these areas has its own set of challenges, but they have one issue in common, i.e. the availability of large-scale corpora.
The Question Answering for Machine Reading Evaluation (QA4MRE) track aims to test a machine's ability to understand text. In our analysis, the most crucial part of an efficient system is selecting which text needs to be considered for understanding, since understanding text involves a lot of NLP processing. This paper covers our submitted system for the QA4MRE campaign, which focuses on two parts: first, selecting the text from the comprehension passage and the background knowledge that needs to be understood, and second, eliminating or ranking answer options based on the text selected in the former step. Our main focus was on elimination and ranking, which boils down to tuning various parameters: whether to answer a particular question at all and, if answered, how to combine scores. Methods such as computing the cosine similarity between question and passage sentences, and between the named entities of passage sentences and the question, were also considered for scoring. In addition to this basic framework of our s...
This paper describes the participation of team DA-LD-Hildesheim from the Information Retrieval Lab (IRLAB) at DA-IICT Gandhinagar, India, in collaboration with the University of Hildesheim, Germany, and LDRP-ITR, Gandhinagar, India, in the shared task of the Aggression Identification workshop at COLING 2018. The objective of the shared task is to identify the level of aggression in user-generated content from social media written in English, Devanagari Hindi and Romanized Hindi. Aggression levels are categorized into three predefined classes, namely ‘Overtly Aggressive’, ‘Covertly Aggressive’ and ‘Non-aggressive’. The participating teams are required to develop a multi-class classifier which classifies user-generated content into these predefined classes. Instead of relying on a bag-of-words model, we have used pre-trained vectors for word embedding. We have performed experiments with standard machine learning classifiers. In addition, we have developed various deep learning models f...
An overview of the Evaluation subsystem of the CLIA project is given in this paper. We start with a review of standard practices in Information Retrieval (IR) evaluation. The process followed in Phase I of the project is described along with the achievements. The evaluation strategy to be followed in Phase II is outlined next. We conclude by discussing some of the challenging issues in IR evaluation. Keywords: CLIA project; Cranfield paradigm; evaluation; FIRE; metrics.
The goal of the FIRE 2020 EDNIL track was to create a framework that could be used to detect events from news articles in English, Hindi, Bengali, Marathi and Tamil. The track consisted of two tasks: (i) identifying a piece of text from news articles that contains an event (Event Identification); (ii) creating an event frame from the news article (Event Frame Extraction). The event types to be identified in the Event Identification task were Man-made Disaster and Natural Disaster. In the Event Frame Extraction task, the event frame consists of Event Type, Casualties, Time, Place and Reason.
Cross-language information retrieval is difficult for languages with few processing tools or resources, such as Urdu. An easy way of translating content words is provided by Google Translate, but due to lexicon limitations, named entities (NEs) are transliterated letter by letter. The resulting NE errors (e.g. zynydyny zdn for Zinedine Zidane) hurt retrieval. We propose to replace English non-words in the translation output. First, we determine phonetically similar English words with the Soundex algorithm. Then, we choose among them using a modified Levenshtein distance that models correct transliteration patterns. This strategy yields an improvement of 4% MAP (from 41.2 to 45.1; monolingual 51.4) on the FIRE-2010 dataset.
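Both building blocks of this correction step are standard. The sketch below is a hedged reconstruction rather than the authors' exact code: it pairs the classic Soundex code with a plain Levenshtein distance, whereas the paper's modified distance additionally weights known transliteration confusions.

```python
def soundex(word):
    """Classic 4-character Soundex code (first letter + 3 digits)."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    word = word.lower()
    out, prev = word[0].upper(), codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            out += code
        if ch not in "hw":          # h/w do not reset the previous code
            prev = code
    return (out + "000")[:4]

def levenshtein(a, b):
    """Plain edit distance; the paper's modified version (an assumption here)
    would favour edits that match regular transliteration patterns."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j-1] + 1, prev[j-1] + (ca != cb)))
        prev = cur
    return prev[-1]

def correct(non_word, vocabulary):
    """Replace a transliterated non-word by the closest phonetically similar word."""
    candidates = [w for w in vocabulary if soundex(w) == soundex(non_word)]
    return min(candidates, key=lambda w: levenshtein(non_word, w), default=non_word)

print(correct("zydane", ["zidane", "sedan", "disdain"]))  # -> zidane
```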
We used named-entity features of source documents to identify their suspicious counterparts. A three-stage identification method was adopted to understand the impact of NEs in plagiarism detection. Results, along with a brief analysis, are given in this note.
The peer-reviewed conference track at FIRE is running for the second time this year. Its scope has significant overlap with that of ACM SIGIR and ACM CHIIR. We believe the quality of submissions has improved, the credit for which goes to our authors.
With the increasingly widespread use of computers and the Internet in India, large amounts of information in Indian languages are becoming available on the web. Automatic information processing and retrieval is therefore becoming an urgent need in the Indian context. Moreover, since India is a multilingual country, any effective approach to IR in the Indian context needs to be capable of handling a multilingual collection of documents. In this paper, we discuss the N-gram approach to developing some basic tools in the area of IR and NLP. This approach is statistical and language-independent in nature, and therefore eminently suited to the multilingual Indian context. We first present a brief survey of some language-processing applications in which N-grams have been successfully used. We also present the results of some preliminary experiments on using N-grams to identify the language of an Indian-language document, based on the method proposed by Cavnar et al. [1].
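As a hedged illustration of the Cavnar-style method (not the paper's exact implementation), language identification can be done by comparing ranked character n-gram profiles with an out-of-place distance:

```python
from collections import Counter

def profile(text, n_max=3, top=300):
    """Ranked character n-gram profile (Cavnar & Trenkle style)."""
    grams = Counter()
    for n in range(1, n_max + 1):
        grams.update(text[i:i+n] for i in range(len(text) - n + 1))
    ranked = [g for g, _ in grams.most_common(top)]
    return {g: r for r, g in enumerate(ranked)}

def out_of_place(doc_prof, lang_prof):
    """Sum of rank displacements; missing n-grams get the maximum penalty."""
    worst = len(lang_prof)
    return sum(abs(r - lang_prof.get(g, worst)) for g, r in doc_prof.items())

def identify(text, lang_profiles):
    doc = profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))

# Toy usage: real profiles would be trained on large monolingual corpora.
profiles = {"en": profile("the quick brown fox jumps over the lazy dog " * 50),
            "hi": profile("yah ek hindi vakya hai aur isme hindi shabd hain " * 50)}
print(identify("a lazy brown dog", profiles))  # -> en
```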
Big data is a term for massive data sets with a large, varied and complex structure, which are difficult to store, analyze and visualize for further processing or results. The process of examining massive amounts of data to reveal hidden patterns and correlations is called big data analytics. This information is useful for companies and organizations, helping them gain richer and deeper insights and an advantage over the competition. For this reason, big data implementations need to be analyzed and executed as accurately as possible. This paper presents an overview of big data's content, scope, samples, methods, advantages and challenges, and discusses the privacy concerns around it.
This paper presents the participation of the Information Retrieval Lab (IRLAB) at DA-IICT Gandhinagar, India, in the Data Challenge track of SMERP 2017. This year, the SMERP Data Challenge track offered a task called Text Extraction on an Italy earthquake tweet dataset, with the objective of retrieving relevant tweets with high recall and high precision. We submitted three runs for this task and describe the different approaches adopted. First, we performed query expansion on the topics using WordNet. In the first run, we ranked tweets using cosine similarity against the topics. In the second run, the relevance score between tweets and the topic was calculated using the Okapi BM25 ranking function, and in the third run the relevance score was calculated using a language model with Jelinek-Mercer smoothing.
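For the third run's retrieval model, a query-likelihood language model with Jelinek-Mercer smoothing scores a tweet by interpolating its own term distribution with the collection's. The sketch below is a minimal illustration with an assumed smoothing weight, not the track submission itself:

```python
import math
from collections import Counter

def jm_score(query_terms, doc_terms, collection_tf, collection_len, lam=0.7):
    """log P(query | doc) with Jelinek-Mercer smoothing:
    P(t|d) = (1 - lam) * tf(t,d)/|d| + lam * cf(t)/|C|."""
    tf = Counter(doc_terms)
    score = 0.0
    for t in query_terms:
        p_doc = tf[t] / len(doc_terms) if doc_terms else 0.0
        p_col = collection_tf[t] / collection_len
        if p_doc == 0 and p_col == 0:
            continue                     # term unseen everywhere: skip it
        score += math.log((1 - lam) * p_doc + lam * p_col)
    return score

tweets = [["bridge", "collapsed", "amatrice"], ["send", "water", "amatrice"]]
coll = Counter(t for tw in tweets for t in tw)
n = sum(coll.values())
query = ["bridge", "collapsed"]
ranked = sorted(tweets, key=lambda d: jm_score(query, d, coll, n), reverse=True)
print(ranked[0])  # -> ['bridge', 'collapsed', 'amatrice']
```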
In this work, we present a weakly supervised sentence extraction technique for identifying important sentences in scientific papers that are worthy of inclusion in the abstract. We propose a new attention-based deep learning architecture that jointly learns to identify important content, as well as the cue phrases that are indicative of summary-worthy sentences. We propose a new context embedding technique for determining the focus of a given paper using topic models, and use it jointly with an LSTM-based sequence encoder to learn attention weights across the sentence words. We use a collection of articles publicly available through the ACL Anthology for our experiments. Our system achieves performance that is better, in terms of several ROUGE metrics, than several state-of-the-art extractive techniques. It also generates more coherent summaries and preserves the overall structure of the document.
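A minimal PyTorch sketch of the general pattern described here: a BiLSTM encoder whose attention weights score the words of a sentence, pooled into a sentence-level "summary-worthy" probability. The layer sizes are assumptions, and the paper's topic-model context embedding is omitted.

```python
import torch
import torch.nn as nn

class AttentiveSentenceScorer(nn.Module):
    """BiLSTM over a sentence's words; attention pools word states into a
    sentence vector that is classified as summary-worthy or not."""
    def __init__(self, vocab_size, emb_dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)    # one score per word state
        self.clf = nn.Linear(2 * hidden, 1)

    def forward(self, tokens):                  # tokens: (batch, seq_len) word ids
        states, _ = self.encoder(self.embed(tokens))             # (batch, seq, 2*hidden)
        weights = torch.softmax(self.attn(states).squeeze(-1), dim=1)
        sent_vec = (weights.unsqueeze(-1) * states).sum(dim=1)   # attention pooling
        return torch.sigmoid(self.clf(sent_vec)).squeeze(-1)     # P(summary-worthy)

model = AttentiveSentenceScorer(vocab_size=10000)
probs = model(torch.randint(0, 10000, (4, 30)))  # 4 sentences of 30 tokens
print(probs.shape)  # torch.Size([4])
```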
Indian Statistical Institute, Kolkata participated in TREC for the first time this year. We participated in the TREC Legal Interactive task on two topics, namely Topic 301 and Topic 302. We reduced the size of the corpus by Boolean retrieval using Lemur 4.11 and followed it with a clustering technique. We chose members from each cluster (which we call seeds) for relevance judgement by the Topic Authority (TA) and assumed all other members of a cluster whose seeds were assessed as relevant to be relevant as well.
This paper presents the participation of the Information Retrieval Lab (IR LAB, DA-IICT Gandhinagar) in the FIRE 2016 Microblog track. The main objective of the track is to identify Information Retrieval methodologies for retrieving important information from tweets posted during disasters. We submitted two runs for this track. In the first run, daiict_irlab_1, we expanded topic terms using a Word2vec model trained on the tweet corpus provided by the organizers. Relevance scores between tweets and topics are calculated with the Okapi BM25 model. Precision@20, the primary metric, for this run is 0.3143. In the second run, daiict_irlab_2, we set different weights for original and expanded topic terms, achieving Precision@20 of around 0.30.
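A hedged sketch of that expansion step using gensim; the corpus, parameter values and expansion size are placeholders, and the actual runs may have differed:

```python
from gensim.models import Word2Vec

# Toy stand-in for the organizer-provided tweet corpus (tokenized tweets).
tweets = [["flood", "relief", "camp", "needed"],
          ["water", "logging", "in", "city"],
          ["relief", "material", "dispatched"]] * 100

model = Word2Vec(sentences=tweets, vector_size=100, window=5, min_count=1, epochs=20)

def expand_topic(terms, topn=3):
    """Add the topn nearest neighbours of each topic term to the query."""
    expanded = list(terms)
    for t in terms:
        if t in model.wv:
            expanded += [w for w, _ in model.wv.most_similar(t, topn=topn)]
    return expanded

print(expand_topic(["relief"]))
# The expanded term list is then fed to a BM25 ranker, optionally with
# lower weights on the expansion terms (as in the second run).
```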
This paper discusses the QA system submitted by Dhirubhai Ambani Institute of Information and Communication Technology, India, to ResPubliQA 2010. We participated in the monolingual en-en task. Our system retrieves a candidate paragraph that contains the answer to a natural-language question. Depending on the n-gram similarity score of the candidate paragraph, a decision is made whether to answer the question or not. The objective of our participation was to test our implementation of various strategies like query expansion, n-gram similarity matching and non-answering criteria.
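The non-answering criterion can be illustrated with a simple threshold rule on the n-gram overlap score; the threshold and scoring details below are assumptions for illustration:

```python
import re

def ngram_overlap(question, paragraph, n=2):
    """Fraction of the question's n-grams that also occur in the paragraph."""
    def ngrams(text):
        tokens = re.findall(r"\w+", text.lower())
        return {tuple(tokens[i:i+n]) for i in range(len(tokens) - n + 1)}
    q = ngrams(question)
    return len(q & ngrams(paragraph)) / len(q) if q else 0.0

def answer_or_abstain(question, candidates, threshold=0.3):
    """Return the best-scoring paragraph, or None when confidence is too low;
    ResPubliQA's c@1 measure rewards abstaining over answering incorrectly."""
    best = max(candidates, key=lambda p: ngram_overlap(question, p))
    return best if ngram_overlap(question, best) >= threshold else None

paras = ["The directive entered into force in May 2004.",
         "Member states shall report annually."]
print(answer_or_abstain("When did the directive enter into force?", paras))
```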
The Information Retrieval Lab at DA-IICT, India participated in the text summarization task of the Data Challenge track of SMERP 2017. The SMERP 2017 track organizers provided an Italy earthquake tweet dataset along with a set of topics which describe important information required during any disaster-related incident. The main goal of this task is to assess how well a participant's system summarizes, in 300 words, important tweets that are relevant to a given topic. We approached text summarization as a clustering problem. Our approach is based on extractive summarization. We submitted runs at both levels with different methodologies. We performed query expansion on the topics using WordNet. In the first level, we calculated the cosine similarity score between tweets and the expanded query. In the second level, we used a language model with Jelinek-Mercer smoothing to calculate the relevance score between tweets and the expanded query. We selected tweets above a relevance...
We examine several quantitative techniques for authorship attribution that have gained importance over time, and compare them with the current state-of-the-art Z-score-based technique. In this paper we show how comparable the existing techniques can be to the Z-score-based method, simply by tuning their parameters. We try to find the optimum values for the number of terms, the smoothing parameter and the minimum number of texts required for creating an author profile.
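As a hedged sketch of the Z-score family of methods (not necessarily the exact scoring of the paper): each author profile stores standardized term frequencies, and a disputed text is attributed to the author whose profile it is closest to.

```python
import numpy as np
from collections import Counter

def rel_freq(text, vocab):
    """Relative frequencies of the vocabulary terms in one text."""
    tf = Counter(text.lower().split())
    total = sum(tf.values())
    return np.array([tf[w] / total for w in vocab])

def zscore_profiles(author_texts, n_terms=50):
    """Standardize term frequencies across the whole corpus, then average
    each author's z-vectors into a profile."""
    all_texts = [t for texts in author_texts.values() for t in texts]
    vocab = [w for w, _ in Counter(" ".join(all_texts).lower().split()).most_common(n_terms)]
    freqs = np.array([rel_freq(t, vocab) for t in all_texts])
    mu, sigma = freqs.mean(axis=0), freqs.std(axis=0) + 1e-9
    profiles = {a: np.mean([(rel_freq(t, vocab) - mu) / sigma for t in texts], axis=0)
                for a, texts in author_texts.items()}
    return vocab, mu, sigma, profiles

def attribute(text, vocab, mu, sigma, profiles):
    z = (rel_freq(text, vocab) - mu) / sigma
    return min(profiles, key=lambda a: np.linalg.norm(z - profiles[a]))

corpus = {"austen": ["she was very happy indeed", "it was a truth universally known"],
          "doyle": ["the game is afoot watson", "observe the curious incident"]}
print(attribute("watson observe the game", *zscore_profiles(corpus)))  # -> doyle
```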
This paper attempts to study the effectiveness of text representation schemes on two tasks, namely user aggression detection and fact detection in social media content. In user aggression detection, the aim is to identify the level of aggression in content generated on social media and written in English, Devanagari Hindi and Romanized Hindi. Aggression levels are categorized into three predefined classes, namely ‘Non-aggressive’, ‘Overtly Aggressive’ and ‘Covertly Aggressive’. During disaster-related incidents, social media such as Twitter is flooded with millions of posts. In such emergency situations, identification of factual posts is important for organizations involved in relief operations. We treated this problem as a combination of a classification and a ranking problem. This paper presents a comparison of various text representation schemes based on BoW techniques, distributed word/sentence representations and transfer learning, across classifiers. Weighted $F_1$ scor...
With an ever-growing number of extractive summarization techniques being proposed, there is less clarity than ever about how good each system is compared to the rest. Several studies highlight the variance in performance of these systems with changes in datasets, or even across documents within the same corpus. An effective way to counter this variance and to make the systems more robust could be to use inputs from multiple systems when generating a summary. In the present work, we define a novel way of creating such an ensemble by exploiting the similarity between the content of candidate summaries to estimate their reliability. We define GlobalRank, which captures the performance of a candidate system on an overall corpus, and LocalRank, which estimates its performance on a given document cluster. We then use these two scores to assign a weight to each individual system, which is then used to generate the new aggregate ranking. Experiments on the DUC2003 and DUC2004 datasets show a significant ...
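A hedged sketch of the aggregation step; the weighting scheme below is a simple stand-in, whereas the paper derives its weights from the GlobalRank and LocalRank reliability estimates:

```python
from collections import defaultdict

def aggregate_rankings(system_rankings, weights):
    """Weighted Borda-style aggregation of per-system sentence rankings.

    system_rankings: {system: [sentence ids, best first]}
    weights: {system: reliability weight, e.g. from GlobalRank/LocalRank}
    """
    scores = defaultdict(float)
    for sys, ranking in system_rankings.items():
        n = len(ranking)
        for pos, sent in enumerate(ranking):
            scores[sent] += weights[sys] * (n - pos)   # higher rank, more points
    return sorted(scores, key=scores.get, reverse=True)

rankings = {"lexrank": ["s1", "s3", "s2"],
            "textrank": ["s3", "s1", "s2"],
            "luhn": ["s2", "s3", "s1"]}
weights = {"lexrank": 0.6, "textrank": 0.3, "luhn": 0.1}  # assumed reliabilities
print(aggregate_rankings(rankings, weights))  # -> ['s1', 's3', 's2']
```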
This paper presents the participation of the IRNLP_DAIICT team from the Information Retrieval and Natural Language Processing lab at DA-IICT, India, in the DravidianLangTech-EACL2021 shared task on Offensive Language Identification in Dravidian languages. The aim of this shared task is to identify offensive language in a code-mixed dataset of YouTube comments. The task is to classify comments into Not Offensive (NO), Offensive Untargeted (OU), Offensive Targeted Individual (OTI), Offensive Targeted Group (OTG), Offensive Targeted Others (OTO) and Other Language (OL) for three Dravidian languages: Kannada, Malayalam and Tamil. We use TF-IDF character n-grams and pretrained MuRIL embeddings for text representation, and Logistic Regression and Linear SVM for classification. Our best approaches ranked ninth, third and eighth, with weighted F1 scores of 0.64, 0.95 and 0.71 in Kannada, Malayalam and Tamil on the test dataset, respectively.
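The TF-IDF character n-gram plus linear classifier combination used here (and in the hope speech system below) is straightforward in scikit-learn. A minimal sketch with an assumed n-gram range and toy labels, not the tuned submission:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Character n-grams are robust to code-mixing and noisy spellings;
# the (1, 5) range and C value are assumptions, not the tuned settings.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 5), min_df=1),
    LogisticRegression(C=1.0, max_iter=1000),
)

comments = ["super movie", "worst acting ever", "padam kollam", "idiot fellow"]
labels = ["NO", "OTI", "NO", "OTI"]          # toy examples of two of the six classes
clf.fit(comments, labels)
print(clf.predict(["terrible idiot actor"]))  # -> ['OTI'] (on this toy data)
```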
This paper describes our team's submission to the EACL DravidianLangTech-2021 shared task on Machine Translation of Dravidian languages. We submitted translations for the English-Malayalam, English-Tamil, English-Telugu and Tamil-Telugu language pairs. The submissions mainly focus on having an adequate amount of data, backed by good preprocessing, to produce quality translations, which includes some custom-made rules to remove unnecessary sentences. We conducted several experiments on these models by tweaking the architecture, Byte Pair Encoding (BPE) and other hyperparameters.
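For reference, BPE learns a subword vocabulary by repeatedly merging the most frequent adjacent symbol pair. A compact sketch of the learning loop, following Sennrich et al.'s algorithm rather than our exact tokenizer configuration:

```python
from collections import Counter

def learn_bpe(words, num_merges=10):
    """Learn BPE merges from a {word: count} dict; words become symbol tuples."""
    vocab = {tuple(w) + ("</w>",): c for w, c in words.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = {}
        for symbols, count in vocab.items():      # apply the merge everywhere
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i+1]) == best:
                    out.append(symbols[i] + symbols[i+1]); i += 2
                else:
                    out.append(symbols[i]); i += 1
            merged[tuple(out)] = merged.get(tuple(out), 0) + count
        vocab = merged
    return merges

print(learn_bpe({"low": 5, "lower": 2, "lowest": 3}, num_merges=3))
# -> [('l', 'o'), ('lo', 'w'), ('low', '</w>')]
```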
This paper presents the participation of the IRNLP_DAIICT team from the Information Retrieval and Natural Language Processing lab at DA-IICT, India, in the LT-EDI@EACL2021 Hope Speech Detection task. The aim of this shared task is to identify hope speech in a code-mixed dataset of YouTube comments. The task is to classify comments into Hope Speech, Non Hope Speech or Not in Language, for three languages: English, Malayalam-English and Tamil-English. We use TF-IDF character n-grams and pretrained MuRIL embeddings for text representation, and Logistic Regression and Linear SVM for classification. Our best approaches ranked second, eighth and fifth, with weighted F1 scores of 0.92, 0.75 and 0.57 in English, Malayalam-English and Tamil-English on the test dataset, respectively.
Ontology Learning has been the subject of intensive study for the past decade. Researchers in this field have been motivated by the possibility of automatically building a knowledge base on top of text documents so as to support reasoning-based knowledge extraction. While most work in this field has been primarily statistical (known as light-weight Ontology Learning), not much of an attempt has been made at axiomatic Ontology Learning (called heavy-weight Ontology Learning) from natural-language text documents. Heavy-weight Ontology Learning supports more precise formal logic-based reasoning than statistical ontology learning. In this paper we propose a sound Ontology Learning tool, DLOL_(IS-A), that maps English-language IS-A sentences into their equivalent Description Logic (DL) expressions in order to automatically generate a consistent pair of T-box and A-box, thereby forming both a regular (definitional form) and a generalized (axiomatic form) DL ontology. The current sco...
Retrieving relevant information from biomedical text data is a new and challenging area of research. Thousands of articles are added to the biomedical literature each year, and this large collection of publications offers an excellent opportunity for discovering hidden biomedical knowledge by applying information retrieval (IR) and Natural Language Processing (NLP) technologies. Biomedical text processing differs from other domains: it requires special processing, as the text contains complex medical terminologies. Medical entity identification and normalization is a research problem in itself, and relationships among medical entities have an impact on any such system. Clinical Decision Support systems aim to assist decision-making tasks in the biomedical domain, and the medical knowledge they surface has the potential to considerably impact the quality of care provided by clinicians. The medical field has various types of queries: short questions, medical case reports, medical case narratives...
A substantial amount of research has been carried out on developing machine learning algorithms that account for term dependence in text classification. These algorithms offer acceptable performance in most cases, but they come at a substantial cost: they require significantly greater resources to operate. This paper argues against the justification of the higher costs of these algorithms, based on their performance in text classification problems. To test the conjecture, the performance of one of the best dependence models is compared to several well-established algorithms in text classification. A very specific collection of datasets has been designed to best reflect the disparity in the nature of text data present in real-world applications. The results show that even one of the best term dependence models performs only decently at best when compared to the independence models. Coupled with their substantially greater requirement for hardware...
A large number of extractive summarization techniques have been developed in the past decade, but very few enquiries have been made into how these differ from each other or what factors actually affect these systems. Such a meaningful comparison, if available, can be used to create a robust ensemble of these approaches, which has the possibility of consistently outperforming each individual summarization system. In this work we examine the roles of three principal components of an extractive summarization technique: the sentence ranking algorithm, the sentence similarity metric and the text representation scheme. We show that using a combination of several different sentence similarity measures, rather than only one, significantly improves the performance of the resultant meta-system. Even simple ensemble techniques, when used in an informed manner, prove to be very effective in improving the overall performance and consistency of summarization systems. A statistically significant improvement of about 5% to 10% in ROUGE-1 recall was achieved by aggregating various sentence similarity measures. As opposed to this, the aggregation of several ranking algorithms did not show a significant improvement in ROUGE score, but even in this case the resultant meta-systems were more robust than the candidate systems. The results suggest that new extractive summarization techniques should particularly focus on defining a better sentence similarity metric, and should use multiple sentence similarity scores and ranking algorithms in favour of any one particular combination.
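A hedged sketch of similarity-level aggregation, where the two measures and the graph-ranking step are common choices rather than the paper's exact configuration: average several normalized sentence similarity matrices, then rank sentences on the combined graph.

```python
import numpy as np
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def jaccard_matrix(sents):
    """Pairwise Jaccard overlap of the sentences' word sets."""
    sets = [set(s.lower().split()) for s in sents]
    n = len(sets)
    m = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            union = sets[i] | sets[j]
            m[i, j] = len(sets[i] & sets[j]) / len(union) if union else 0.0
    return m

def rank_sentences(sents):
    """Average two similarity measures, then run PageRank on the graph."""
    tfidf = TfidfVectorizer().fit_transform(sents)
    combined = (cosine_similarity(tfidf) + jaccard_matrix(sents)) / 2
    np.fill_diagonal(combined, 0.0)
    scores = nx.pagerank(nx.from_numpy_array(combined))
    return sorted(range(len(sents)), key=scores.get, reverse=True)

sents = ["The quake damaged several bridges.",
         "Rescue teams reached the damaged bridges.",
         "Markets reopened on Monday."]
print(rank_sentences(sents))  # sentence indices, most central first
```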
In this paper we present a thorough quantitative analysis of large-scale media text in three Indo-Aryan languages, viz. Hindi, Gujarati and Bengali. Population-wise, they together amount to 600 million speakers. Understanding and processing media text is very important from sociological, cultural and information-scientific standpoints. We performed a detailed study to understand the statistical nature of these data. The study demonstrates the effect of size and category of media text on term distributions. We establish that while higher-order n-grams tend to follow Zipf's law, the same is not always true for unigrams. We attempt to model the change in term distribution in two separate parts: the effect on the steepness of the term distribution and that on its tail. To the best of our knowledge this is the first exploratory study of these three languages on such a large scale.
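Checking the Zipf fit is a small computation: under Zipf's law the frequency of the rank-r term is f(r) ∝ 1/r^s, so the exponent s (the "steepness") can be estimated by least squares on the log-log rank-frequency curve. A minimal sketch, assuming whitespace tokenization:

```python
import numpy as np
from collections import Counter

def zipf_exponent(tokens):
    """Fit log f = -s * log r + c over the rank-frequency curve."""
    freqs = np.array(sorted(Counter(tokens).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(freqs) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return -slope        # Zipf steepness s (close to 1 for classic Zipf)

text = ("the cat sat on the mat the dog sat on the log " * 200).split()
print(f"estimated Zipf exponent s = {zipf_exponent(text):.2f}")
```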
The proposed algorithm uses context information to segregate semantically related error variants from the unrelated ones. String similarity measures are used to join error variants with the correct ...
Temporal annotation of plain text is considered a useful component of modern information retrieval tasks. In this work, different approaches to the identification and classification of temporal expressions in Hindi are developed and analyzed. First, a rule-based approach is developed, which takes plain text as input and, based on a set of hand-crafted rules, produces a tagged output with identified temporal expressions. This approach performs with a strict F1-measure of 0.83. In another approach, a CRF-based classifier is trained with human-tagged data and is then tested on a test dataset. The trained classifier identifies the time expressions in plain text and further classifies them into various classes. This approach performs with a strict F1-measure of 0.78. Next, the CRF is replaced by an SVM-based classifier and the same experiment is performed with the same features. This approach is shown to be comparable to the CRF, performing with a strict F1-measure of 0.77. Using the rule b...
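A hedged sketch of the CRF approach using sklearn-crfsuite; the feature set and hyperparameters here are simplified placeholders, and the paper's features are richer:

```python
import sklearn_crfsuite

def word_features(sent, i):
    """Simple per-token features; the actual system would add gazetteers,
    morphological cues, window features, etc."""
    w = sent[i]
    return {"word": w.lower(), "is_digit": w.isdigit(),
            "prev": sent[i-1].lower() if i > 0 else "<s>",
            "next": sent[i+1].lower() if i < len(sent) - 1 else "</s>"}

# Toy BIO-tagged Hindi examples (romanized here for readability).
sents = [["main", "kal", "subah", "aaunga"], ["match", "5", "baje", "shuru", "hoga"]]
tags  = [["O", "B-TIMEX", "I-TIMEX", "O"],   ["O", "B-TIMEX", "I-TIMEX", "O", "O"]]

X = [[word_features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X, tags)
test = ["vah", "kal", "subah", "aayegi"]
print(crf.predict([[word_features(test, i) for i in range(len(test))]]))
```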
Temporal annotation of plain text is considered a useful component of modern information retrieval tasks. In this work, two approaches to the identification and classification of temporal entities in Hindi are developed and analyzed. First, a rule-based approach is developed, which takes plain text as input and, based on a set of hand-crafted rules, produces a tagged output with identified temporal expressions. This approach is shown to have a strict F1-measure of 0.83. In the other approach, a CRF-based classifier is trained with human-tagged data and is then tested on a test dataset. The trained classifier identifies the temporal expressions in plain text and further classifies them into various classes. This approach is shown to have a strict F1-measure of 0.78. In this process, a reusable gold-standard dataset for temporal tagging in Hindi was developed. Named the ILTIMEX2012 corpus, it consists of 300 manually tagged Hindi news documents.
In this paper, we survey various user-centered or context-based biomedical health information retrieval systems. We present and discuss the performance of systems submitted to CLEF eHealth 2014 Task 3 for this purpose. We classify, and focus on comparing, the two most prevalent retrieval models in biomedical information retrieval, namely the Language Model (LM) and the Vector Space Model (VSM). We also report on the effectiveness of using external medical resources and ontologies like MeSH, MetaMap, UMLS, etc. We observed that LM-based retrieval systems outperform VSM-based systems on various fronts. From the results we conclude that the state-of-the-art system scores were a MAP of 0.4146, P@10 of 0.7560 and NDCG@10 of 0.7445. All of these scores were reported by systems built on language modelling approaches.
For our participation in the CDS task of TREC, our first objective was to obtain efficient biomedical document retrieval. We focused on fusing manual and machine feedback runs. The fusion run performs better and gives consistent results for the considered evaluation metrics. Also, the categories 'diagnosis' and 'treatment' give good results compared to 'test'.
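The fusion step can be illustrated with reciprocal rank fusion, a common way to merge runs; this is a stand-in, as the note does not specify which fusion formula was used:

```python
def rrf_fuse(runs, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over runs of 1 / (k + rank(d)).

    runs: list of ranked document-id lists (best first)."""
    scores = {}
    for run in runs:
        for rank, doc in enumerate(run, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

manual_run  = ["d1", "d3", "d7", "d2"]   # e.g. manually tuned query run
machine_run = ["d1", "d2", "d3", "d9"]   # e.g. pseudo-relevance feedback run
print(rrf_fuse([manual_run, machine_run]))  # -> ['d1', 'd3', 'd2', 'd7', 'd9']
```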
