Sudeshna Sarkar

    In Cross-Language Information Retrieval, finding the appropriate translation of the source-language query has always been a difficult problem. We propose a technique for solving this problem with the help of multilingual word clusters obtained from multilingual word embeddings. We project the word embeddings of the languages into a common vector space and apply a community-detection algorithm to it, finding clusters such that words representing the same concept in different languages fall in the same group. We use these multilingual word clusters to perform query translation for Cross-Language Information Retrieval in three languages: English, Hindi and Bengali. We have experimented with the FIRE 2012 and Wikipedia datasets and show improvements over several standard methods, such as a dictionary-based method, a transliteration-based model and Google Translate.
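The cluster-then-translate idea can be illustrated with a minimal sketch. It assumes the embeddings are already projected into a shared space (toy 2-D vectors below) and, because the abstract does not name its community-detection algorithm, substitutes a much simpler stand-in: connected components of a cosine-similarity threshold graph. The names `cluster_words` and `translate_query` and the `threshold` value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_words(embeddings, threshold=0.8):
    """Group (lang, word) keys whose shared-space vectors have cosine >= threshold.

    Stand-in for community detection: connected components (via union-find)
    of the similarity graph, not the algorithm used in the paper.
    """
    keys = list(embeddings)
    parent = {k: k for k in keys}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for k in keys:
        clusters.setdefault(find(k), []).append(k)
    return list(clusters.values())


def translate_query(query_words, src, tgt, clusters):
    """Replace each source-language word with the target-language words in its cluster."""
    lookup = {key: cluster for cluster in clusters for key in cluster}
    translated = []
    for w in query_words:
        cluster = lookup.get((src, w), [])
        translated.extend(word for lang, word in cluster if lang == tgt)
    return translated
```

With toy vectors such as `("en", "water"): [1.0, 0.1]` and `("hi", "पानी"): [0.95, 0.12]`, the two words land in one cluster and `translate_query(["water"], "en", "hi", clusters)` returns `["पानी"]`. Query words with no cluster are simply dropped here; a real system would need a fallback.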
    Event argument extraction refers to the task of extracting structured information from unstructured text for a particular event of interest. Existing works perform poorly at extracting causal event arguments such as Reason and After Effects. Furthermore, most existing works model this task at the sentence level, restricting the context to a local scope. While this may be effective for short spans of text, in longer bodies of text such as news articles it has often been observed that the arguments of an event do not necessarily occur in the same sentence as the event trigger. To tackle this scattering of arguments across sentences, the use of global context becomes imperative. In our work, we propose an external-knowledge-aided approach that infuses document-level event information to aid the extraction of complex event arguments. We develop a causal network for our event-annotated dataset by extracting relevant event causal structures from...
    In this paper we describe the IIT KGP team’s participation in the Event Extraction task at FIRE 2017. We have developed an event extraction system that extracts event phrases from tweets written in Indian-language scripts as well as in Roman script. We designed our system for Hindi and then applied the same system to Malayalam and Tamil. We submitted two systems: one uses a pipelined architecture and the other a non-pipelined architecture. In the pipelined architecture, we first identify the tweets that contain an event and then extract the event phrase from those tweets. In the non-pipelined system, all tweets are passed directly to the event extraction system. Though conceptually simple, the non-pipelined approach gives better results than the pipelined approach, achieving F1-scores of 50.01, 48.29 and 51.80 on the Hindi, Malayalam and Tamil datasets, respectively.
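The structural difference between the two submitted architectures can be sketched as follows. This is a minimal illustration, not the team's system: `has_event` and `extract_event_phrases` are hypothetical keyword-rule stand-ins for the trained models. It only shows that in the pipelined design a tweet rejected by the first stage can never yield an event phrase, whereas the non-pipelined design leaves that decision to the extractor.

```python
from typing import Callable, List

Classifier = Callable[[str], bool]
Extractor = Callable[[str], List[str]]


def pipelined(tweets: List[str], has_event: Classifier, extract: Extractor) -> List[List[str]]:
    # Stage 1 filters tweets; only those predicted to contain an event reach stage 2.
    return [extract(t) if has_event(t) else [] for t in tweets]


def non_pipelined(tweets: List[str], extract: Extractor) -> List[List[str]]:
    # Every tweet goes straight to the extractor, which may return no phrases.
    return [extract(t) for t in tweets]


# Hypothetical toy stand-ins for the trained components.
EVENT_WORDS = {"flood", "earthquake"}


def extract_event_phrases(tweet: str) -> List[str]:
    return [w for w in tweet.split() if w in EVENT_WORDS]


def has_event(tweet: str) -> bool:
    # Deliberately imperfect filter: it misses flood tweets.
    return "earthquake" in tweet
```

For the tweets `["earthquake hits town", "flood in chennai", "good morning"]`, `pipelined` yields `[["earthquake"], [], []]` while `non_pipelined` yields `[["earthquake"], ["flood"], []]`: the filter's false negative is unrecoverable downstream.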
    Companies send many promotional offers and coupons to customers to encourage them to buy more. Offer recommendation systems can help identify relevant offers for users. In this paper, we present a Neural Factorization (NF) model for the task of offer recommendation. We represent users and offers with Knowledge Graph Embeddings (KGE). Specifically, we model the available data as a Knowledge Graph (KG) and learn embeddings for entities and relations using a standard KGE technique called TransE. We also incorporate user temporal features into the NF model using a Long Short-Term Memory (LSTM) network with an attention framework. We experiment with the Kaggle Acquire Valued Shoppers Challenge dataset and show that our model performs significantly better than tree-based methods.
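TransE, named in the abstract, models a triple (head, relation, tail) as a translation: a plausible triple should satisfy h + r ≈ t, and training typically minimizes a margin ranking loss between true and corrupted triples. The sketch below shows only that scoring rule and loss in plain Python. The user/offer vectors and the idea of a "redeemed" relation are hypothetical, and the NF model and LSTM-with-attention component are not reproduced.

```python
import math


def transe_score(h, r, t, norm=1):
    """TransE plausibility: -||h + r - t|| (L1 or L2). Higher means more plausible."""
    diffs = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return -sum(abs(d) for d in diffs)
    return -math.sqrt(sum(d * d for d in diffs))


def margin_loss(pos, neg, margin=1.0, norm=1):
    """Margin ranking loss max(0, margin + d(pos) - d(neg)), where d = -score."""
    return max(0.0, margin - transe_score(*pos, norm=norm) + transe_score(*neg, norm=norm))


# Hypothetical 2-D embeddings: a user, a "redeemed" relation and two offers.
user, redeemed = [1.0, 0.0], [0.0, 1.0]
offer_a, offer_b = [1.0, 1.0], [0.0, 0.0]
```

Here `user + redeemed` lands exactly on `offer_a`, so `transe_score(user, redeemed, offer_a)` is `0.0` and `offer_b` scores `-2.0`. The loss is therefore zero when `offer_a` is the positive triple and `3.0` when the roles are swapped. Ranking candidate offers by this score is one plausible way such embeddings could feed a recommender; the abstract does not say whether the paper uses the score directly.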
    Identification of potential Drug-Drug Interactions (DDI) for newly developed drugs is essential in public healthcare. Computational methods of DDI prediction rely on known interactions to learn possible interactions between drug pairs whose interactions are unknown. Past work has used various similarity measures between drugs to predict DDIs. In this paper, we propose an effective approach to DDI prediction that uses rich drug representations drawn from multiple knowledge sources. We use the Drug-Target Interaction (DTI) network to learn drug embeddings with the metapath2vec algorithm. We also use drug representations learned from the rich chemical structure of drugs with a Variational Auto-Encoder. The DDI prediction problem is modeled as a link prediction problem in the DDI network containing known interactions, where each node is represented by its embedding. We apply a link prediction algorithm based on Graph Auto-Encoders to predict additional edges in this network, which correspond to potential interactions. We have evaluated our approach on three benchmark DDI datasets, namely DrugBank, SemMedDB, and BioSNAP. Experimental results demonstrate that the proposed method outperforms prior methods on several performance metrics (AUC, AUPR, and F1-score) on all the datasets. Furthermore, we have evaluated the contribution of each type of drug representation to DDI prediction performance.
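A common Graph Auto-Encoder design pairs a graph-convolutional encoder, Z = ReLU(Â X W) with Â = D^{-1/2}(A + I)D^{-1/2}, with an inner-product decoder that scores a candidate edge (i, j) as sigmoid(z_i · z_j). The abstract does not give the paper's exact architecture. The sketch below assumes that common design, uses a single GCN layer and fixed untrained weights, and treats the feature matrix X as the node (drug) embeddings. Training against a reconstruction loss is omitted.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense 0/1 adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Dense matrix product of list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_encode(adj, features, weight):
    """One GCN layer with ReLU: Z = ReLU(A_norm X W)."""
    h = matmul(matmul(normalize_adjacency(adj), features), weight)
    return [[max(0.0, v) for v in row] for row in h]


def edge_probability(z, i, j):
    """Inner-product decoder: sigmoid(z_i . z_j), read as the likelihood of edge (i, j)."""
    dot = sum(a * b for a, b in zip(z[i], z[j]))
    return 1.0 / (1.0 + math.exp(-dot))
```

On a three-drug graph with one known interaction (0, 1), identity features and identity weights, the encoder gives z_0 = z_1 = [0.5, 0.5, 0], so the known pair scores sigmoid(0.5) ≈ 0.62 while the unconnected pair (0, 2) scores exactly 0.5. Once the weights are trained, high-scoring pairs without an edge would be the candidate interactions.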
    In this paper we describe a neural-network-based approach to the Event Extraction (EE) task, which aims to discover different types of events, along with their event arguments, in text documents written in Hindi, Tamil and English. This work was part of our participation in the task on Event Extraction from Newswires and Social Media Text in Indian Languages at the Forum for Information Retrieval Evaluation (FIRE) 2018. A neural network model combining a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) is employed for the event identification task. In addition to event detection, the system extracts the event arguments, which contain information related to the events (e.g., when [Time], where [Place], Reason, Casualty, After-effect). Our proposed Event Extraction model achieves F-scores of 39.71, 37.42 and 39.91 on the Hindi, Tamil and English datasets, respectively, which shows the overall performance of Event identification and argument e...
    Precipitation nowcasting is an important component of accurate weather modeling, and Doppler radar data is an important input to nowcasting models. In this work, we propose a deep-learning-based approach for radar echo state prediction. Our approach embeds convolutions within a Long Short-Term Memory recurrent network, and adds a discriminator network to the loss objective that acts as a regularizer to refine the predictions. This explicitly models the spatio-temporal nature of the problem in the neural network. We show that this model can be applied to fine-grained short-term precipitation prediction, improving evaluation metrics over strong baselines. The proposed model improves the recall score by 11% compared to the same model without adversarial regularization. Results are presented using the usual train/test strategy for the task of echo state prediction, together with derived precipitation-based skill scores, on data from Seattle, WA, USA.
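Precipitation skill scores are usually computed from a contingency table of forecast versus observed "rain" cells after thresholding both grids. Recall in this setting is the probability of detection (POD). The abstract does not list which skill scores or thresholds were used. The sketch below assumes three widely used ones (POD, false alarm ratio FAR, and critical success index CSI) and a hypothetical rain-rate threshold, with grids given as lists of rows.

```python
def contingency(pred, obs, threshold):
    """Count hits, misses and false alarms between thresholded forecast and observation grids."""
    hits = misses = false_alarms = 0
    for p_row, o_row in zip(pred, obs):
        for p, o in zip(p_row, o_row):
            p_event, o_event = p >= threshold, o >= threshold
            if p_event and o_event:
                hits += 1
            elif o_event:
                misses += 1
            elif p_event:
                false_alarms += 1
    return hits, misses, false_alarms


def skill_scores(pred, obs, threshold=0.5):
    """POD (recall), FAR and CSI; NaN when a denominator is zero."""
    h, m, f = contingency(pred, obs, threshold)
    nan = float("nan")
    return {
        "POD": h / (h + m) if h + m else nan,          # probability of detection
        "FAR": f / (h + f) if h + f else nan,          # false alarm ratio
        "CSI": h / (h + m + f) if h + m + f else nan,  # critical success index
    }
```

For a 2x2 forecast `[[1.0, 0.0], [2.0, 0.6]]` against observations `[[1.2, 0.7], [0.0, 0.8]]` at threshold 0.5, there are 2 hits, 1 miss and 1 false alarm, giving POD = 2/3, FAR = 1/3 and CSI = 0.5.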
    Most existing information extraction frameworks (Wadden et al., 2019; Veyseh et al., 2020) focus on sentence-level tasks and can hardly capture the consolidated information in a given document. In our endeavour to generate precise document-level information frames from lengthy textual records, we introduce the task of Information Aggregation, or Argument Aggregation. More specifically, our aim is to filter out irrelevant and redundant argument mentions extracted at the sentence level and to render a document-level information frame. Most existing works address the related tasks of document-level event argument extraction (Yang et al., 2018; Zheng et al., 2019) and salient entity identification (Jain et al., 2020) using supervised techniques. To remove the dependency on large amounts of labelled data, we explore the task of information aggregation using weakly supervised techniques. In particular, we present an extractive algorithm with mult...
    Recently, neural network architectures have outperformed traditional methods in biomedical named entity recognition (NER). Borrowed from innovations in general-domain NER, these models fail to address two important problems in biomedical text: polysemy and the use of acronyms. We hypothesize that a fully-contextualized model, which combines contextualized representations with context-dependent transition scores in the CRF, can alleviate these issues and further boost the tagger’s performance. In our experiments, this architecture improves the state-of-the-art F1 score on three widely used biomedical NER corpora. We also analyze the specific cases where our contextualized model outperforms a strong baseline.
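In a standard linear-chain CRF, the transition score from tag p to tag y is a single matrix shared by every position. The context-dependent variant described above lets that matrix change with the input. The abstract does not say how the paper computes these scores. The sketch below assumes they are already available as one matrix per position and shows how Viterbi decoding uses them. The tag set (0 = O, 1 = B, 2 = I) and all scores are hypothetical.

```python
def viterbi(emissions, transitions):
    """Best tag sequence under position-dependent transition scores.

    emissions[t][y]      -- score of tag y at position t.
    transitions[t][p][y] -- score of moving from tag p (at t-1) to tag y (at t), for t >= 1;
                            transitions[0] is unused. A standard CRF would pass the same
                            matrix at every position.
    """
    n_tags = len(emissions[0])
    score = list(emissions[0])
    back = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for y in range(n_tags):
            best_prev = max(range(n_tags), key=lambda p: score[p] + transitions[t][p][y])
            new_score.append(score[best_prev] + transitions[t][best_prev][y] + emissions[t][y])
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)
    # Follow back-pointers from the best final tag.
    best = max(range(n_tags), key=lambda y: score[y])
    path = [best]
    for pointers in reversed(back):
        best = pointers[best]
        path.append(best)
    return path[::-1]
```

With emissions `[[0.0, 1.0, 0.0], [0.5, 0.0, 0.4]]`, a position-1 transition matrix that rewards B to I (`+1.0`) decodes to `[1, 2]` (B I). If the context at that position instead penalizes B to I (`-1.0`), the same emissions decode to `[1, 0]` (B O). The same token scores can thus yield different spans depending on context, which is the kind of flexibility the abstract argues helps with acronyms and polysemous terms.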
    Query expansion is an effective relevance feedback technique for improving performance in Information Retrieval. In general, query expansion methods select terms from the complete contents of relevant documents. One problem with this approach is that...
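The general approach described above, which selects expansion terms from the complete contents of relevant documents, can be sketched with simple frequency scoring. This illustrates only that baseline. The abstract's own proposal is truncated and is not reproduced here. The stopword list, the value of `k` and raw-count scoring (rather than, say, Rocchio weights) are all assumptions.

```python
from collections import Counter

# Illustrative stopword list, not a standard one.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is"}


def expand_query(query, relevant_docs, k=2):
    """Append the k most frequent non-query, non-stopword terms from the full relevant documents."""
    query_terms = query.lower().split()
    counts = Counter()
    for doc in relevant_docs:
        counts.update(w for w in doc.lower().split()
                      if w not in STOPWORDS and w not in query_terms)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return query_terms + [term for term, _ in ranked[:k]]
```

For the query `"river pollution"` and relevant documents `["industrial waste in the river", "waste water and river pollution", "industrial effluent"]`, the sketch returns `["river", "pollution", "industrial", "waste"]`. Because every word of every relevant document is a candidate, terms from off-topic passages of a long document compete equally with on-topic ones.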
    We report results on stylometric differences in blogging across gender and age groups. The results are based on two mutually independent features. The first feature is the use of slang words, which is a new concept proposed by us for the stylometric study of ...
