Abstract
Motivation
The disease-related discoveries and advances promised by the post-genome era have yet to be fully realized, and many opportunities for discovery remain hidden in the millions of biomedical papers published since. Public databases give access to data extracted from the literature by teams of experts, but their coverage is often limited and lags behind recent discoveries. We present a computational method that combines data extracted from the literature with data from curated sources in order to uncover possible gene-disease relationships that are not directly stated or were missed by the initial mining.
Method
An initial set of genes and proteins is obtained from gene-disease relationships extracted from PubMed abstracts using natural language processing. Interactions involving the corresponding proteins are similarly extracted and integrated with interactions from curated databases (such as BIND and DIP), and each interaction is assigned a confidence measure that depends on its source. The augmented list of genes and gene products is then ranked by combining two scores: one that reflects the strength of the relationship with the initial set of genes and incorporates user-defined weights, and another that reflects the importance of the gene in maintaining the connectivity of the network. We applied the method to atherosclerosis to assess its effectiveness.
Results
Top-ranked proteins from the method are related to atherosclerosis with accuracy between 0.85 and 1.00 for the top 20 and between 0.64 and 0.80 for the top 90 if duplicates are ignored; 45% of the top 20 and 75% of the top 90 were derived by the method rather than extracted from text. Thus, although the initial gene set and interactions were automatically extracted from text (and are subject to the imprecision of automatic extraction), their use for further hypothesis generation is valuable given adequate computational analysis.
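The ranking step described under Method can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring formulas, so the two scores here are stand-in assumptions. The seed score is taken to be the summed confidence of a gene's interactions with the initial set, and the connectivity score is taken to be the number of extra connected components created when the gene is removed from the network. The names `SOURCE_WEIGHT`, `build_graph`, `count_components`, `rank`, and the mixing parameter `alpha` are all hypothetical, as are the weight values and the toy gene labels.

```python
from collections import defaultdict

# Hypothetical confidence per interaction source (the abstract says weights
# are user-defined; these values are invented for illustration).
SOURCE_WEIGHT = {"curated": 1.0, "text_mined": 0.5}


def build_graph(interactions):
    """Return node -> {neighbor: confidence}, keeping the best confidence per pair."""
    graph = defaultdict(dict)
    for a, b, source in interactions:
        w = SOURCE_WEIGHT[source]
        if w > graph[a].get(b, 0.0):
            graph[a][b] = w
            graph[b][a] = w
    return graph


def count_components(graph, skip=None):
    """Count connected components, optionally ignoring one node."""
    seen, count = set(), 0
    for start in graph:
        if start == skip or start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in graph[node]:
                if nb != skip and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
    return count


def rank(interactions, seeds, alpha=0.5):
    """Rank genes by a weighted mix of seed-relatedness and connectivity role.

    Both scores are assumptions standing in for the paper's unspecified ones.
    """
    graph = build_graph(interactions)
    base = count_components(graph)
    scores = {}
    for node in graph:
        # Assumed score 1: total confidence of links to the initial gene set.
        seed_score = sum(w for nb, w in graph[node].items() if nb in seeds)
        # Assumed score 2: components gained when this node is removed.
        connectivity = max(0, count_components(graph, skip=node) - base)
        scores[node] = alpha * seed_score + (1 - alpha) * connectivity
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy network with placeholder gene labels: E - D - A - B - C
interactions = [
    ("A", "B", "curated"),
    ("B", "C", "text_mined"),
    ("A", "D", "text_mined"),
    ("D", "E", "text_mined"),
]
ranked = rank(interactions, seeds={"A", "B"})
```

In this toy chain, gene `D` is not in the initial set but ranks above `C` because removing it would disconnect `E`, which mirrors the abstract's point that candidates can be derived from network structure rather than taken directly from text.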
Free to read at psb.stanford.edu
http://helix-web.stanford.edu/psb07/abstracts/p28.html