- research-article, November 2012
TitleFinder: extracting the headline of news web pages based on cosine similarity and overlap scoring similarity
WIDM '12: Proceedings of the twelfth international workshop on Web information and data management, November 2012, pp 65–72, https://doi.org/10.1145/2389936.2389950
Automatically extracting the headline of online web articles has many applications in web mining and information retrieval. In this paper, we developed a content-based and domain- and language-independent approach, TitleFinder, for unsupervised extraction ...
- research-article, November 2012
Web crawler middleware for search engine digital libraries: a case study for citeseerX
- Jian Wu,
- Pradeep Teregowda,
- Madian Khabsa,
- Stephen Carman,
- Douglas Jordan,
- Jose San Pedro Wandelmer,
- Xin Lu,
- Prasenjit Mitra,
- C. Lee Giles
WIDM '12: Proceedings of the twelfth international workshop on Web information and data management, November 2012, pp 57–64, https://doi.org/10.1145/2389936.2389949
Middleware is an important part of many search engine web crawling processes. We developed a middleware, the Crawl Document Importer (CDI), which selectively imports documents and the associated metadata to the digital library CiteSeerX crawl repository ...