- research-article, April 2020
Predicting Drug Demand with Wikipedia Views: Evidence from Darknet Markets
WWW '20: Proceedings of The Web Conference 2020, April 2020, pp 2669–2675, https://doi.org/10.1145/3366423.3380022
Rapid changes in illicit drug demand, such as the Fentanyl epidemic, are a major public health issue. Policymakers currently rely on annual surveys to monitor public consumption, which are arguably too infrequent to detect rapid shifts in drug use. We ...
- research-article, June 2019
Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment
SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data, June 2019, pp 229–246, https://doi.org/10.1145/3299869.3319899
Data enrichment is the act of extending a local database with new attributes from external data sources. In this paper, we study a novel problem: how to progressively crawl the deep web (i.e., a hidden database) through a keyword-search API to enrich a ...
- research-article, January 2019
DWSpyder: a new schema extraction method for a deep web integration system
International Journal of Web Engineering and Technology (IJWET), Volume 14, Issue 2, 2019, pp 122–150, https://doi.org/10.1504/ijwet.2019.102872
The deep web is a huge part of the web that is not indexed by search engines. The deep web sources are accessible only through their associated access forms. We wish to use a web integration system to access the deep web sources and all of their ...
- research-article, May 2018
Deeper: A Data Enrichment System Powered by Deep Web
SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data, May 2018, pp 1801–1804, https://doi.org/10.1145/3183713.3193569
Data scientists often spend more than 80% of their time on data preparation. Data enrichment, the act of extending a local database with new attributes from external data sources, is among the most time-consuming tasks. Existing data enrichment works ...
- research-article, April 2018
Browserless Web Data Extraction: Challenges and Opportunities
WWW '18: Proceedings of the 2018 World Wide Web Conference, April 2018, pp 1095–1104, https://doi.org/10.1145/3178876.3186008
Most modern web scrapers use an embedded browser to render web pages and to simulate user actions. Such scrapers (or wrappers) are therefore expensive to execute, in terms of time and network traffic. In contrast, it is magnitudes more resource-...
- poster, October 2017
POSTER: Probing Tor Hidden Service with Dockers
CCS '17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, October 2017, pp 2571–2573, https://doi.org/10.1145/3133956.3138849
Tor is a commonly used anonymous network that provides the hidden services. As the number of hidden services using Tor's anonymous network has been steadily increasing every year, so does the number of services that abuse Tor's anonymity. The existing ...
- research-article, June 2017
Querying deep web data sources as linked data
WIMS '17: Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics, June 2017, Article No.: 32, pp 1–7, https://doi.org/10.1145/3102254.3102290
The Deep Web is constituted by dynamically generated pages, usually requested through HTML forms; it is notoriously difficult to query and to search, as its pages are obviously non-indexable. Recently, Deep Web data have been made accessible through ...
- invited-talk, June 2017
Querying and searching the deep web
WIMS '17: Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics, June 2017, Article No.: 3, pp 1, https://doi.org/10.1145/3102254.3102257
The term Deep Web (sometimes also called Hidden Web) [2, 5, 8] refers to the data content that is accessible through Web pages, typically via HTML forms, but is not available on static pages for indexing by search engines. Deep Web data reside in ...
- research-article, October 2016
Anonymity of Tor: Myth and Reality
CEE-SECR '16: Proceedings of the 12th Central and Eastern European Software Engineering Conference in Russia, October 2016, Article No.: 10, pp 1–5, https://doi.org/10.1145/3022211.3022221
Privacy enhancing technologies (PETs) are ubiquitous nowadays. They are beneficial for a wide range of users. However, PETs are not always used for legal activity. The present paper is focused on Tor users deanonimization using out-of-the box ...
- demonstration, April 2016
FuhSen: A Platform for Federated, RDF-based Hybrid Search
WWW '16 Companion: Proceedings of the 25th International Conference Companion on World Wide Web, April 2016, pp 171–174, https://doi.org/10.1145/2872518.2890535
The increasing amount of structured and semi-structured information available on the Web and in distributed information systems, as well as the Web's diversification into different segments such as the Social Web, the Deep Web, or the Dark Web, requires ...
- research-article, December 2015
Towards complete coverage in focused web harvesting
iiWAS '15: Proceedings of the 17th International Conference on Information Integration and Web-based Applications & Services, December 2015, Article No.: 65, pp 1–9, https://doi.org/10.1145/2837185.2837208
With the goal of harvesting all information about a given entity, in this paper, we try to harvest all matching documents for a given query submitted on a search engine. The objective is to retrieve all information about for instance "Michael Jackson", "...
- article, November 2015
A neural network based intrusion detection and user identification system for Tor networks: performance evaluation for different number of hidden units using Friedman test
Journal of Mobile Multimedia (JMM), Volume 11, Issue 3-4, November 2015, pp 251–262
Due to the amount of anonymity afforded to users of the Tor infrastructure, Tor has become a useful tool for malicious users. With Tor, the users are able to compromise the non-repudiation principle of computer security. Also, the potentially hackers ...
- research-article, October 2015
Ranking Deep Web Text Collections for Scalable Information Extraction
CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, October 2015, pp 153–162, https://doi.org/10.1145/2806416.2806581
Information extraction (IE) systems discover structured information from natural language text, to enable much richer querying and data mining than possible directly over the unstructured text. Unfortunately, IE is generally a computationally expensive ...
- research-article, May 2015
DataXFormer: An Interactive Data Transformation Tool
SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, May 2015, pp 883–888, https://doi.org/10.1145/2723372.2735366
While syntactic transformations require the application of a formula on the input values, such as unit conversion or date format conversions, semantic transformations, such as "zip code to city", require a look-up in some reference data. We recently ...
- research-article, October 2014
DIADEM: thousands of websites to a single database
Proceedings of the VLDB Endowment (PVLDB), Volume 7, Issue 14, pp 1845–1856, https://doi.org/10.14778/2733085.2733091
The web is overflowing with implicitly structured data, spread over hundreds of thousands of sites, hidden deep behind search forms, or siloed in marketplaces, only accessible as HTML. Automatic extraction of structured data at the scale of thousands of ...
- tutorial, February 2014
Exploration and mining of web repositories
WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining, February 2014, pp 675–676, https://doi.org/10.1145/2556195.2556197
With the proliferation of very large data repositories hidden behind web interfaces, e.g., keyword search, form-like search and hierarchical/graph-based browsing interfaces for Amazon.com, eBay.com, etc., efficient ways of searching, exploring and/or ...
- research-article, November 2013
The deep web: woven to catch the middle ground
Web-KR '13: Proceedings of the 4th international workshop on Web-scale knowledge representation retrieval and reasoning, November 2013, pp 5–8, https://doi.org/10.1145/2512405.2512408
The massive and diverse data sources on the Deep Web presents a serious data integration challenge. Existing virtual integration approaches suffer from slow query response, while surfacing approaches demand hefty storage space and incur huge costs in ...
- article, July 2013
Automatic classification of web databases using domain-dictionaries
MLDM'13: Proceedings of the 9th international conference on Machine Learning and Data Mining in Pattern Recognition, July 2013, pp 340–351, https://doi.org/10.1007/978-3-642-39712-7_26
The identification, classification and integration of databases on the Web (also called web databases) as information sources is still a great challenge due to their constantly growing and diversification. The classification of such web databases ...
- article, July 2013
Current challenges in web crawling
ICWE'13: Proceedings of the 13th international conference on Web Engineering, July 2013, pp 518–521, https://doi.org/10.1007/978-3-642-39200-9_49
Web crawling, a process of collecting web pages in an automated manner, is the primary and ubiquitous operation used by a large number of web systems and agents starting from a simple program for website backup to a major web search engine. Due to an ...