- research-article, April 2020
Predicting Drug Demand with Wikipedia Views: Evidence from Darknet Markets.
WWW '20: Proceedings of The Web Conference 2020, April 2020, Pages 2669–2675
https://doi.org/10.1145/3366423.3380022
Rapid changes in illicit drug demand, such as the Fentanyl epidemic, are a major public health issue. Policymakers currently rely on annual surveys to monitor public consumption, which are arguably too infrequent to detect rapid shifts in drug use. We ...
- research-article, June 2019
Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment
SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data, June 2019, Pages 229–246
https://doi.org/10.1145/3299869.3319899
Data enrichment is the act of extending a local database with new attributes from external data sources. In this paper, we study a novel problem: how to progressively crawl the deep web (i.e., a hidden database) through a keyword-search API to enrich a ...
- research-article, May 2018
Deeper: A Data Enrichment System Powered by Deep Web
SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data, May 2018, Pages 1801–1804
https://doi.org/10.1145/3183713.3193569
Data scientists often spend more than 80% of their time on data preparation. Data enrichment, the act of extending a local database with new attributes from external data sources, is among the most time-consuming tasks. Existing data enrichment works ...
- research-article, April 2018
Browserless Web Data Extraction: Challenges and Opportunities
WWW '18: Proceedings of the 2018 World Wide Web Conference, April 2018, Pages 1095–1104
https://doi.org/10.1145/3178876.3186008
Most modern web scrapers use an embedded browser to render web pages and to simulate user actions. Such scrapers (or wrappers) are therefore expensive to execute, in terms of time and network traffic. In contrast, it is magnitudes more resource-...
- poster, October 2017
POSTER: Probing Tor Hidden Service with Dockers
CCS '17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, October 2017, Pages 2571–2573
https://doi.org/10.1145/3133956.3138849
Tor is a commonly used anonymous network that provides the hidden services. As the number of hidden services using Tor's anonymous network has been steadily increasing every year, so does the number of services that abuse Tor's anonymity. The existing ...
- research-article, June 2017
Querying deep web data sources as linked data
WIMS '17: Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics, June 2017, Article No.: 32, Pages 1–7
https://doi.org/10.1145/3102254.3102290
The Deep Web is constituted by dynamically generated pages, usually requested through HTML forms; it is notoriously difficult to query and to search, as its pages are obviously non-indexable. Recently, Deep Web data have been made accessible through ...
- invited-talk, June 2017
Querying and searching the deep web
WIMS '17: Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics, June 2017, Article No.: 3, Page 1
https://doi.org/10.1145/3102254.3102257
The term Deep Web (sometimes also called Hidden Web) [2, 5, 8] refers to the data content that is accessible through Web pages, typically via HTML forms, but is not available on static pages for indexing by search engines. Deep Web data reside in ...
- research-article, October 2016
Anonymity of Tor: Myth and Reality
CEE-SECR '16: Proceedings of the 12th Central and Eastern European Software Engineering Conference in Russia, October 2016, Article No.: 10, Pages 1–5
https://doi.org/10.1145/3022211.3022221
Privacy enhancing technologies (PETs) are ubiquitous nowadays. They are beneficial for a wide range of users. However, PETs are not always used for legal activity. The present paper is focused on Tor users deanonymization using out-of-the box ...
- demonstration, April 2016
FuhSen: A Platform for Federated, RDF-based Hybrid Search
WWW '16 Companion: Proceedings of the 25th International Conference Companion on World Wide Web, April 2016, Pages 171–174
https://doi.org/10.1145/2872518.2890535
The increasing amount of structured and semi-structured information available on the Web and in distributed information systems, as well as the Web's diversification into different segments such as the Social Web, the Deep Web, or the Dark Web, requires ...
- research-article, December 2015
Towards complete coverage in focused web harvesting
iiWAS '15: Proceedings of the 17th International Conference on Information Integration and Web-based Applications & Services, December 2015, Article No.: 65, Pages 1–9
https://doi.org/10.1145/2837185.2837208
With the goal of harvesting all information about a given entity, in this paper, we try to harvest all matching documents for a given query submitted on a search engine. The objective is to retrieve all information about for instance "Michael Jackson", "...
- research-article, October 2015
Ranking Deep Web Text Collections for Scalable Information Extraction
CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, October 2015, Pages 153–162
https://doi.org/10.1145/2806416.2806581
Information extraction (IE) systems discover structured information from natural language text, to enable much richer querying and data mining than possible directly over the unstructured text. Unfortunately, IE is generally a computationally expensive ...
- research-article, May 2015
DataXFormer: An Interactive Data Transformation Tool
SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, May 2015, Pages 883–888
https://doi.org/10.1145/2723372.2735366
While syntactic transformations require the application of a formula on the input values, such as unit conversion or date format conversions, semantic transformations, such as "zip code to city", require a look-up in some reference data. We recently ...
- research-article, October 2014
DIADEM: thousands of websites to a single database
Proceedings of the VLDB Endowment (PVLDB), Volume 7, Issue 14, Pages 1845–1856
https://doi.org/10.14778/2733085.2733091
The web is overflowing with implicitly structured data, spread over hundreds of thousands of sites, hidden deep behind search forms, or siloed in marketplaces, only accessible as HTML. Automatic extraction of structured data at the scale of thousands of ...
- tutorial, February 2014
Exploration and mining of web repositories
WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining, February 2014, Pages 675–676
https://doi.org/10.1145/2556195.2556197
With the proliferation of very large data repositories hidden behind web interfaces, e.g., keyword search, form-like search and hierarchical/graph-based browsing interfaces for Amazon.com, eBay.com, etc., efficient ways of searching, exploring and/or ...
- research-article, November 2013
The deep web: woven to catch the middle ground
Web-KR '13: Proceedings of the 4th international workshop on Web-scale knowledge representation retrieval and reasoning, November 2013, Pages 5–8
https://doi.org/10.1145/2512405.2512408
The massive and diverse data sources on the Deep Web present a serious data integration challenge. Existing virtual integration approaches suffer from slow query response, while surfacing approaches demand hefty storage space and incur huge costs in ...
- poster, May 2013
Searching the deep web using proactive phrase queries
WWW '13 Companion: Proceedings of the 22nd International Conference on World Wide Web, May 2013, Pages 137–138
https://doi.org/10.1145/2487788.2487854
This paper proposes ipq, a novel search engine that proactively transforms query forms of Deep Web sources into phrase queries, constructs query evaluation plans, and caches results for popular queries offline. Then at query time, keyword queries are ...
- research-article, December 2012
Size estimation of non-cooperative data collections
IIWAS '12: Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services, December 2012, Pages 239–246
https://doi.org/10.1145/2428736.2428774
With the increasing amount of data in deep web sources (hidden from general search engines behind web forms), accessing this data has gained more attention. In the algorithms applied for this purpose, it is the knowledge of a data source size that ...
- research-article, August 2012
Stratified k-means clustering over a deep web data source
KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2012, Pages 1113–1121
https://doi.org/10.1145/2339530.2339705
This paper focuses on the problem of clustering data from a hidden or a deep web data source. A key characteristic of deep web data sources is that data can only be accessed through the limited query interface they support. Because the underlying ...
- demonstration, April 2012
OPAL: a passe-partout for web forms
WWW '12 Companion: Proceedings of the 21st International Conference on World Wide Web, April 2012, Pages 353–356
https://doi.org/10.1145/2187980.2188047
Web forms are the interfaces of the deep web. Though modern web browsers provide facilities to assist in form filling, this assistance is limited to prior form fillings or keyword matching. Automatic form understanding enables a broad range of ...