research-article

Predicting Drug Demand with Wikipedia Views: Evidence from Darknet Markets.

WWW '20: Proceedings of The Web Conference 2020, April 2020, Pages 2669–2675. https://doi.org/10.1145/3366423.3380022

Rapid changes in illicit drug demand, such as the Fentanyl epidemic, are a major public health issue. Policymakers currently rely on annual surveys to monitor public consumption, which are arguably too infrequent to detect rapid shifts in drug use. We ...

research-article

Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment

SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data, June 2019, Pages 229–246. https://doi.org/10.1145/3299869.3319899

Data enrichment is the act of extending a local database with new attributes from external data sources. In this paper, we study a novel problem: how to progressively crawl the deep web (i.e., a hidden database) through a keyword-search API to enrich a ...

research-article

Deeper: A Data Enrichment System Powered by Deep Web

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data, May 2018, Pages 1801–1804. https://doi.org/10.1145/3183713.3193569

Data scientists often spend more than 80% of their time on data preparation. Data enrichment, the act of extending a local database with new attributes from external data sources, is among the most time-consuming tasks. Existing data enrichment works ...

research-article

Browserless Web Data Extraction: Challenges and Opportunities

WWW '18: Proceedings of the 2018 World Wide Web Conference, April 2018, Pages 1095–1104. https://doi.org/10.1145/3178876.3186008

Most modern web scrapers use an embedded browser to render web pages and to simulate user actions. Such scrapers (or wrappers) are therefore expensive to execute, in terms of time and network traffic. In contrast, it is magnitudes more resource-...

poster

POSTER: Probing Tor Hidden Service with Dockers

CCS '17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, October 2017, Pages 2571–2573. https://doi.org/10.1145/3133956.3138849

Tor is a commonly used anonymous network that provides hidden services. As the number of hidden services using Tor's anonymous network has steadily increased every year, so has the number of services that abuse Tor's anonymity. The existing ...

research-article

Querying deep web data sources as linked data

WIMS '17: Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics, June 2017, Article No.: 32, Pages 1–7. https://doi.org/10.1145/3102254.3102290

The Deep Web is constituted by dynamically generated pages, usually requested through HTML forms; it is notoriously difficult to query and to search, as its pages are obviously non-indexable. Recently, Deep Web data have been made accessible through ...

invited-talk

Querying and searching the deep web

Andrea Calí

WIMS '17: Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics, June 2017, Article No.: 3, Page 1. https://doi.org/10.1145/3102254.3102257

The term Deep Web (sometimes also called Hidden Web) [2, 5, 8] refers to the data content that is accessible through Web pages, typically via HTML forms, but is not available on static pages for indexing by search engines. Deep Web data reside in ...

research-article

Anonymity of Tor: Myth and Reality

CEE-SECR '16: Proceedings of the 12th Central and Eastern European Software Engineering Conference in Russia, October 2016, Article No.: 10, Pages 1–5. https://doi.org/10.1145/3022211.3022221

Privacy enhancing technologies (PETs) are ubiquitous nowadays. They are beneficial for a wide range of users. However, PETs are not always used for legal activity. The present paper is focused on Tor users deanonymization using out-of-the-box ...

demonstration

FuhSen: A Platform for Federated, RDF-based Hybrid Search

WWW '16 Companion: Proceedings of the 25th International Conference Companion on World Wide Web, April 2016, Pages 171–174. https://doi.org/10.1145/2872518.2890535

The increasing amount of structured and semi-structured information available on the Web and in distributed information systems, as well as the Web's diversification into different segments such as the Social Web, the Deep Web, or the Dark Web, requires ...

research-article

Towards complete coverage in focused web harvesting

iiWAS '15: Proceedings of the 17th International Conference on Information Integration and Web-based Applications & Services, December 2015, Article No.: 65, Pages 1–9. https://doi.org/10.1145/2837185.2837208

With the goal of harvesting all information about a given entity, in this paper, we try to harvest all matching documents for a given query submitted on a search engine. The objective is to retrieve all information about, for instance, "Michael Jackson", "...

research-article

Ranking Deep Web Text Collections for Scalable Information Extraction

CIKM '15: Proceedings of the 24th ACM International Conference on Information and Knowledge Management, October 2015, Pages 153–162. https://doi.org/10.1145/2806416.2806581

Information extraction (IE) systems discover structured information from natural language text, to enable much richer querying and data mining than possible directly over the unstructured text. Unfortunately, IE is generally a computationally expensive ...

research-article

DataXFormer: An Interactive Data Transformation Tool

SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, May 2015, Pages 883–888. https://doi.org/10.1145/2723372.2735366

While syntactic transformations require the application of a formula on the input values, such as unit conversion or date format conversions, semantic transformations, such as "zip code to city", require a look-up in some reference data. We recently ...

research-article

DIADEM: thousands of websites to a single database

Proceedings of the VLDB Endowment (PVLDB), Volume 7, Issue 14, Pages 1845–1856. https://doi.org/10.14778/2733085.2733091

The web is overflowing with implicitly structured data, spread over hundreds of thousands of sites, hidden deep behind search forms, or siloed in marketplaces, only accessible as HTML. Automatic extraction of structured data at the scale of thousands of ...

tutorial

Exploration and mining of web repositories

WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining, February 2014, Pages 675–676. https://doi.org/10.1145/2556195.2556197

With the proliferation of very large data repositories hidden behind web interfaces, e.g., keyword search, form-like search and hierarchical/graph-based browsing interfaces for Amazon.com, eBay.com, etc., efficient ways of searching, exploring and/or ...

research-article

The deep web: woven to catch the middle ground

Wensheng Wu

Web-KR '13: Proceedings of the 4th international workshop on Web-scale knowledge representation retrieval and reasoning, November 2013, Pages 5–8. https://doi.org/10.1145/2512405.2512408

The massive and diverse data sources on the Deep Web present a serious data integration challenge. Existing virtual integration approaches suffer from slow query response, while surfacing approaches demand hefty storage space and incur huge costs in ...

abstract

Deep web entity monitoring

WWW '13 Companion: Proceedings of the 22nd International Conference on World Wide Web, May 2013, Pages 377–382. https://doi.org/10.1145/2487788.2487946

poster

Searching the deep web using proactive phrase queries

WWW '13 Companion: Proceedings of the 22nd International Conference on World Wide Web, May 2013, Pages 137–138. https://doi.org/10.1145/2487788.2487854

This paper proposes ipq, a novel search engine that proactively transforms query forms of Deep Web sources into phrase queries, constructs query evaluation plans, and caches results for popular queries offline. Then at query time, keyword queries are ...

research-article

Size estimation of non-cooperative data collections

IIWAS '12: Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services, December 2012, Pages 239–246. https://doi.org/10.1145/2428736.2428774

With the increasing amount of data in deep web sources (hidden from general search engines behind web forms), accessing this data has gained more attention. In the algorithms applied for this purpose, it is the knowledge of a data source size that ...

research-article

Stratified k-means clustering over a deep web data source

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2012, Pages 1113–1121. https://doi.org/10.1145/2339530.2339705

This paper focuses on the problem of clustering data from a hidden or a deep web data source. A key characteristic of deep web data sources is that data can only be accessed through the limited query interface they support. Because the underlying ...

demonstration

OPAL: a passe-partout for web forms

WWW '12 Companion: Proceedings of the 21st International Conference on World Wide Web, April 2012, Pages 353–356. https://doi.org/10.1145/2187980.2188047

Web forms are the interfaces of the deep web. Though modern web browsers provide facilities to assist in form filling, this assistance is limited to prior form fillings or keyword matching. Automatic form understanding enables a broad range of ...
