research-article

Predicting Drug Demand with Wikipedia Views: Evidence from Darknet Markets.

WWW '20: Proceedings of The Web Conference 2020, April 2020, pp 2669–2675, https://doi.org/10.1145/3366423.3380022

Rapid changes in illicit drug demand, such as the Fentanyl epidemic, are a major public health issue. Policymakers currently rely on annual surveys to monitor public consumption, which are arguably too infrequent to detect rapid shifts in drug use. We ...

research-article

Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment

SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data, June 2019, pp 229–246, https://doi.org/10.1145/3299869.3319899

Data enrichment is the act of extending a local database with new attributes from external data sources. In this paper, we study a novel problem: how to progressively crawl the deep web (i.e., a hidden database) through a keyword-search API to enrich a ...

research-article

DWSpyder: a new schema extraction method for a deep web integration system

International Journal of Web Engineering and Technology (IJWET), Volume 14, Issue 2, 2019, pp 122–150, https://doi.org/10.1504/ijwet.2019.102872

The deep web is a huge part of the web that is not indexed by search engines. The deep web sources are accessible only through their associated access forms. We wish to use a web integration system to access the deep web sources and all of their ...

research-article

Deeper: A Data Enrichment System Powered by Deep Web

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data, May 2018, pp 1801–1804, https://doi.org/10.1145/3183713.3193569

Data scientists often spend more than 80% of their time on data preparation. Data enrichment, the act of extending a local database with new attributes from external data sources, is among the most time-consuming tasks. Existing data enrichment works ...

research-article

Free

Browserless Web Data Extraction: Challenges and Opportunities

WWW '18: Proceedings of the 2018 World Wide Web Conference, April 2018, pp 1095–1104, https://doi.org/10.1145/3178876.3186008

Most modern web scrapers use an embedded browser to render web pages and to simulate user actions. Such scrapers (or wrappers) are therefore expensive to execute, in terms of time and network traffic. In contrast, it is magnitudes more resource-...

poster

POSTER: Probing Tor Hidden Service with Dockers

CCS '17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, October 2017, pp 2571–2573, https://doi.org/10.1145/3133956.3138849

Tor is a commonly used anonymous network that provides the hidden services. As the number of hidden services using Tor's anonymous network has been steadily increasing every year, so does the number of services that abuse Tor's anonymity. The existing ...

research-article

Querying deep web data sources as linked data

WIMS '17: Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics, June 2017, Article No.: 32, pp 1–7, https://doi.org/10.1145/3102254.3102290

The Deep Web is constituted by dynamically generated pages, usually requested through HTML forms; it is notoriously difficult to query and to search, as its pages are obviously non-indexable. Recently, Deep Web data have been made accessible through ...

invited-talk

Querying and searching the deep web

Andrea Calí

WIMS '17: Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics, June 2017, Article No.: 3, pp 1, https://doi.org/10.1145/3102254.3102257

The term Deep Web (sometimes also called Hidden Web) [2, 5, 8] refers to the data content that is accessible through Web pages, typically via HTML forms, but is not available on static pages for indexing by search engines. Deep Web data reside in ...

research-article

Anonymity of Tor: Myth and Reality

CEE-SECR '16: Proceedings of the 12th Central and Eastern European Software Engineering Conference in Russia, October 2016, Article No.: 10, pp 1–5, https://doi.org/10.1145/3022211.3022221

Privacy enhancing technologies (PETs) are ubiquitous nowadays. They are beneficial for a wide range of users. However, PETs are not always used for legal activity. The present paper is focused on Tor users deanonymization using out-of-the-box ...

demonstration

FuhSen: A Platform for Federated, RDF-based Hybrid Search

WWW '16 Companion: Proceedings of the 25th International Conference Companion on World Wide Web, April 2016, pp 171–174, https://doi.org/10.1145/2872518.2890535

The increasing amount of structured and semi-structured information available on the Web and in distributed information systems, as well as the Web's diversification into different segments such as the Social Web, the Deep Web, or the Dark Web, requires ...

research-article

Towards complete coverage in focused web harvesting

iiWAS '15: Proceedings of the 17th International Conference on Information Integration and Web-based Applications & Services, December 2015, Article No.: 65, pp 1–9, https://doi.org/10.1145/2837185.2837208

With the goal of harvesting all information about a given entity, in this paper, we try to harvest all matching documents for a given query submitted on a search engine. The objective is to retrieve all information about for instance "Michael Jackson", "...

article

A neural network based intrusion detection and user identification system for Tor networks: performance evaluation for different number of hidden units using Friedman test

Journal of Mobile Multimedia (JMM), Volume 11, Issue 3-4, November 2015, pp 251–262

Due to the amount of anonymity afforded to users of the Tor infrastructure, Tor has become a useful tool for malicious users. With Tor, the users are able to compromise the non-repudiation principle of computer security. Also, the potentially hackers ...

research-article

Ranking Deep Web Text Collections for Scalable Information Extraction

CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, October 2015, pp 153–162, https://doi.org/10.1145/2806416.2806581

Information extraction (IE) systems discover structured information from natural language text, to enable much richer querying and data mining than possible directly over the unstructured text. Unfortunately, IE is generally a computationally expensive ...

research-article

DataXFormer: An Interactive Data Transformation Tool

SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, May 2015, pp 883–888, https://doi.org/10.1145/2723372.2735366

While syntactic transformations require the application of a formula on the input values, such as unit conversion or date format conversions, semantic transformations, such as "zip code to city", require a look-up in some reference data. We recently ...

research-article

DIADEM: thousands of websites to a single database

Proceedings of the VLDB Endowment (PVLDB), Volume 7, Issue 14, pp 1845–1856, https://doi.org/10.14778/2733085.2733091

The web is overflowing with implicitly structured data, spread over hundreds of thousands of sites, hidden deep behind search forms, or siloed in marketplaces, only accessible as HTML. Automatic extraction of structured data at the scale of thousands of ...

tutorial

Exploration and mining of web repositories

WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining, February 2014, pp 675–676, https://doi.org/10.1145/2556195.2556197

With the proliferation of very large data repositories hidden behind web interfaces, e.g., keyword search, form-like search and hierarchical/graph-based browsing interfaces for Amazon.com, eBay.com, etc., efficient ways of searching, exploring and/or ...

research-article

The deep web: woven to catch the middle ground

Wensheng Wu

Web-KR '13: Proceedings of the 4th international workshop on Web-scale knowledge representation retrieval and reasoning, November 2013, pp 5–8, https://doi.org/10.1145/2512405.2512408

The massive and diverse data sources on the Deep Web presents a serious data integration challenge. Existing virtual integration approaches suffer from slow query response, while surfacing approaches demand hefty storage space and incur huge costs in ...

Article

Automatic classification of web databases using domain-dictionaries

MLDM'13: Proceedings of the 9th international conference on Machine Learning and Data Mining in Pattern Recognition, July 2013, pp 340–351, https://doi.org/10.1007/978-3-642-39712-7_26

The identification, classification and integration of databases on the Web (also called web databases) as information sources is still a great challenge due to their constantly growing and diversification. The classification of such web databases ...

Article

Current challenges in web crawling

Denis Shestakov

ICWE'13: Proceedings of the 13th international conference on Web Engineering, July 2013, pp 518–521, https://doi.org/10.1007/978-3-642-39200-9_49

Web crawling, a process of collecting web pages in an automated manner, is the primary and ubiquitous operation used by a large number of web systems and agents starting from a simple program for website backup to a major web search engine. Due to an ...

abstract

Deep web entity monitoring

WWW '13 Companion: Proceedings of the 22nd International Conference on World Wide Web, May 2013, pp 377–382, https://doi.org/10.1145/2487788.2487946
