DOI: 10.1145/2723372.2735366
research-article

DataXFormer: An Interactive Data Transformation Tool

Published: 27 May 2015

    Abstract

    Syntactic transformations, such as unit conversions or date format conversions, require applying a formula to the input values, whereas semantic transformations, such as "zip code to city", require a look-up in some reference data. We recently presented DataXFormer, a system that leverages Web tables, Web forms, and expert sourcing to cover a wide range of transformations. In this demonstration, we present the user interaction with DataXFormer and show scenarios of how it can be used to transform data and explore the effectiveness and efficiency of several approaches for transformation discovery, leveraging about 112 million tables and online sources.
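
    The distinction drawn in the abstract can be illustrated with a minimal Python sketch. This is not DataXFormer's implementation: the function names, the tiny in-memory reference tables, and the `min_support` threshold are all hypothetical, chosen only to show a formula-based (syntactic) transformation next to a look-up-based (semantic) one, plus a naive way of picking reference tables that agree with a few user-supplied example pairs.

```python
from datetime import datetime

# Syntactic transformations: a formula applied directly to each input value.
def fahrenheit_to_celsius(f: float) -> float:
    return (f - 32.0) * 5.0 / 9.0

def us_date_to_iso(s: str) -> str:
    # Reformat MM/DD/YYYY as YYYY-MM-DD.
    return datetime.strptime(s, "%m/%d/%Y").strftime("%Y-%m-%d")

# Semantic transformation: no formula exists, so reference data is required.
# ZIP_TO_CITY is a hypothetical stand-in for such reference data.
ZIP_TO_CITY = {"02139": "Cambridge", "10001": "New York"}

def zip_to_city(zip_code: str, reference=ZIP_TO_CITY):
    # Returns None when the reference data does not cover the input.
    return reference.get(zip_code)

def discover_and_transform(examples, values, tables, min_support=2):
    """Naive, assumed discovery step: keep only tables (dicts) that agree
    with at least `min_support` example pairs, then use them as look-ups."""
    result = {}
    for table in tables:
        support = sum(1 for k, v in examples.items() if table.get(k) == v)
        if support < min_support:
            continue  # this table does not encode the requested mapping
        for value in values:
            if value in table and value not in result:
                result[value] = table[value]
    return result

# Two hypothetical candidate tables: zip -> city and zip -> state.
TABLES = [
    {"02139": "Cambridge", "10001": "New York", "60601": "Chicago"},
    {"02139": "MA", "10001": "NY", "60601": "IL"},
]
EXAMPLES = {"02139": "Cambridge", "10001": "New York"}
```

    With these inputs, `discover_and_transform(EXAMPLES, ["60601", "99999"], TABLES)` uses only the first table, because the second one maps the example zip codes to states rather than cities; the uncovered input `"99999"` is simply left out of the result.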

    Published In

    SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
    May 2015
    2110 pages
    ISBN:9781450327589
    DOI:10.1145/2723372
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data enrichment
    2. data integration
    3. data transformation
    4. deep web
    5. web forms
    6. web tables
    7. wrapper

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS'15: International Conference on Management of Data
    May 31 - June 4, 2015
    Melbourne, Victoria, Australia

    Acceptance Rates

    SIGMOD '15 Paper Acceptance Rate 106 of 415 submissions, 26%;
    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 18
    • Downloads (last 6 weeks): 2

    Cited By

    • (2023) Visualizing the Scripts of Data Wrangling With Somnus. IEEE Transactions on Visualization and Computer Graphics, 29:6, pages 2950-2964. DOI: 10.1109/TVCG.2022.3144975. Online publication date: 1-Jun-2023
    • (2022) Revealing the Semantics of Data Wrangling Scripts With COMANTICS. IEEE Transactions on Visualization and Computer Graphics, pages 1-11. DOI: 10.1109/TVCG.2022.3209470. Online publication date: 2022
    • (2022) The Fourth Paradigm in Geographical Sciences. Livelihood Enhancement Through Agriculture, Tourism and Health, pages 495-507. DOI: 10.1007/978-981-16-7310-8_25. Online publication date: 18-Jan-2022
    • (2021) KTabulator: Interactive Ad hoc Table Creation using Knowledge Graphs. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, pages 1-14. DOI: 10.1145/3411764.3445227. Online publication date: 6-May-2021
    • (2019) VADA: an architecture for end user informed data preparation. Journal of Big Data, 6:1. DOI: 10.1186/s40537-019-0237-9. Online publication date: 21-Aug-2019
    • (2018) Synthesizing Type-Detection Logic for Rich Semantic Data Types using Open-source Code. Proceedings of the 2018 International Conference on Management of Data, pages 35-50. DOI: 10.1145/3183713.3196888. Online publication date: 27-May-2018
    • (2017) Stitching web tables for improving matching quality. Proceedings of the VLDB Endowment, 10:11, pages 1502-1513. DOI: 10.14778/3137628.3137657. Online publication date: 1-Aug-2017
    • (2017) Visual support for rastering of unequally spaced time series. Proceedings of the 10th International Symposium on Visual Information Communication and Interaction, pages 53-57. DOI: 10.1145/3105971.3105984. Online publication date: 14-Aug-2017
    • (2017) The VADA Architecture for Cost-Effective Data Wrangling. Proceedings of the 2017 ACM International Conference on Management of Data, pages 1599-1602. DOI: 10.1145/3035918.3058730. Online publication date: 9-May-2017
    • (2017) Exploratory Ad-Hoc Analytics for Big Data. Handbook of Big Data Technologies, pages 365-407. DOI: 10.1007/978-3-319-49340-4_11. Online publication date: 26-Feb-2017
