DOI: 10.1145/3183713.3196927

Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging

Authors: Niv Dayan and Stratos Idreos
SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
May 2018
Pages 505 - 520
Published: 27 May 2018
    Abstract

    In this paper, we show that all mainstream LSM-tree based key-value stores in the literature and in industry are suboptimal with respect to how they trade off among the I/O costs of updates, point lookups, range lookups, as well as the cost of storage, measured as space-amplification. The reason is that they perform expensive merge operations in order to (1) bound the number of runs that a lookup has to probe, and to (2) remove obsolete entries to reclaim space. However, most of these merge operations reduce point lookup cost, long range lookup cost, and space-amplification by a negligible amount.
    To address this problem, we expand the LSM-tree design space with Lazy Leveling, a new design that prohibits merge operations at all levels of LSM-tree but the largest. We show that Lazy Leveling improves the worst-case cost complexity of updates while maintaining the same bounds on point lookup cost, long range lookup cost, and space-amplification. To be able to navigate between Lazy Leveling and other designs, we make the LSM-tree design space fluid by introducing Fluid LSM-tree, a generalization of LSM-tree that can be parameterized to assume all existing LSM-tree designs. We show how to fluidly transition from Lazy Leveling to (1) designs that are more optimized for updates by merging less at the largest level, and (2) designs that are more optimized for small range lookups by merging more at all other levels.
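    The Fluid LSM-tree described above boils down to two knobs alongside the size ratio T: a bound K on the number of runs at each of the smaller levels and a bound Z on the number of runs at the largest level, so that leveling corresponds to K = Z = 1, tiering to K = Z = T - 1, and Lazy Leveling to K = T - 1, Z = 1. The Python sketch below is illustrative only and is not the authors' implementation; the class and function names and the simplified merge-trigger check are assumptions made for exposition.

    ```python
    # Illustrative sketch only (not the authors' code): a Fluid LSM-tree
    # configuration with size ratio T, a bound K on runs at each of the
    # smaller levels, and a bound Z on runs at the largest level.
    from dataclasses import dataclass

    @dataclass
    class FluidLSMConfig:
        T: int  # size ratio between adjacent levels
        K: int  # max runs allowed at each of levels 1..L-1
        Z: int  # max runs allowed at the largest level L

        def merge_needed(self, level: int, largest_level: int, runs: int) -> bool:
            """Simplified trigger: merge a level once its run count exceeds its bound."""
            bound = self.Z if level == largest_level else self.K
            return runs > bound

    # The classic designs and Lazy Leveling fall out of the same two knobs:
    def leveling(T):      return FluidLSMConfig(T, K=1,     Z=1)      # merge eagerly at every level
    def tiering(T):       return FluidLSMConfig(T, K=T - 1, Z=T - 1)  # merge lazily at every level
    def lazy_leveling(T): return FluidLSMConfig(T, K=T - 1, Z=1)      # lazy above, eager only at the largest level
    ```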
    We put everything together to design Dostoevsky, a key-value store that navigates the entire Fluid LSM-tree design space based on the application workload and hardware to maximize throughput using a novel closed-form performance model. We implemented Dostoevsky on top of RocksDB, and we show that it strictly dominates state-of-the-art LSM-tree based key-value stores in terms of performance and space-amplification.
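    As a rough illustration of the navigation step (the precise closed-form cost model is derived in the paper and not reproduced here), Dostoevsky's tuning can be pictured as a search over (T, K, Z) that minimizes the workload-weighted cost per operation. In the sketch below, the `cost_model` callback and the parameter ranges are placeholders, not the paper's actual expressions.

    ```python
    # Sketch of the navigation idea only: enumerate Fluid LSM-tree
    # configurations and keep the one with the lowest workload-weighted
    # cost. cost_model stands in for the paper's closed-form model.
    from itertools import product

    def best_config(workload, cost_model, T_values=range(2, 11)):
        """workload: fractions per operation type, e.g. {'update': 0.5, 'point': 0.3, 'range': 0.2}.
        cost_model(T, K, Z, op): estimated I/O cost of one operation of type op (assumed interface)."""
        best, best_cost = None, float("inf")
        for T in T_values:
            for K, Z in product(range(1, T), repeat=2):  # 1 <= K, Z <= T - 1
                avg = sum(frac * cost_model(T, K, Z, op) for op, frac in workload.items())
                if avg < best_cost:
                    best, best_cost = (T, K, Z), avg
        return best, best_cost
    ```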



    Published In

    SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
    May 2018
    1874 pages
    ISBN: 9781450347037
    DOI: 10.1145/3183713

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. bloom filters
    2. compaction
    3. lsm-tree

    Qualifiers

    • Research-article


    Conference

    SIGMOD/PODS '18

    Acceptance Rates

    SIGMOD '18 Paper Acceptance Rate 90 of 461 submissions, 20%;
    Overall Acceptance Rate 785 of 4,003 submissions, 20%
