DOI: 10.1145/2020408.2020426

Large-scale matrix factorization with distributed stochastic gradient descent

Published: 21 August 2011

Abstract

We provide a novel algorithm to approximately factor large matrices with millions of rows, millions of columns, and billions of nonzero elements. Our approach rests on stochastic gradient descent (SGD), an iterative stochastic optimization algorithm. We first develop a novel "stratified" SGD variant (SSGD) that applies to general loss-minimization problems in which the loss function can be expressed as a weighted sum of "stratum losses." We establish sufficient conditions for convergence of SSGD using results from stochastic approximation theory and regenerative process theory. We then specialize SSGD to obtain a new matrix-factorization algorithm, called DSGD, that can be fully distributed and run on web-scale datasets using, e.g., MapReduce. DSGD can handle a wide variety of matrix factorizations. We describe the practical techniques used to optimize performance in our DSGD implementation. Experiments suggest that DSGD converges significantly faster and has better scalability properties than alternative algorithms.
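
The abstract describes DSGD only at a high level. As a concrete illustration of the stratification idea, the sketch below runs stratified SGD on a single machine for a regularized squared-loss factorization V ≈ WH: the data matrix is split into a d × d grid of blocks, and each subepoch processes one stratum of d blocks that share no rows or columns, which is why, in the distributed algorithm, those blocks can be handed to independent workers (e.g., MapReduce tasks). This is a minimal sketch under stated assumptions, not the authors' implementation: the function name dsgd_factorize, the constant step size, the L2 regularization weights, and the dense NumPy input are illustrative choices (the paper itself handles a wide variety of factorizations).

```python
import numpy as np

def dsgd_factorize(V, rank=10, d=4, epochs=20, step=0.01, reg=0.05, seed=0):
    """Single-machine sketch of DSGD-style stratified SGD for V ~= W @ H.
    Zero entries of V are treated as missing; hyperparameters are illustrative."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = 0.1 * rng.standard_normal((m, rank))
    H = 0.1 * rng.standard_normal((rank, n))

    # Partition rows and columns into d blocks each, giving a d x d grid of blocks.
    row_blocks = np.array_split(np.arange(m), d)
    col_blocks = np.array_split(np.arange(n), d)
    nz_rows, nz_cols = V.nonzero()

    for _ in range(epochs):
        for shift in range(d):              # one subepoch per stratum
            # Stratum: blocks (b, (b + shift) mod d) for b = 0..d-1. No two of
            # these blocks share rows of W or columns of H, so in the distributed
            # algorithm they could be processed by d independent workers.
            for b in range(d):
                rows = row_blocks[b]
                cols = col_blocks[(b + shift) % d]
                in_block = np.isin(nz_rows, rows) & np.isin(nz_cols, cols)
                for i, j in zip(nz_rows[in_block], nz_cols[in_block]):
                    err = V[i, j] - W[i] @ H[:, j]
                    grad_w = err * H[:, j] - reg * W[i]      # gradient of the
                    grad_h = err * W[i] - reg * H[:, j]      # L2-regularized loss
                    W[i] += step * grad_w
                    H[:, j] += step * grad_h
    return W, H
```

The design point this illustrates is that workers assigned to the same stratum update disjoint rows of W and disjoint columns of H, so factor blocks need to be exchanged only between subepochs rather than after every SGD step.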

References

[1]
Apache Hadoop. https://hadoop.apache.org.
[2]
S. Asmussen. Applied Probability and Queues. Springer, 2nd edition, 2003.
[3]
J. Bennett and S. Lanning. The Netflix prize. In KDD Cup and Workshop, 2007.
[4]
C. M. Bishop. Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, 2007.
[5]
L. Bottou and O. Bousquet. The tradeoffs of large scale learning. In NIPS, volume 20, pages 161--168. 2008.
[6]
R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu. A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput., 16(5):1190--1208, 1995.
[7]
Y. S. Chow and H. Teicher. Probability Theory: Independence, Interchangeability, Martingales. Springer, 2nd edition, 1988.
[8]
A. S. Das, M. Datar, A. Garg, and S. Rajaram. Google news personalization: scalable online collaborative filtering. In WWW, pages 271--280, 2007.
[9]
S. Das, Y. Sismanis, K. S. Beyer, R. Gemulla, P. J. Haas, and J. McPherson. Ricardo: Integrating R and Hadoop. In SIGMOD, pages 987--998, 2010.
[10]
R. Gemulla, P. J. Haas, E. Nijkamp, and Y. Sismanis. Large-scale matrix factorization with distributed stochastic gradient descent. Technical Report RJ10481, IBM Almaden Research Center, San Jose, CA, 2011. Available at www.almaden.ibm.com/cs/people/peterh/dsgdTechRep.pdf.
[11]
K. B. Hall, S. Gilpin, and G. Mann. MapReduce/Bigtable for distributed optimization. In NIPS LCCC Workshop, 2010.
[12]
T. Hofmann. Probabilistic latent semantic indexing. In SIGIR, pages 50--57, 1999.
[13]
Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8):30--37, 2009.
[14]
H. J. Kushner and G. Yin. Stochastic Approximation and Recursive Algorithms and Applications. Springer, 2nd edition, 2003.
[15]
D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788--791, 1999.
[16]
C. Liu, H.-c. Yang, J. Fan, L.-W. He, and Y.-M. Wang. Distributed nonnegative matrix factorization for web-scale dyadic data analysis on mapreduce. In WWW, pages 681--690, 2010.
[17]
G. Mann, R. McDonald, M. Mohri, N. Silberman, and D. Walker. Efficient large-scale distributed training of conditional maximum entropy models. In NIPC, pages 1231--1239. 2009.
[18]
R. McDonald, K. Hall, and G. Mann. Distributed training strategies for the structured perceptron. In HLT, pages 456--464, 2010.
[19]
A. P. Singh and G. J. Gordon. A unified view of matrix factorization models. In ECML PKDD, pages 358--373, 2008.
[20]
Y. Zhou, D. Wilkinson, R. Schreiber, and R. Pan. Large-scale parallel collaborative filtering for the Netflix Prize. In AAIM, pages 337--348, 2008.
[21]
M. A. Zinkevich, M. Weimer, A. J. Smola, and L. Li. Parallelized stochastic gradient descent. In NIPS, pages 2595--2603, 2010.

    Published In

    KDD '11: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2011
    1446 pages
    ISBN: 9781450308137
    DOI: 10.1145/2020408

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. distributed matrix factorization
    2. mapreduce
    3. recommendation system
    4. stochastic gradient descent

    Qualifiers

    • Research-article

    Conference

    KDD '11

    Acceptance Rates

    Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
