DOI: 10.1145/2020408.2020426

Large-scale matrix factorization with distributed stochastic gradient descent

Published: 21 August 2011

Abstract

We provide a novel algorithm to approximately factor large matrices with millions of rows, millions of columns, and billions of nonzero elements. Our approach rests on stochastic gradient descent (SGD), an iterative stochastic optimization algorithm. We first develop a novel "stratified" SGD variant (SSGD) that applies to general loss-minimization problems in which the loss function can be expressed as a weighted sum of "stratum losses." We establish sufficient conditions for convergence of SSGD using results from stochastic approximation theory and regenerative process theory. We then specialize SSGD to obtain a new matrix-factorization algorithm, called DSGD, that can be fully distributed and run on web-scale datasets using, e.g., MapReduce. DSGD can handle a wide variety of matrix factorizations. We describe the practical techniques used to optimize performance in our DSGD implementation. Experiments suggest that DSGD converges significantly faster and has better scalability properties than alternative algorithms.
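
The abstract describes DSGD only at a high level. As a concrete illustration of the stratification idea, the sketch below runs stratified SGD on a single machine for a regularized squared-loss factorization V ≈ WH: the data matrix is split into a d × d grid of blocks, and each subepoch processes one stratum of d blocks that share no rows or columns, which is why, in the distributed algorithm, those blocks can be handed to independent workers (e.g., MapReduce tasks). This is a minimal sketch under stated assumptions, not the authors' implementation: the function name dsgd_factorize, the constant step size, the L2 regularization weights, and the dense NumPy input are illustrative choices (the paper itself handles a wide variety of factorizations).

```python
import numpy as np

def dsgd_factorize(V, rank=10, d=4, epochs=20, step=0.01, reg=0.05, seed=0):
    """Single-machine sketch of DSGD-style stratified SGD for V ~= W @ H.
    Zero entries of V are treated as missing; hyperparameters are illustrative."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = 0.1 * rng.standard_normal((m, rank))
    H = 0.1 * rng.standard_normal((rank, n))

    # Partition rows and columns into d blocks each, giving a d x d grid of blocks.
    row_blocks = np.array_split(np.arange(m), d)
    col_blocks = np.array_split(np.arange(n), d)
    nz_rows, nz_cols = V.nonzero()

    for _ in range(epochs):
        for shift in range(d):              # one subepoch per stratum
            # Stratum: blocks (b, (b + shift) mod d) for b = 0..d-1. No two of
            # these blocks share rows of W or columns of H, so in the distributed
            # algorithm they could be processed by d independent workers.
            for b in range(d):
                rows = row_blocks[b]
                cols = col_blocks[(b + shift) % d]
                in_block = np.isin(nz_rows, rows) & np.isin(nz_cols, cols)
                for i, j in zip(nz_rows[in_block], nz_cols[in_block]):
                    err = V[i, j] - W[i] @ H[:, j]
                    grad_w = err * H[:, j] - reg * W[i]      # gradient of the
                    grad_h = err * W[i] - reg * H[:, j]      # L2-regularized loss
                    W[i] += step * grad_w
                    H[:, j] += step * grad_h
    return W, H
```

The design point this illustrates is that workers assigned to the same stratum update disjoint rows of W and disjoint columns of H, so factor blocks need to be exchanged only between subepochs rather than after every SGD step.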

References

[1]
Apache Hadoop. https://hadoop.apache.org.
[2]
S. Asmussen. Applied Probability and Queues. Springer, 2nd edition, 2003.
[3]
J. Bennett and S. Lanning. The Netflix prize. In KDD Cup and Workshop, 2007.
[4]
C. M. Bishop. Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, 2007.
[5]
L. Bottou and O. Bousquet. The tradeoffs of large scale learning. In NIPS, volume 20, pages 161--168. 2008.
[6]
R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu. A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput., 16(5):1190--1208, 1995.
[7]
Y. S. Chow and H. Teicher. Probability Theory: Independence, Interchangeability, Martingales. Springer, 2nd edition, 1988.
[8]
A. S. Das, M. Datar, A. Garg, and S. Rajaram. Google news personalization: scalable online collaborative filtering. In WWW, pages 271--280, 2007.
[9]
S. Das, Y. Sismanis, K. S. Beyer, R. Gemulla, P. J. Haas, and J. McPherson. Ricardo: Integrating R and Hadoop. In SIGMOD, pages 987--998, 2010.
[10]
R. Gemulla, P. J. Haas, E. Nijkamp, and Y. Sismanis. Large-scale matrix factorization with distributed stochastic gradient descent. Technical Report RJ10481, IBM Almaden Research Center, San Jose, CA, 2011. Available at www.almaden.ibm.com/cs/people/peterh/dsgdTechRep.pdf.
[11]
K. B. Hall, S. Gilpin, and G. Mann. MapReduce/Bigtable for distributed optimization. In NIPS LCCC Workshop, 2010.
[12]
T. Hofmann. Probabilistic latent semantic indexing. In SIGIR, pages 50--57, 1999.
[13]
Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8):30--37, 2009.
[14]
H. J. Kushner and G. Yin. Stochastic Approximation and Recursive Algorithms and Applications. Springer, 2nd edition, 2003.
[15]
D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788--791, 1999.
[16]
C. Liu, H.-c. Yang, J. Fan, L.-W. He, and Y.-M. Wang. Distributed nonnegative matrix factorization for web-scale dyadic data analysis on mapreduce. In WWW, pages 681--690, 2010.
[17]
G. Mann, R. McDonald, M. Mohri, N. Silberman, and D. Walker. Efficient large-scale distributed training of conditional maximum entropy models. In NIPC, pages 1231--1239. 2009.
[18]
R. McDonald, K. Hall, and G. Mann. Distributed training strategies for the structured perceptron. In HLT, pages 456--464, 2010.
[19]
A. P. Singh and G. J. Gordon. A unified view of matrix factorization models. In ECML PKDD, pages 358--373, 2008.
[20]
Y. Zhou, D. Wilkinson, R. Schreiber, and R. Pan. Large-scale parallel collaborative filtering for the Netflix Prize. In AAIM, pages 337--348, 2008.
[21]
M. A. Zinkevich, M. Weimer, A. J. Smola, and L. Li. Parallelized stochastic gradient descent. In NIPS, pages 2595--2603, 2010.

    Published In

    KDD '11: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2011
    1446 pages
    ISBN: 9781450308137
    DOI: 10.1145/2020408

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. distributed matrix factorization
    2. mapreduce
    3. recommendation system
    4. stochastic gradient descent

    Qualifiers

    • Research-article

    Conference

    KDD '11

    Acceptance Rates

    Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
