Academic Journal

Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays.

Bibliographic Details
Title: Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays.
Authors: Kidziński Ł; Department of Bioengineering, Stanford University, Stanford, CA 94305, USA., Hui FKC; Research School of Finance, Actuarial Studies and Statistics, The Australian National University, Canberra, ACT 2601, Australia., Warton DI; School of Mathematics and Statistics and Evolution & Ecology Research Centre, The University of New South Wales, Sydney, NSW 2052, Australia., Hastie T; Department of Statistics and Biomedical Data Science, Stanford University, Stanford, CA 94305, USA.
Source: Journal of machine learning research : JMLR [J Mach Learn Res] 2022 Nov; Vol. 23.
Publication Type: Journal Article
Language: English
Journal Info: Publisher: MIT Press Country of Publication: United States NLM ID: 101262635 Publication Model: Print Cited Medium: Print ISSN: 1532-4435 (Print) Linking ISSN: 15324435 NLM ISO Abbreviation: J Mach Learn Res Subsets: PubMed not MEDLINE
Imprint Name(s): Original Publication: Cambridge, MA : MIT Press, 2001-
Abstract: Unmeasured or latent variables are often the cause of correlations between multivariate measurements, which are studied in a variety of fields such as psychology, ecology, and medicine. For Gaussian measurements, there are classical tools such as factor analysis or principal component analysis with a well-established theory and fast algorithms. Generalized Linear Latent Variable models (GLLVMs) generalize such factor models to non-Gaussian responses. However, current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets with thousands of observational units or responses. In this article, we propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood and then using a Newton method and Fisher scoring to learn the model parameters. Computationally, our method is noticeably faster and more stable, enabling GLLVM fits to much larger matrices than previously possible. We apply our method on a dataset of 48,000 observational units with over 2,000 observed species in each unit and find that most of the variability can be explained with a handful of factors. We publish an easy-to-use implementation of our proposed fitting algorithm.
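Illustrative note: the abstract names penalized quasi-likelihood combined with Newton and Fisher scoring updates but gives no algorithmic detail. The sketch below is not the authors' implementation. It is a minimal example of the general idea under stated assumptions: Poisson counts with a log link, a model log E[Y] = U Vᵀ without intercepts or covariates, a ridge penalty on both factor matrices, and a fixed damping factor. The names `fisher_step`, `fit_poisson_factors`, `lam`, and `step` are hypothetical and chosen only for this example.

```python
import numpy as np


def poisson_deviance(Y, mu):
    """Poisson deviance of counts Y against means mu (0 * log 0 treated as 0)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        term = np.where(Y > 0, Y * np.log(Y / mu), 0.0)
    return 2.0 * np.sum(term - (Y - mu))


def fisher_step(Y, A, B, lam, step):
    """One damped, ridge-penalized Fisher scoring update of A with B held fixed.

    Assumed model: log E[Y] = A @ B.T (Poisson, log link).
    For each row i, the penalized score is B.T @ (y_i - mu_i) - lam * a_i and
    the penalized Fisher information is B.T @ diag(mu_i) @ B + lam * I.
    """
    eta = np.clip(A @ B.T, -20.0, 20.0)  # clip to avoid overflow in exp
    mu = np.exp(eta)
    d = B.shape[1]
    A_new = np.empty_like(A)
    for i in range(A.shape[0]):
        score = B.T @ (Y[i] - mu[i]) - lam * A[i]
        info = (B.T * mu[i]) @ B + lam * np.eye(d)
        A_new[i] = A[i] + step * np.linalg.solve(info, score)
    return A_new


def fit_poisson_factors(Y, d=2, lam=1.0, step=0.5, iters=30, seed=0):
    """Alternate Fisher scoring updates of row scores U and column loadings V."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = 0.1 * rng.standard_normal((n, d))
    V = 0.1 * rng.standard_normal((m, d))
    for _ in range(iters):
        U = fisher_step(Y, U, V, lam, step)    # update rows with V fixed
        V = fisher_step(Y.T, V, U, lam, step)  # update columns with U fixed
    return U, V
```

Because each row (or column) update only solves a small d-by-d linear system, the cost per sweep grows linearly in the number of matrix entries, which is the kind of property that makes such updates attractive for large arrays. The fitted means can be recovered as `np.exp(U @ V.T)`.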
References: Biometrics. 2018 Dec;74(4):1311-1319. (PMID: 29750847)
Ecol Lett. 2017 May;20(5):561-576. (PMID: 28317296)
Trends Ecol Evol. 2015 Dec;30(12):766-779. (PMID: 26519235)
PLoS One. 2019 May 1;14(5):e0216129. (PMID: 31042745)
J Mach Learn Res. 2010 Mar 1;11:2287-2322. (PMID: 21552465)
J Anim Ecol. 2011 Jan;80(1):119-27. (PMID: 20831728)
Nat Protoc. 2012 Feb 16;7(3):500-7. (PMID: 22343431)
Methods Ecol Evol. 2020 Mar;11(3):442-447. (PMID: 32194928)
J Stat Softw. 2017;76:. (PMID: 36568334)
Theor Appl Genet. 2015 Jan;128(1):55-72. (PMID: 25326722)
Biostatistics. 2009 Jul;10(3):515-34. (PMID: 19377034)
Trends Ecol Evol. 2016 Oct;31(10):737-738. (PMID: 27515225)
Grant Information: R01 EB001988 United States EB NIBIB NIH HHS; R01 GM134483 United States GM NIGMS NIH HHS; U54 EB020405 United States EB NIBIB NIH HHS
Contributed Indexing: Keywords: Generalized Linear Mixed-effect Models; Generalized Linear Models; Nuclear Norm; Penalized Quasi-Likelihood
Entry Date(s): Date Created: 20230427 Latest Revision: 20240618
Update Code: 20240618
PubMed Central ID: PMC10129058
PMID: 37102181
Database: MEDLINE