Academic Journal

Screening methods for linear errors-in-variables models in high dimensions.

Bibliographic Details
Title: Screening methods for linear errors-in-variables models in high dimensions.
Authors: Nghiem LH (Research School of Finance, Actuarial Studies and Statistics, Australian National University, Canberra, ACT, Australia; School of Mathematics and Statistics, The University of Sydney, Sydney, New South Wales, Australia); Hui FKC (Research School of Finance, Actuarial Studies and Statistics, Australian National University, Canberra, ACT, Australia); Müller S (Department of Mathematics and Statistics, Macquarie University, Sydney, New South Wales, Australia); Welsh AH (Research School of Finance, Actuarial Studies and Statistics, Australian National University, Canberra, ACT, Australia).
Source: Biometrics [Biometrics] 2023 Jun; Vol. 79 (2), pp. 926-939. Date of Electronic Publication: 2022 Mar 25.
Publication Type: Journal Article; Research Support, Non-U.S. Gov't
Language: English
Journal Info: Publisher: Biometric Society Country of Publication: United States NLM ID: 0370625 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1541-0420 (Electronic) Linking ISSN: 0006341X NLM ISO Abbreviation: Biometrics Subsets: MEDLINE
Imprint Name(s): Publication: Alexandria, VA: Biometric Society.
Original Publication: Washington.
MeSH Terms: Computer Simulation* ; Female ; Humans ; Microarray Analysis ; Sample Size
Abstract: Microarray studies, in order to identify genes associated with an outcome of interest, usually produce noisy measurements for a large number of gene expression features from a small number of subjects. One common approach to analyzing such high-dimensional data is to use linear errors-in-variables (EIV) models; however, current methods for fitting such models are computationally expensive. In this paper, we present two efficient screening procedures, namely, corrected penalized marginal screening (PMSc) and corrected sure independence screening (SISc), to reduce the number of variables for final model building. Both screening procedures are based on fitting corrected marginal regression models relating the outcome to each contaminated covariate separately, which can be computed efficiently even with a large number of features. Under mild conditions, we show that these procedures achieve screening consistency and reduce the number of features substantially, even when the number of covariates grows exponentially with sample size. In addition, if the true covariates are weakly correlated, we show that PMSc can achieve full variable selection consistency. Through a simulation study and an analysis of gene expression data for bone mineral density of Norwegian women, we demonstrate that the two new screening procedures make estimation of linear EIV models computationally scalable in high-dimensional settings, and improve finite sample estimation and selection performance compared with estimators that do not employ a screening stage.
(© 2022 The Authors. Biometrics published by Wiley Periodicals LLC on behalf of International Biometric Society.)
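To make the screening idea concrete, the sketch below illustrates an attenuation-corrected marginal screening step in the spirit of SISc: each contaminated covariate is regressed marginally on the outcome, the slope is de-attenuated by subtracting the measurement-error variance from the covariate's sample variance, and the covariates with the largest corrected slopes are retained for final model building. This is a minimal illustration assuming additive measurement error with known (or separately estimated) error variances; the function name corrected_sis, its arguments, and the simulated data are hypothetical and are not the authors' implementation of PMSc or SISc (PMSc additionally applies a penalty to each corrected marginal fit).

```python
import numpy as np

def corrected_sis(W, y, sigma_u2, n_keep):
    """Rank covariates by attenuation-corrected marginal slopes (illustrative).

    W        : (n, p) array of contaminated (error-prone) covariates
    y        : (n,) outcome vector
    sigma_u2 : (p,) measurement-error variance, assumed known for each covariate
    n_keep   : number of covariates retained for the final model-building stage
    """
    n, _ = W.shape
    Wc = W - W.mean(axis=0)            # centre covariates
    yc = y - y.mean()                  # centre outcome
    cov_wy = Wc.T @ yc / n             # marginal covariances with the outcome
    var_w = (Wc ** 2).sum(axis=0) / n  # naive variances, inflated by measurement error
    # Correct the denominator: subtracting the error variance de-attenuates the
    # marginal slope so it targets the slope for the true, unobserved covariate.
    var_x = np.maximum(var_w - sigma_u2, 1e-8)
    beta_corrected = cov_wy / var_x
    # Keep the covariates with the largest corrected marginal effects.
    return np.argsort(-np.abs(beta_corrected))[:n_keep]

# Hypothetical usage on simulated data: 2 true signals among p = 1000 features.
rng = np.random.default_rng(0)
n, p = 100, 1000
X = rng.standard_normal((n, p))                             # true covariates
sigma_u2 = np.full(p, 0.25)                                 # known error variances
W = X + rng.normal(scale=np.sqrt(sigma_u2), size=(n, p))    # contaminated covariates
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.standard_normal(n)  # outcome
selected = corrected_sis(W, y, sigma_u2, n_keep=20)
print(sorted(selected))
```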
References: Barut, E., Fan, J. and Verhasselt, A. (2016) Conditional sure independence screening. Journal of the American Statistical Association, 111, 1266-1277.
Belloni, A., Rosenbaum, M. and Tsybakov, A.B. (2017) Linear and conic programming estimators in high dimensional errors-in-variables models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79, 939-956.
Brown, B., Weaver, T. and Wolfson, J. (2019) Meboost: variable selection in the presence of measurement error. Statistics in Medicine, 38, 2705-2718.
Byrd, M. and McGee, M. (2019) A simple correction procedure for high-dimensional general linear models with measurement error. arXiv preprint arXiv:1912.11740.
Carroll, R.J., Ruppert, D., Stefanski, L.A. and Crainiceanu, C.M. (2006) Measurement Error in Nonlinear Models: A Modern Perspective. CRC Press.
Cui, H., Li, R. and Zhong, W. (2015) Model-free feature screening for ultrahigh dimensional discriminant analysis. Journal of the American Statistical Association, 110, 630-641.
Datta, A. and Zou, H. (2017) Cocolasso for high-dimensional error-in-variables regression. Annals of Statistics, 45, 2400-2426.
Datta, A. and Zou, H. (2020) A note on cross-validation for lasso under measurement errors. Technometrics, 62, 549-556.
Do, C.B., Tung, J.Y., Dorfman, E., Kiefer, A.K., Drabant, E.M., Francke, U., et al. (2011) Web-based genome-wide association study identifies two novel loci and a substantial genetic component for Parkinson's disease. PLoS Genetics, 7, e1002141.
Fan, J. and Lv, J. (2008) Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 849-911.
Fan, J. and Peng, H. (2004) Nonconcave penalized likelihood with a diverging number of parameters. Annals of Statistics, 32, 928-961.
Fan, J. and Song, R. (2010) Sure independence screening in generalized linear models with np-dimensionality. Annals of Statistics, 38, 3567-3604.
Frank, I.E. and Friedman, J.H. (1993) A statistical view of some chemometrics regression tools. Technometrics, 35, 109-135.
Hein, A.-M.K., Richardson, S., Causton, H.C., Ambler, G.K. and Green, P.J. (2005) BGX: a fully Bayesian integrated approach to the analysis of Affymetrix Genechip data. Biostatistics, 6, 349-373.
Huang, J., Horowitz, J.L. and Ma, S. (2008) Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Annals of Statistics, 36, 587-613.
Hui, F.K., Warton, D.I. and Foster, S.D. (2015) Tuning parameter selection for the adaptive lasso using ERIC. Journal of the American Statistical Association, 110, 262-269.
Ida, Y., Fujiwara, Y. and Kashima, H. (2019) Fast sparse group lasso. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E. and Garnett, R. (Eds.) Advances in Neural Information Processing Systems, 32, pp. 1702-1710. https://proceedings.neurips.cc/paper/2019/hash/d240e3d38a8882ecad8633c8f9c78c9b-Abstract.html.
Kaul, A., Koul, H.L., Chawla, A. and Lahiri, S.N. (2016) Two stage non-penalized corrected least squares for high dimensional linear models with measurement error or missing covariates. arXiv preprint arXiv:1605.03154.
Li, G., Peng, H., Zhang, J. and Zhu, L. (2012) Robust rank correlation based screening. Annals of Statistics, 40, 1846-1877.
Li, R., Zhong, W. and Zhu, L. (2012) Feature screening via distance correlation learning. Journal of the American Statistical Association, 107, 1129-1139.
Li, X., Tang, N., Xie, J. and Yan, X. (2020) A nonparametric feature screening method for ultrahigh-dimensional missing response. Computational Statistics & Data Analysis, 142, 106828.
Loh, P.-L. and Wainwright, M.J. (2012) High-dimensional regression with noisy and missing data: provable guarantees with nonconvexity. Annals of Statistics, 40, 1637-1664.
Nghiem, L. and Potgieter, C. (2019) Simulation-selection-extrapolation: estimation in high-dimensional errors-in-variables models. Biometrics, 75, 1133-1144.
Piironen, J. and Vehtari, A. (2017) Sparsity information and regularization in the horseshoe and other shrinkage priors. Electronic Journal of Statistics, 11, 5018-5051.
Polson, N.G., Scott, J.G. and Windle, J. (2014) The Bayesian bridge. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76, 713-733.
Reppe, S., Refvem, H., Gautvik, V.T., Olstad, O.K., Høvring, P.I., Reinholt, F.P., et al. (2010) Eight genes are highly associated with BMD variation in postmenopausal Caucasian women. Bone, 46, 604-612.
Rocke, D.M. and Durbin, B. (2001) A model for measurement error for gene expression arrays. Journal of Computational Biology, 8, 557-569.
Romeo, G. and Thoresen, M. (2019) Model selection in high-dimensional noisy data: a simulation study. Journal of Statistical Computation and Simulation, 89, 2031-2050.
Rosenbaum, M. and Tsybakov, A.B. (2010) Sparse recovery under matrix uncertainty. Annals of Statistics, 38, 2620-2651.
Rosenbaum, M. and Tsybakov, A.B. (2013) Improved matrix uncertainty selector. In From Probability to Statistics and Back: High-Dimensional Models and Processes-A Festschrift in Honor of Jon A. Wellner. Institute of Mathematical Statistics, pp. 276-290.
Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2013) A sparse-group lasso. Journal of Computational and Graphical Statistics, 22, 231-245.
Sørensen, Ø., Frigessi, A. and Thoresen, M. (2015) Measurement error in Lasso: impact and likelihood bias correction. Statistica Sinica, 25, 809-829.
Tibshirani, R. (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 58, 267-288.
Wen, C., Pan, W., Huang, M. and Wang, X. (2018) Sure independence screening adjusted for confounding covariates with ultrahigh dimensional data. Statistica Sinica, 28, 293-317.
Xu, C. and Chen, J. (2014) The sparse MLE for ultrahigh-dimensional feature screening. Journal of the American Statistical Association, 109, 1257-1269.
Zakharkin, S.O., Kim, K., Mehta, T., Chen, L., Barnes, S., Scheirer, K.E., et al. (2005) Sources of variation in Affymetrix microarray experiments. BMC Bioinformatics, 6, 1-11.
Zheng, Z., Li, Y., Yu, C. and Li, G. (2018) Balanced estimation for high-dimensional measurement error models. Computational Statistics & Data Analysis, 126, 78-91.
Zhou, T., Thung, K.-H., Liu, M. and Shen, D. (2018) Brain-wide genome-wide association study for Alzheimer's disease via joint projection learning and sparse regression model. IEEE Transactions on Biomedical Engineering, 66, 165-175.
Zhu, L.-P., Li, L., Li, R. and Zhu, L.-X. (2011) Model-free feature screening for ultrahigh-dimensional data. Journal of the American Statistical Association, 106, 1464-1475.
Contributed Indexing: Keywords: dimension reduction; forward regression; measurement error; penalized regression; regularization; sure independence screening
Entry Dates: Date Created: 20220222 Date Completed: 20230621 Latest Revision: 20230621
Update Code: 20230622
DOI: 10.1111/biom.13628
PMID: 35191015
Database: MEDLINE