Log-ratio analysis of microbiome data with many zeroes is library size dependent

Bibliographic details
Title: Log-ratio analysis of microbiome data with many zeroes is library size dependent
Authors: Dennis E. te Beest, Tim W. R. Möhlmann, Cajo J. F. ter Braak, E.H. Nijhuis
Source: Molecular Ecology Resources, 21(6), 1866-1874
Molecular Ecology Resources 21 (2021) 6
Publication details: Authorea, Inc., 2020.
Publication year: 2020
Subject terms: 0106 biological sciences, 0301 basic medicine, Multivariate statistics, Dependency (UML), multivariate statistics, microbiome, Biology, 010603 evolutionary biology, 01 natural sciences, log‐ratio analysis, 03 medical and health sciences, Biointeractions and Plant Health, Redundancy (information theory), Statistics, Genetics, Computer Simulation, Resource Article, Ecology, Evolution, Behavior and Systematics, Gene Library, Mathematics, Principal Component Analysis, Microbiota, RESOURCE ARTICLES, zero inflation, PE&RC, Molecular and Statistical Advances, Variable (computer science), 030104 developmental biology, Biometris, Principal component analysis, log-ratio analysis, Compositional data, Biotechnology, Count data, Type I and type II errors
Description: Microbiome composition data collected through amplicon sequencing are count data on taxa in which the total count per sample (the library size) is an artifact of the sequencing platform, and as a result such data are compositional. To avoid library size dependency, one common way of analyzing multivariate compositional data is to perform a principal component analysis (PCA) on data transformed with the centered log-ratio, hereafter called a log-ratio PCA. Two aspects typical of amplicon sequencing data are the large differences in library size and the large number of zeroes. In this paper we show on real data and by simulation that, applied to data that combine these two aspects, log-ratio PCA is nevertheless heavily dependent on the library size. This leads to a reduction in power when testing against any explanatory variable in log-ratio redundancy analysis. If there is additionally a correlation between the library size and the explanatory variable, then the type I error becomes inflated. We explore putative solutions to this problem.
File description: application/pdf
ISSN: 1866-1874
1755-098X
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::d3cece0ded823c1201f48d689114b12e
https://doi.org/10.22541/au.160673966.61075882/v1
Rights: OPEN
Accession number: edsair.doi.dedup.....d3cece0ded823c1201f48d689114b12e
Database: OpenAIRE
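
Illustrative note: the description above says that a centered log-ratio (clr) analysis, which is meant to be independent of library size, becomes library size dependent once the data contain many zeroes. The following is a minimal sketch of that effect, not the authors' code. It assumes zeroes are handled by adding a pseudocount of 1 before taking logs (one common convention; the record does not state which zero treatment the paper uses), and the function name `clr` and the toy count vectors are hypothetical.

```python
import numpy as np

def clr(counts, pseudocount=0.0):
    """Centered log-ratio: log counts minus the per-sample mean log count."""
    x = np.asarray(counts, dtype=float) + pseudocount
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)

# Same relative composition at two library sizes, no zeroes:
# the clr values coincide, so the transform ignores library size.
small = np.array([10, 20, 70])
large = 100 * small
no_zero_gap = np.abs(clr(small) - clr(large)).max()

# Same relative composition, but one taxon is zero and a pseudocount
# (assumed here to be 1) is added: the clr values now differ between
# the two library sizes.
small_z = np.array([0, 20, 80])
large_z = 100 * small_z
zero_gap = np.abs(clr(small_z, 1.0) - clr(large_z, 1.0)).max()

print(no_zero_gap, zero_gap)
```

In this toy case the pseudocount is a fixed absolute amount, so it is relatively larger in the small library than in the large one; that is one way a dependence on library size can enter a log-ratio PCA built on such transformed data.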