Investigation of the Use of Spectral Clustering for the Analysis of Molecular Data

Bibliographic Details
Title: Investigation of the Use of Spectral Clustering for the Analysis of Molecular Data
Authors: Sonny Gan, Valerie J. Gillet, Eleanor J. Gardiner, David A. Cosgrove
Source: Journal of Chemical Information and Modeling. 54:3302-3319
Publication Details: American Chemical Society (ACS), 2014.
Publication Year: 2014
Subject Terms: Clustering high-dimensional data, Fuzzy clustering, Cyclooxygenase 2 Inhibitors, business.industry, General Chemical Engineering, Statistics as Topic, Correlation clustering, Single-linkage clustering, Pattern recognition, General Chemistry, Library and Information Sciences, computer.software_genre, Spectral clustering, Computer Science Applications, CURE data clustering algorithm, Drug Discovery, Cluster Analysis, Artificial intelligence, Data mining, business, Cluster analysis, computer, Algorithms, k-medians clustering, Mathematics
Description: Spectral clustering involves placing objects into clusters based on the eigenvectors and eigenvalues of an associated matrix. The technique was first applied to molecular data by Brewer [J. Chem. Inf. Model. 2007, 47, 1727-1733] who demonstrated its use on a very small dataset of 125 COX-2 inhibitors. We have determined suitable parameters for spectral clustering using a wide variety of molecular descriptors and several datasets of a few thousand compounds and compared the results of clustering using a nonoverlapping version of Brewer's use of Sarker and Boyer's algorithm with that of Ward's and k-means clustering. We then replaced the exact eigendecomposition method with two different approximate methods and concluded that Singular Value Decomposition is the most appropriate method for clustering larger compound collections of up to 100,000 compounds. We have also used spectral clustering with the Tversky coefficient to generate two sets of clusters linked by a common set of eigenvalues and have used this novel approach to cluster sets of fragments such as those used in fragment-based drug design.
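The description summarizes spectral clustering in a single sentence. As a hedged illustration only, the Python sketch below builds a Tanimoto similarity matrix from binary fingerprints and takes its exact eigendecomposition with NumPy. It then keeps the eigenvectors whose eigenvalues exceed a threshold and assigns each compound to the single kept eigenvector in which it has the largest absolute component. This non-overlapping rule is loosely modeled on the eigenvector-per-cluster idea behind the Sarker and Boyer algorithm mentioned above. The function names `tanimoto_matrix` and `spectral_clusters`, the `eig_threshold=1.0` default, and the toy fingerprints are all hypothetical. This is not the authors' implementation or their chosen parameters.

```python
import numpy as np


def tanimoto_matrix(fps):
    """Pairwise Tanimoto similarity for binary fingerprints (rows of a 0/1 array)."""
    X = np.asarray(fps, dtype=float)
    common = X @ X.T                     # |A & B| for every pair of rows
    counts = X.sum(axis=1)               # |A| for every row
    union = counts[:, None] + counts[None, :] - common
    # Two all-zero fingerprints would give 0/0; define their similarity as 0.
    return np.divide(common, union, out=np.zeros_like(common), where=union > 0)


def spectral_clusters(fps, eig_threshold=1.0):
    """Hypothetical non-overlapping, eigenvector-per-cluster spectral clustering.

    Each eigenvalue above `eig_threshold` (an assumed parameter) is treated as
    one cluster; every object joins the kept eigenvector in which it has the
    largest absolute component.
    """
    A = tanimoto_matrix(fps)
    vals, vecs = np.linalg.eigh(A)       # symmetric matrix: real eigenpairs, ascending
    order = np.argsort(vals)[::-1]       # re-sort eigenpairs in descending order
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > eig_threshold
    if not keep.any():                   # always keep at least the leading eigenvector
        keep[0] = True
    V = np.abs(vecs[:, keep])            # eigenvector signs are arbitrary, so use magnitudes
    return V.argmax(axis=1), vals[keep]


if __name__ == "__main__":
    # Toy data: two obvious groups of 6-bit fingerprints.
    fps = [[1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0],
           [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 0]]
    labels, kept = spectral_clusters(fps)
    print(labels.tolist())               # [0, 0, 0, 1, 1]
```

Dense `eigh` costs O(n³) time, so this sketch only suits small sets. The abstract reports that an SVD-based approximation was the most appropriate choice for collections of up to 100,000 compounds, and the sketch does not attempt that approximation.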
ISSN: 1549-960X, 1549-9596
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::710c10d9e262fbef6eabd05095f54368
https://doi.org/10.1021/ci500480b
Rights: OPEN
Accession Number: edsair.doi.dedup.....710c10d9e262fbef6eabd05095f54368
Database: OpenAIRE