Academic Journal

Comparing global and local likelihood score thresholds in multiclass laplacian-modified Naive Bayes protein target prediction.

Bibliographic Details
Title: Comparing global and local likelihood score thresholds in multiclass laplacian-modified Naive Bayes protein target prediction.
Authors: Drakakis G, Koutsoukas A, Brewerton SC, Bodkin MJ, Evans DA, Bender A; Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK. ab454@cam.ac.uk.
Source: Combinatorial chemistry & high throughput screening [Comb Chem High Throughput Screen] 2015; Vol. 18 (3), pp. 323-30.
Publication Type: Comparative Study; Journal Article; Research Support, Non-U.S. Gov't
Language: English
Journal Info: Publisher: Bentham Science Publishers Country of Publication: United Arab Emirates NLM ID: 9810948 Publication Model: Print Cited Medium: Internet ISSN: 1875-5402 (Electronic) Linking ISSN: 13862073 NLM ISO Abbreviation: Comb Chem High Throughput Screen Subsets: MEDLINE
Imprint Name(s): Publication: Saif Zone, Sharjah, U.A.E. : Bentham Science Publishers
Original Publication: Hilversum, Netherlands ; Miami, FL : Bentham Science Publishers, c1998-
MeSH Terms: Proteins/*chemistry , Small Molecule Libraries/*chemistry, Algorithms ; Bayes Theorem ; Humans ; Ligands ; Proteins/metabolism ; Small Molecule Libraries/pharmacology
Abstract: The increase of publicly available bioactivity data has led to the extensive development and usage of in silico bioactivity prediction algorithms. A particularly popular approach for such analyses is the multiclass Naïve Bayes, whose output is commonly processed by applying empirically-derived likelihood score thresholds. In this work, we describe a systematic way for deriving score cut-offs on a per-protein target basis and compare their performance with global thresholds on a large scale using both 5-fold cross-validation (ChEMBL 14, 189k ligand-protein pairs over 477 protein targets) and external validation (WOMBAT, 63k pairs, 421 targets). The individual protein target cut-offs derived were compared to global cut-offs ranging from -10 to 40 in score bouts of 2.5. The results indicate that individual thresholds had equal or better performance in all comparisons with global thresholds, ranging from 95% of protein targets to 57.96%. It is shown that local thresholds behave differently for particular families of targets (CYPs, GPCRs, Kinases and TFs). Furthermore, we demonstrate the discrepancy in performance when we move away from the training dataset chemical space, using Tanimoto similarity as a metric (from 0 to 1 in steps of 0.2). Finally, the individual protein score cut-offs derived for the in silico bioactivity application used in this work are released, as well as the reproducible and transferable KNIME workflows used to carry out the analysis.
Substance Nomenclature: 0 (Ligands)
0 (Proteins)
0 (Small Molecule Libraries)
Entry Date(s): Date Created: 20150310 Date Completed: 20151022 Latest Revision: 20190923
Update Code: 20240628
DOI: 10.2174/1386207318666150305145012
PMID: 25747441
Database: MEDLINE
Description
ISSN: 1875-5402
DOI: 10.2174/1386207318666150305145012
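
Illustrative sketch (not part of the record): the comparison summarised in the abstract, per-target ("local") score cut-offs against a grid of global cut-offs from -10 to 40 in steps of 2.5, could be outlined roughly as follows. This is a minimal sketch under stated assumptions, not the authors' KNIME workflow. The abstract does not name the performance measure or the cut-off derivation procedure, so F1 and a search over observed training scores are assumptions; all function names and the set-based fingerprint representation are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# One entry per compound for a given target: (likelihood score, is_active).
Scored = List[Tuple[float, bool]]


def f1_at(cutoff: float, scored: Scored) -> float:
    """F1 when compounds scoring >= cutoff are predicted active (assumed metric)."""
    tp = sum(1 for s, y in scored if s >= cutoff and y)
    fp = sum(1 for s, y in scored if s >= cutoff and not y)
    fn = sum(1 for s, y in scored if s < cutoff and y)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def global_grid(start: float = -10.0, stop: float = 40.0, step: float = 2.5) -> List[float]:
    """Global cut-offs over the range quoted in the abstract."""
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]


def local_cutoff(train: Scored) -> float:
    """Pick the observed training score that maximises F1 (assumed procedure)."""
    candidates = sorted({s for s, _ in train})
    return max(candidates, key=lambda c: f1_at(c, train))


def fraction_local_not_worse(
    train: Dict[str, Scored], test: Dict[str, Scored], global_cutoff: float
) -> float:
    """Share of targets whose local cut-off does at least as well on held-out data."""
    targets = [t for t in train if t in test]
    wins = sum(
        1
        for t in targets
        if f1_at(local_cutoff(train[t]), test[t]) >= f1_at(global_cutoff, test[t])
    )
    return wins / len(targets) if targets else 0.0


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0
```

Deriving each cut-off on training data and scoring it on held-out data mirrors the cross-validation and external validation setup described in the abstract; evaluating on the same data used to choose the cut-off would trivially favour the local thresholds.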