Comparison of logP and logD correction models trained with public and proprietary data sets

Bibliographic details
Title: Comparison of logP and logD correction models trained with public and proprietary data sets
Authors: Ignacio Aliagas, Alberto Gobbi, Man-Ling Lee, Benjamin D. Sellers
Source: Journal of Computer-Aided Molecular Design. 36:253-262
Publication info: Springer Science and Business Media LLC, 2022.
Publication year: 2022
Subject terms: Machine Learning, Octanols, Drug Discovery, Water, Physical and Theoretical Chemistry, Algorithms, Software, Computer Science Applications
Description: In drug discovery, the partition and distribution coefficients for octanol/water, logP and logD, are widely used as metrics of the lipophilicity of molecules, which in turn has a strong influence on the bioactivity and bioavailability of potential drugs. A variety of established methods, mostly fragment- or atom-based, calculate logP, while logD prediction generally relies on calculated logP and pKa to estimate the neutral and ionized populations at a given pH. Algorithms such as ClogP have limitations that generally lead to systematic errors for chemically related molecules, while pKa estimation is generally more difficult because of the interplay of electronic, inductive and conjugation effects for ionizable moieties. We propose an integrated machine learning QSAR modeling approach to predict logD by training the model with experimental data while using ClogP and pKa predicted by commercial software as model descriptors. By optimizing the loss function for the ClogD calculated by the software, we build a correction model that incorporates both descriptors from the software and available experimental logD data. Additionally, we calculate logP from the logD model using the software-predicted pKa values. Here, we have trained models using publicly or commercially available logD data to show that this approach can improve on commercial software predictions of lipophilicity. When applied to other logD data sets, this approach extends the domain of applicability of logD and logP predictions over commercial software. The performance of these models compares favorably with that of models built with a larger set of proprietary logD data.
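The relationship between logP, pKa and logD that the description refers to can be illustrated with the standard textbook expression for a single ionizable group. The sketch below is not the authors' model or the commercial software's algorithm; it assumes a monoprotic compound and that only the neutral species partitions into octanol. The function names `logd_from_logp` and `logp_from_logd` are hypothetical and chosen only for this illustration.

```python
import math


def _ionization_exponent(pka: float, ph: float, kind: str) -> float:
    """Exponent of the ionized/neutral population ratio for one ionizable group."""
    if kind == "acid":
        return ph - pka   # acids ionize more as pH rises above pKa
    if kind == "base":
        return pka - ph   # bases ionize more as pH falls below pKa
    raise ValueError("kind must be 'acid' or 'base'")


def logd_from_logp(logp: float, pka: float, ph: float = 7.4, kind: str = "acid") -> float:
    """Estimate logD at a given pH from logP and pKa.

    Assumption: monoprotic compound, ionized form does not partition.
    """
    exponent = _ionization_exponent(pka, ph, kind)
    return logp - math.log10(1.0 + 10.0 ** exponent)


def logp_from_logd(logd: float, pka: float, ph: float = 7.4, kind: str = "acid") -> float:
    """Invert the same expression to recover logP from logD and pKa."""
    exponent = _ionization_exponent(pka, ph, kind)
    return logd + math.log10(1.0 + 10.0 ** exponent)


if __name__ == "__main__":
    # Hypothetical base with logP 3.0 and pKa 9.4, evaluated at pH 7.4.
    logd = logd_from_logp(3.0, 9.4, ph=7.4, kind="base")
    print(f"logD(7.4) = {logd:.3f}")
    print(f"recovered logP = {logp_from_logd(logd, 9.4, ph=7.4, kind='base'):.3f}")
```

Under these assumptions, an error in the predicted pKa or logP carries straight through to logD, which is consistent with the description's motivation for correcting the software's ClogD against experimental logD data.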
ISSN: 1573-4951
0920-654X
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::ecf800594ce6384f55f2d77d79a45929
https://doi.org/10.1007/s10822-022-00450-9
Rights: CLOSED
Accession number: edsair.doi.dedup.....ecf800594ce6384f55f2d77d79a45929
Database: OpenAIRE