A supervised clustering MCMC methodology for large categorical feature spaces

Bibliographic details
Title: A supervised clustering MCMC methodology for large categorical feature spaces
Authors: Simón Ramírez, Adolfo J. Quiroz, Alvaro Riascos
Source: Statistical Methods in Medical Research, 30(7)
Publication year: 2021
Subject terms: Statistics and Probability, Epidemiology, Computer science, Feature vector, Machine learning, 03 medical and health sciences, Health Information Management, 0502 economics and business, Cluster analysis, 050207 economics, Categorical variable, Structure (mathematical logic), 030503 health policy & services, 05 social sciences, Supervised learning, Feature (computer vision), Unsupervised learning, Artificial intelligence, 0305 other medical science, Algorithms, Curse of dimensionality
Description: There is a well-established tradition within the statistics literature that explores different techniques for reducing the dimensionality of large feature spaces. The problem is central to machine learning and has been explored largely under the unsupervised learning paradigm. We introduce a supervised clustering methodology that uses a Metropolis-Hastings algorithm to optimize the partition structure of a large categorical feature space, with the aim of minimizing the test error of a learning algorithm. This general methodology can be applied to any supervised learning problem with a large categorical feature space. We show the benefits of the algorithm by applying it to the problem of risk adjustment in competitive health insurance markets. We use a large claims data set that records ICD-10 codes, which form a large categorical feature space. We aim to improve risk adjustment by clustering diagnostic codes into risk groups suitable for health expenditure prediction. We test the performance of our methodology against common alternatives using panel data from a representative sample of twenty-three million citizens in the Colombian healthcare system. Our methodology outperforms common alternatives, and the results suggest that it has the potential to improve risk adjustment.
ISSN: 1477-0334
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::8645d79c3ddf9667916417d5c7d2366d
https://pubmed.ncbi.nlm.nih.gov/34074165
Rights: CLOSED
Accession number: edsair.doi.dedup.....8645d79c3ddf9667916417d5c7d2366d
Database: OpenAIRE
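
Illustrative sketch: the description above outlines a Metropolis-Hastings search over partitions of a categorical feature space, guided by the test error of a learner. The record does not give the authors' proposal distribution, target distribution, or learner, so everything below is an assumption made for illustration: the learner is a per-cluster mean predictor, the error is mean squared error on a held-out split, the proposal moves one category to another cluster, and the names `mh_partition_search`, `holdout_mse`, `temperature`, and the synthetic data are hypothetical.

```python
import math
import random


def cluster_means(assign, train, k):
    """Mean target per cluster on the training rows (global mean if a cluster is empty)."""
    sums, counts = [0.0] * k, [0] * k
    for cat, y in train:
        sums[assign[cat]] += y
        counts[assign[cat]] += 1
    overall = sum(y for _, y in train) / len(train)
    return [sums[c] / counts[c] if counts[c] else overall for c in range(k)]


def holdout_mse(assign, train, valid, k):
    """Held-out mean squared error of the per-cluster mean predictor."""
    means = cluster_means(assign, train, k)
    return sum((y - means[assign[cat]]) ** 2 for cat, y in valid) / len(valid)


def mh_partition_search(n_categories, k, train, valid,
                        steps=2000, temperature=0.05, seed=0):
    """Metropolis-Hastings over category-to-cluster assignments (requires k >= 2).

    Assumed target: proportional to exp(-error / temperature).
    Assumed proposal: move one random category to a different random cluster,
    which is symmetric, so the acceptance ratio depends only on the error change.
    """
    rng = random.Random(seed)
    assign = [rng.randrange(k) for _ in range(n_categories)]
    err = holdout_mse(assign, train, valid, k)
    best_assign, best_err = assign[:], err
    for _ in range(steps):
        cat = rng.randrange(n_categories)
        old = assign[cat]
        assign[cat] = rng.choice([c for c in range(k) if c != old])
        cand_err = holdout_mse(assign, train, valid, k)
        accept = cand_err <= err or rng.random() < math.exp(-(cand_err - err) / temperature)
        if accept:
            err = cand_err
            if err < best_err:
                best_assign, best_err = assign[:], err
        else:
            assign[cat] = old  # reject: restore the previous partition
    return best_assign, best_err


# Synthetic data (hypothetical): 12 categories whose true risk group is cat % 3.
data_rng = random.Random(1)


def make_rows(n):
    rows = []
    for _ in range(n):
        cat = data_rng.randrange(12)
        rows.append((cat, 5.0 * (cat % 3) + data_rng.gauss(0.0, 1.0)))
    return rows


train, valid = make_rows(600), make_rows(300)
best_assign, best_err = mh_partition_search(12, 3, train, valid)
print(best_assign, round(best_err, 3))
```

In this sketch the lowest-error partition visited is returned rather than the final state of the chain, which treats the sampler as a stochastic optimizer. In practice the error driving the chain would be measured on data kept separate from the final test set, since selecting a partition on the test data itself would bias the reported error.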