Academic Journal

Categorical linkage-data analysis.

Bibliographic Details
Title: Categorical linkage-data analysis.
Authors: Zhang LC; Department of Social Statistics and Demography, University of Southampton, Southampton, UK., Tuoto T; Department of Social Statistics and Demography, Istituto Nazionale di Statistica, Rome, Italy.
Source: Statistics in medicine [Stat Med] 2024 Aug 15; Vol. 43 (18), pp. 3463-3483. Date of Electronic Publication: 2024 Jun 10.
Publication Type: Journal Article
Language: English
Journal Info: Publisher: Wiley Country of Publication: England NLM ID: 8215016 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1097-0258 (Electronic) Linking ISSN: 02776715 NLM ISO Abbreviation: Stat Med Subsets: MEDLINE
Imprint Name(s): Original Publication: Chichester ; New York : Wiley, c1982-
MeSH Terms: Medical Record Linkage*/methods; Models, Statistical*; Computer Simulation*; Humans; Data Interpretation, Statistical; Probability
Abstract: Analysis of integrated data often requires record linkage in order to join together the data residing in separate sources. When linkage errors cannot be avoided, due to the lack of a unique identity key that can be used to link the records unequivocally, standard statistical techniques may produce misleading inference if the linked data are treated as if they were true observations. In this paper, we propose methods for categorical data analysis based on linked data that are not prepared by the analyst, such that neither the match-key variables nor the unlinked records are available. The adjustment is based on the proportion of false links in the linked file, and our approach allows the probabilities of correct linkage to vary across the records without requiring that one is able to estimate this probability for each individual record. It also accommodates the general situation where unmatched records that cannot possibly be correctly linked exist in all the sources. The proposed methods are studied by simulation and applied to real data.
(© 2024 The Author(s). Statistics in Medicine published by John Wiley & Sons Ltd.)
Contributed Indexing: Keywords: analysis of contingency table; heterogeneous linkage error; incomplete match space; linkage data structure; logistic regression; secondary analysis
Entry Date(s): Date Created: 20240610 Date Completed: 20240716 Latest Revision: 20240716
Update Code: 20240716
DOI: 10.1002/sim.10134
PMID: 38853711
Database: MEDLINE