Analysis of data dictionary formats of HIV clinical trials

Bibliographic Details
Title: Analysis of data dictionary formats of HIV clinical trials
Authors: Craig S. Mayer, Vojtech Huser, Nick Williams
Source: PLoS ONE
PLoS ONE, Vol 15, Iss 10, p e0240047 (2020)
Publication Info: Public Library of Science, 2020.
Publication Year: 2020
Subject Terms: RNA viruses, 020205 medical informatics, Databases, Factual, Computer science, HIV Infections, 02 engineering and technology, Pathology and Laboratory Medicine, Pediatrics, 0302 clinical medicine, Immunodeficiency Viruses, 0202 electrical engineering, electronic engineering, information engineering, Medicine and Health Sciences, 030212 general & internal medicine, Data Management, Virus Testing, Vaccines, Clinical Trials as Topic, Multidisciplinary, Data dictionary, Anti-Retroviral Agents, Information model, Medical Microbiology, Viral Pathogens, Viruses, Data analysis, Medicine, Infectious diseases, Pathogens, Research Article, Medical conditions, Computer and Information Sciences, Science, HIV prevention, Viral diseases, Data type, Microbiology, Set (abstract data type), 03 medical and health sciences, Text mining, Diagnostic Medicine, Virology, Retroviruses, Infectious disease control, Adults, Humans, Microbial Pathogens, Preventive medicine, Metadata, Information retrieval, Data element, Data collection, business.industry, Viral vaccines, Lentivirus, Organisms, HIV vaccines, Biology and Life Sciences, HIV, Missing data, Data sharing, Young Adults, Public and occupational health, Age Groups, People and Places, Population Groupings, business
Description: Background: Efforts to define research Common Data Elements try to harmonize data collection across clinical studies. Objective: Our goal was to analyze the quality and usability of data dictionaries of HIV studies. Methods: For the clinical domain of HIV, we searched data sharing platforms and acquired a set of 18 HIV-related studies from which we analyzed 26 328 data elements. We identified existing standards for creating a data dictionary and reviewed their use. To facilitate aggregation across studies, we defined three types of data dictionary (data element, forms, and permissible values) and created a simple information model for each type. Results: An average study had 427 data elements (ranging from 46 elements to 9 945 elements). In terms of data type, 48.6% of data elements were string, 47.8% were numeric, 3.0% were date and 0.6% were date-time. No study in our sample explicitly declared a data element as a categorical variable; such elements were instead typed as either string or numeric. We were able to obtain permissible values for only 61% of studies. The majority of studies used CSV files to share a data dictionary, while 22% of the studies used a non-computable PDF format. All studies grouped their data elements. The average number of groups or forms per study was 24 (ranging between 2 and 124 groups/forms). An accurate and well-formatted data dictionary facilitates error-free secondary analysis and can help with data de-identification. Conclusion: We saw features of data dictionaries that made them difficult to use and understand. These included multiple data dictionary files, non-machine-readable documents, data elements present in the data but absent from the dictionary, and missing data types or descriptions. Building on experience with aggregating data elements across a large set of studies, we created a set of recommendations (called the CONSIDER statement) that can guide optimal data sharing of future studies.
Language: English
ISSN: 1932-6203
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::96820709716b38763474c4fd406ba399
http://europepmc.org/articles/PMC7535029
Rights: OPEN
Accession Number: edsair.doi.dedup.....96820709716b38763474c4fd406ba399
Database: OpenAIRE
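
Illustrative sketch: The description mentions three types of data dictionary (data element, forms, and permissible values), a breakdown of elements by data type, and data elements that appear in the data but not in the dictionary. The Python sketch below shows one way such an information model and the two checks might look. It is not the authors' model: every class, field, and function name here (DataElement, Form, PermissibleValue, data_type_shares, undocumented_columns) and the sample elements are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PermissibleValue:
    """One allowed code for a categorical element (hypothetical fields)."""
    code: str
    label: str


@dataclass
class DataElement:
    """One row of a data element dictionary (hypothetical fields)."""
    name: str
    data_type: str  # e.g. "string", "numeric", "date", "date-time"
    description: str = ""
    permissible_values: List[PermissibleValue] = field(default_factory=list)


@dataclass
class Form:
    """A group or form that collects related data elements."""
    name: str
    elements: List[DataElement] = field(default_factory=list)


def data_type_shares(elements: List[DataElement]) -> Dict[str, float]:
    """Return the percentage of elements per declared data type."""
    counts = Counter(e.data_type for e in elements)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: round(100 * n / total, 1) for t, n in counts.items()}


def undocumented_columns(data_columns: List[str],
                         elements: List[DataElement]) -> List[str]:
    """List columns present in the data but missing from the dictionary."""
    documented = {e.name for e in elements}
    return sorted(set(data_columns) - documented)


# Made-up sample dictionary, not taken from any of the 18 studies.
elements = [
    DataElement("SEX", "numeric", "Sex at birth",
                [PermissibleValue("1", "Male"), PermissibleValue("2", "Female")]),
    DataElement("AGE", "numeric", "Age in years"),
    DataElement("VISITDT", "date", "Visit date"),
    DataElement("SITE", "string", "Study site"),
]
demographics = Form("Demographics", elements)

print(data_type_shares(demographics.elements))
print(undocumented_columns(["SEX", "AGE", "CD4"], demographics.elements))
```

Keeping permissible values attached to the element, rather than encoding them only in a free-text description, is one way to make categorical variables explicit even when the declared type is string or numeric.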