A novel framework for horizontal and vertical data integration in cancer studies with application to survival time prediction models
Title: | A novel framework for horizontal and vertical data integration in cancer studies with application to survival time prediction models |
---|---|
Authors: | Iliyan Mihaylov, Maciej M. Kańduła, Milko Krachunov, Dimitar Vassilev |
Source: | Biology Direct, Vol 14, Iss 1, Pp 1-17 (2019) |
Publication info: | BioMed Central, 2019. |
Publication year: | 2019 |
Subject terms: | Male, DNA Copy Number Variations, Immunology, Breast Neoplasms, Biology, NoSQL, computer.software_genre, Machine learning, General Biochemistry, Genetics and Molecular Biology, Semantic network, 03 medical and health sciences, Neuroblastoma, 0302 clinical medicine, Breast cancer, Feature (machine learning), Humans, Semantic integration, lcsh:QH301-705.5, Ecology, Evolution, Behavior and Systematics, 030304 developmental biology, 0303 health sciences, Semantic data integration, Models, Genetic, business.industry, Genome, Human, Applied Mathematics, Research, Computational Biology, Survival Analysis, Survival time prediction, Gene Expression Regulation, Neoplastic, lcsh:Biology (General), 030220 oncology & carcinogenesis, Modeling and Simulation, Domain knowledge, Female, Artificial intelligence, General Agricultural and Biological Sciences, Raw data, business, computer, Predictive modelling, Data integration |
Description: | Background: Recently, high-throughput technologies have been massively used alongside clinical tests to study various types of cancer. Data generated in such large-scale studies are heterogeneous, of different types and formats. With a lack of effective integration strategies, novel models are necessary for efficient and operative data integration, where both clinical and molecular information can be effectively joined for storage, access and ease of use. Such models, combined with machine learning methods for accurate prediction of survival time in cancer studies, can yield novel insights into disease development and lead to precise personalized therapies. Results: We developed an approach for intelligent data integration of two cancer datasets (breast cancer and neuroblastoma), provided in the CAMDA 2018 ‘Cancer Data Integration Challenge’, and compared models for prediction of survival time. We developed a novel semantic network-based data integration framework that utilizes NoSQL databases, where we combined clinical and expression profile data, using both raw data records and external knowledge sources. Utilizing the integrated data, we introduced the Tumor Integrated Clinical Feature (TICF), a new feature for accurate prediction of patient survival time. Finally, we applied and validated several machine learning models for survival time prediction. Conclusion: We developed a framework for semantic integration of clinical and omics data that can borrow information across multiple cancer studies. By linking data with external domain knowledge sources, our approach facilitates enrichment of the studied data by discovery of internal relations. The proposed and validated machine learning models for survival time prediction yielded accurate results. Reviewers: This article was reviewed by Eran Elhaik, Wenzhong Xiao and Carlos Loucera. |
Language: | English |
ISSN: | 1745-6150 |
Access URL: | https://explore.openaire.eu/search/publication?articleId=doi_dedup___::46d4a2d6440fdace039f2a778ecf510d http://europepmc.org/articles/PMC6868770 |
Rights: | OPEN |
Accession number: | edsair.doi.dedup.....46d4a2d6440fdace039f2a778ecf510d |
Database: | OpenAIRE |