نتائج البحث - "Correction"

1

دورية أكاديمية

A comparison of RNA-Seq data preprocessing pipelines for transcriptomic predictions across independent studies

المؤلفون: Richard Van, Daniel Alvarez, Travis Mize, Sravani Gannavarapu, Lohitha Chintham Reddy, Fatma Nasoz, Mira V. Han

المصدر: BMC Bioinformatics, Vol 25, Iss 1, Pp 1-22 (2024)

مصطلحات موضوعية: RNA-Seq, Classification, Cancer, Batch effect correction, Normalization, Data scaling, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5

الوصف: Abstract Background RNA sequencing combined with machine learning techniques has provided a modern approach to the molecular classification of cancer. Class predictors, reflecting the disease class, can be constructed for known tissue types using the gene expression measurements extracted from cancer patients. One challenge of current cancer predictors is that they often have suboptimal performance estimates when integrating molecular datasets generated from different labs. Often, the quality of the data is variable, procured differently, and contains unwanted noise hampering the ability of a predictive model to extract useful information. Data preprocessing methods can be applied in attempts to reduce these systematic variations and harmonize the datasets before they are used to build a machine learning model for resolving tissue of origins. Results We aimed to investigate the impact of data preprocessing steps—focusing on normalization, batch effect correction, and data scaling—through trial and comparison. Our goal was to improve the cross-study predictions of tissue of origin for common cancers on large-scale RNA-Seq datasets derived from thousands of patients and over a dozen tumor types. The results showed that the choice of data preprocessing operations affected the performance of the associated classifier models constructed for tissue of origin predictions in cancer. Conclusion By using TCGA as a training set and applying data preprocessing methods, we demonstrated that batch effect correction improved performance measured by weighted F1-score in resolving tissue of origin against an independent GTEx test dataset. On the other hand, the use of data preprocessing operations worsened classification performance when the independent test dataset was aggregated from separate studies in ICGC and GEO. Therefore, based on our findings with these publicly available large-scale RNA-Seq datasets, the application of data preprocessing techniques to a machine learning pipeline is not always appropriate.

وصف الملف: electronic resource

Relation: https://doaj.org/toc/1471-2105

URL الوصول: https://doaj.org/article/1f53e28f104c412b8a2fb317eca5b915

Find this article in full text from ProQuest

Full Text Finder

2

دورية أكاديمية

scInterpreter: a knowledge-regularized generative model for interpretably integrating scRNA-seq data

المؤلفون: Zhen-Hao Guo, Yan Wu, Siguo Wang, Qinhu Zhang, Jin-Ming Shi, Yan-Bin Wang, Zhan-Heng Chen

المصدر: BMC Bioinformatics, Vol 24, Iss 1, Pp 1-14 (2023)

مصطلحات موضوعية: Single-cell RNA-seq, Batch correction, Integration, Deep learning, Knowledge-regularized, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5

الوصف: Abstract Background The rapid emergence of single-cell RNA-seq (scRNA-seq) data presents remarkable opportunities for broad investigations through integration analyses. However, most integration models are black boxes that lack interpretability or are hard to train. Results To address the above issues, we propose scInterpreter, a deep learning-based interpretable model. scInterpreter substantially outperforms other state-of-the-art (SOTA) models in multiple benchmark datasets. In addition, scInterpreter is extensible and can integrate and annotate atlas scRNA-seq data. We evaluated the robustness of scInterpreter in a variety of situations. Through comparison experiments, we found that with a knowledge prior, the training process can be significantly accelerated. Finally, we conducted interpretability analysis for each dimension (pathway) of cell representation in the embedding space. Conclusions The results showed that the cell representations obtained by scInterpreter are full of biological significance. Through weight sorting, we found several new genes related to pathways in PBMC dataset. In general, scInterpreter is an effective and interpretable integration tool. It is expected that scInterpreter will bring great convenience to the study of single-cell transcriptomics.

وصف الملف: electronic resource

Relation: https://doaj.org/toc/1471-2105

URL الوصول: https://doaj.org/article/4b1520acd296408fa75366a0387eae14

Find this article in full text from ProQuest

Full Text Finder

3

دورية أكاديمية

scInterpreter: a knowledge-regularized generative model for interpretably integrating scRNA-seq data

المؤلفون: Guo, Zhen-Hao^{Aff1, Aff5}, Wu, Yan^{Aff1, IDs12859023055794_cor2}, Wang, Siguo, Zhang, Qinhu^{Aff2, Aff6}, Shi, Jin-Ming, Wang, Yan-Bin, Chen, Zhan-Heng^{Aff5, Aff6, IDs12859023055794_cor7}

المصدر: BMC Bioinformatics. 24(1)

Find this article in full text from ProQuest

Full Text Finder

4

دورية أكاديمية

In-vitro validated methods for encoding digital data in deoxyribonucleic acid (DNA)

المؤلفون: Mortuza, Golam Md, Guerrero, Jorge, Llewellyn, Shoshanna, Tobiason, Michael D., Dickinson, George D., Hughes, William L., Zadegan, Reza^{Aff3, IDs12859023052646_cor7}, Andersen, Tim^{Aff1, IDs12859023052646_cor8}

المصدر: BMC Bioinformatics. 24(1)

Find this article in full text from ProQuest

Full Text Finder

5

دورية أكاديمية

Study of the error correction capability of multiple sequence alignment algorithm (MAFFT) in DNA storage

المؤلفون: Xie, Ranze, Zan, Xiangzhen, Chu, Ling, Su, Yanqing, Xu, Peng^{Aff1, IDs12859023052379_cor5}, Liu, Wenbin^{Aff1, IDs12859023052379_cor6}

المصدر: BMC Bioinformatics. 24(1)

Find this article in full text from ProQuest

Full Text Finder

6

دورية أكاديمية

In-vitro validated methods for encoding digital data in deoxyribonucleic acid (DNA)

المؤلفون: Golam Md Mortuza, Jorge Guerrero, Shoshanna Llewellyn, Michael D. Tobiason, George D. Dickinson, William L. Hughes, Reza Zadegan, Tim Andersen

المصدر: BMC Bioinformatics, Vol 24, Iss 1, Pp 1-21 (2023)

مصطلحات موضوعية: Nucleic acid memory, DNA, Data storage, Information encoding, Error correction, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5

الوصف: Abstract Deoxyribonucleic acid (DNA) is emerging as an alternative archival memory technology. Recent advancements in DNA synthesis and sequencing have both increased the capacity and decreased the cost of storing information in de novo synthesized DNA pools. In this survey, we review methods for translating digital data to and/or from DNA molecules. An emphasis is placed on methods which have been validated by storing and retrieving real-world data via in-vitro experiments.

وصف الملف: electronic resource

Relation: https://doaj.org/toc/1471-2105

URL الوصول: https://doaj.org/article/a892db22f37d470b83c6ecb99d940de2

Find this article in full text from ProQuest

Full Text Finder

7

دورية أكاديمية

Study of the error correction capability of multiple sequence alignment algorithm (MAFFT) in DNA storage

المؤلفون: Ranze Xie, Xiangzhen Zan, Ling Chu, Yanqing Su, Peng Xu, Wenbin Liu

المصدر: BMC Bioinformatics, Vol 24, Iss 1, Pp 1-11 (2023)

مصطلحات موضوعية: DNA storage, Multiple sequence alignment, Error correction, MAFFT, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5

الوصف: Abstract Synchronization (insertions–deletions) errors are still a major challenge for reliable information retrieval in DNA storage. Unlike traditional error correction codes (ECC) that add redundancy in the stored information, multiple sequence alignment (MSA) solves this problem by searching the conserved subsequences. In this paper, we conduct a comprehensive simulation study on the error correction capability of a typical MSA algorithm, MAFFT. Our results reveal that its capability exhibits a phase transition when there are around 20% errors. Below this critical value, increasing sequencing depth can eventually allow it to approach complete recovery. Otherwise, its performance plateaus at some poor levels. Given a reasonable sequencing depth (≤ 70), MSA could achieve complete recovery in the low error regime, and effectively correct 90% of the errors in the medium error regime. In addition, MSA is robust to imperfect clustering. It could also be combined with other means such as ECC, repeated markers, or any other code constraints. Furthermore, by selecting an appropriate sequencing depth, this strategy could achieve an optimal trade-off between cost and reading speed. MSA could be a competitive alternative for future DNA storage.

وصف الملف: electronic resource

Relation: https://doaj.org/toc/1471-2105

URL الوصول: https://doaj.org/article/f018ae1a6ce2453daa1402adaa125ce3

Find this article in full text from ProQuest

Full Text Finder

8

دورية أكاديمية

A comparison of RNA-Seq data preprocessing pipelines for transcriptomic predictions across independent studies

المؤلفون: Van, Richard^{Aff1, Aff3}, Alvarez, Daniel^{Aff2, Aff3}, Mize, Travis, Gannavarapu, Sravani^{Aff2, Aff3}, Chintham Reddy, Lohitha^{Aff2, Aff3}, Nasoz, Fatma^{Aff2, Aff3}, Han, Mira V.^{Aff1, Aff3, IDs1285902405801x_cor7}

المصدر: BMC Bioinformatics. 25(1)

Find this article in full text from ProQuest

Full Text Finder

9

دورية أكاديمية

SparkEC: speeding up alignment-based DNA error correction tools

المؤلفون: Roberto R. Expósito, Marco Martínez-Sánchez, Juan Touriño

المصدر: BMC Bioinformatics, Vol 23, Iss 1, Pp 1-17 (2022)

مصطلحات موضوعية: Error correction, Big data, Distributed processing, Apache Spark, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5

الوصف: Abstract Background In recent years, huge improvements have been made in the context of sequencing genomic data under what is called Next Generation Sequencing (NGS). However, the DNA reads generated by current NGS platforms are not free of errors, which can affect the quality of downstream analysis. Although error correction can be performed as a preprocessing step to overcome this issue, it usually requires long computational times to analyze those large datasets generated nowadays through NGS. Therefore, new software capable of scaling out on a cluster of nodes with high performance is of great importance. Results In this paper, we present SparkEC, a parallel tool capable of fixing those errors produced during the sequencing process. For this purpose, the algorithms proposed by the CloudEC tool, which is already proved to perform accurate corrections, have been analyzed and optimized to improve their performance by relying on the Apache Spark framework together with the introduction of other enhancements such as the usage of memory-efficient data structures and the avoidance of any input preprocessing. The experimental results have shown significant improvements in the computational times of SparkEC when compared to CloudEC for all the representative datasets and scenarios under evaluation, providing an average and maximum speedups of 4.9 $$\times$$ × and 11.9 $$\times$$ × , respectively, over its counterpart. Conclusion As error correction can take excessive computational time, SparkEC provides a scalable solution for correcting large datasets. Due to its distributed implementation, SparkEC speed can increase with respect to the number of nodes in a cluster. Furthermore, the software is freely available under GPLv3 license and is compatible with different operating systems (Linux, Windows and macOS).

وصف الملف: electronic resource

Relation: https://doaj.org/toc/1471-2105

URL الوصول: https://doaj.org/article/88fa921fb3fe496fb529049d9eaff68d

Find this article in full text from ProQuest

Full Text Finder

10

دورية أكاديمية

CARE 2.0: reducing false-positive sequencing error corrections using machine learning

المؤلفون: Felix Kallenborn, Julian Cascitti, Bertil Schmidt

المصدر: BMC Bioinformatics, Vol 23, Iss 1, Pp 1-17 (2022)

مصطلحات موضوعية: Next-generation sequencing, Error correction, Machine learning, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5

الوصف: Abstract Background Next-generation sequencing pipelines often perform error correction as a preprocessing step to obtain cleaned input data. State-of-the-art error correction programs are able to reliably detect and correct the majority of sequencing errors. However, they also introduce new errors by making false-positive corrections. These correction mistakes can have negative impact on downstream analysis, such as k-mer statistics, de-novo assembly, and variant calling. This motivates the need for more precise error correction tools. Results We present CARE 2.0, a context-aware read error correction tool based on multiple sequence alignment targeting Illumina datasets. In addition to a number of newly introduced optimizations its most significant change is the replacement of CARE 1.0’s hand-crafted correction conditions with a novel classifier based on random decision forests trained on Illumina data. This results in up to two orders-of-magnitude fewer false-positive corrections compared to other state-of-the-art error correction software. At the same time, CARE 2.0 is able to achieve high numbers of true-positive corrections comparable to its competitors. On a simulated full human dataset with 914M reads CARE 2.0 generates only 1.2M false positives (FPs) (and 801.4M true positives (TPs)) at a highly competitive runtime while the best corrections achieved by other state-of-the-art tools contain at least 3.9M FPs and at most 814.5M TPs. Better de-novo assembly and improved k-mer analysis show the applicability of CARE 2.0 to real-world data. Conclusion False-positive corrections can negatively influence down-stream analysis. The precision of CARE 2.0 greatly reduces the number of those corrections compared to other state-of-the-art programs including BFC, Karect, Musket, Bcool, SGA, and Lighter. Thus, higher-quality datasets are produced which improve k-mer analysis and de-novo assembly in real-world datasets which demonstrates the applicability of machine learning techniques in the context of sequencing read error correction. CARE 2.0 is written in C++/CUDA for Linux systems and can be run on the CPU as well as on CUDA-enabled GPUs. It is available at https://github.com/fkallen/CARE .

وصف الملف: electronic resource

Relation: https://doaj.org/toc/1471-2105

URL الوصول: https://doaj.org/article/4ce75ad398e246598147f1cec991fd1e

Find this article in full text from ProQuest

Full Text Finder

تنقيح النتائج