Academic Journal

Uncovering hidden duplicated content in public transcriptomics data.

Bibliographic Details
Title: Uncovering hidden duplicated content in public transcriptomics data.
Authors: Rosikiewicz M (Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland); Comte A; Niknejad A; Robinson-Rechavi M; Bastian FB
Source: Database : the journal of biological databases and curation [Database (Oxford)] 2013 Mar 13; Vol. 2013, pp. bat010. Date of Electronic Publication: 2013 Mar 13 (Print Publication: 2013).
Publication Type: Journal Article; Research Support, Non-U.S. Gov't
Language: English
Journal Info: Publisher: Oxford Journals; Country of Publication: England; NLM ID: 101517697; Publication Model: Electronic-Print; Cited Medium: Internet; ISSN: 1758-0463 (Electronic); Linking ISSN: 17580463; NLM ISO Abbreviation: Database (Oxford); Subsets: MEDLINE
Imprint Name(s): Original Publication: Oxford : Oxford Journals, 2009-
MeSH Terms: Databases, Genetic*; Gene Expression Profiling*; Statistics as Topic*; Gene Expression Regulation; Humans; Oligonucleotide Array Sequence Analysis
Abstract: As part of the development of the database Bgee (a dataBase for Gene Expression Evolution), we annotate and analyse expression data of different types and from different sources, notably Affymetrix data from GEO and ArrayExpress, and RNA-Seq data from SRA. During our quality control procedure, we have identified duplicated content in GEO and ArrayExpress, affecting ∼14% of our data: fully or partially duplicated experiments from independent data submissions, Affymetrix chips reused in several experiments, or reused within an experiment. We present here the procedure that we have established to filter such duplicates from Affymetrix data, and our procedure to identify future potential duplicates in RNA-Seq data.
Entry Dates: Date Created: 20130315; Date Completed: 20130627; Latest Revision: 20211021
Update Code: 20221213
PubMed Central ID: PMC3595988
DOI: 10.1093/database/bat010
PMID: 23487185
Database: MEDLINE