Evolved Features for DNA Sequence Classification and Their Fitness Landscapes

Bibliographic Details
Title: Evolved Features for DNA Sequence Classification and Their Fitness Landscapes
Authors: Suprakash Datta, Wendy Ashlock
Source: IEEE Transactions on Evolutionary Computation. 17:185-197
Publication details: Institute of Electrical and Electronics Engineers (IEEE), 2013.
Publication year: 2013
Subject terms: Sequence, Finite-state machine, Fitness landscape, Computer science, Evolutionary algorithm, Overfitting, Machine learning, Theoretical Computer Science, Random forest, Computational Theory and Mathematics, Genetic algorithm, Artificial intelligence, Cluster analysis, Software
Description: A key problem in genomics is the classification and annotation of sequences in a genome. A major challenge is identifying good sequence features. Evolutionary algorithms have the potential to search a large space of features and automatically generate useful ones. This paper proposes a two-stage method that generates features using multiple replicates of a genetic algorithm operating on an augmented finite state machine, called a side effect machine (SEM), and then selects a small diverse feature set using several methods, including a novel method called dissimilarity clustering. We apply our method to three problems related to transposable elements and compare the results to those using k-mer features. We are able to produce a small set of interesting and comprehensible features that create random forest classifiers more accurate and less prone to overfitting than those created using k-mer features. We analyze the SEM fitness landscapes and discuss the use of different fitness functions.
ISSN: 1941-0026
1089-778X
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_________::8ffcf9cb58ab869adbc0f0a42378b507
https://doi.org/10.1109/tevc.2012.2207120
Rights: CLOSED
Accession number: edsair.doi...........8ffcf9cb58ab869adbc0f0a42378b507
Database: OpenAIRE
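
Illustrative note: the description contrasts features produced by a side effect machine (SEM) with k-mer features. The sketch below is one plausible reading of that contrast, not the authors' implementation. It assumes an SEM is a finite state machine over the alphabet A, C, G, T whose side effect is a counter on each state, and that the feature vector is the number of times each state is entered. The names `sem_features`, `kmer_features`, and the two-state table `TOY_MACHINE` are hypothetical, as is the choice to skip ambiguous bases and not to count the start state.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

def sem_features(seq, transitions, start=0):
    """Run a finite state machine over seq and count entries into each state.

    transitions[s][i] is the next state from state s on base ALPHABET[i].
    Assumption: the side effect is a per-state visit counter.
    """
    counts = [0] * len(transitions)
    state = start
    for base in seq:
        if base not in ALPHABET:
            continue  # assumption: ambiguous bases such as N are skipped
        state = transitions[state][ALPHABET.index(base)]
        counts[state] += 1
    return counts

def kmer_features(seq, k):
    """Count every k-mer over ACGT in a fixed lexicographic order."""
    seen = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [seen["".join(p)] for p in product(ALPHABET, repeat=k)]

# Hypothetical two-state machine; in the paper's setting a genetic
# algorithm would evolve the transition table instead.
TOY_MACHINE = [
    [0, 1, 1, 0],  # from state 0 on A, C, G, T
    [1, 0, 0, 1],  # from state 1 on A, C, G, T
]

if __name__ == "__main__":
    print(sem_features("ACGT", TOY_MACHINE))
    print(kmer_features("ACGT", 1))
```

A machine with n states yields n features regardless of sequence length, whereas k-mer counting yields 4^k features, which is one reason a small evolved feature set may be easier to inspect.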