A Machine Learning Based Framework for Verification and Validation of Massive Scale Image Data

Bibliographic Details
Title: A Machine Learning Based Framework for Verification and Validation of Massive Scale Image Data
Authors: Xin-Hua Hu, Venkat N. Gudivada, Junhua Ding
Source: IEEE Transactions on Big Data. 7:451-467
Publication Information: Institute of Electrical and Electronics Engineers (IEEE), 2021.
Publication Year: 2021
Subject Terms: Information Systems and Management, business.industry, Computer science, Active learning (machine learning), Big data, Online machine learning, Confusion matrix, 020207 software engineering, 02 engineering and technology, Machine learning, computer.software_genre, Software, Computational learning theory, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Software verification and validation, Data mining, Metamorphic testing, Artificial intelligence, business, computer, Information Systems
Description: Big data validation and system verification are crucial for ensuring the quality of big data applications. However, a rigorous technique for such tasks is yet to emerge. During the past decade, we have developed a big data system called CMA for investigating the classification of biological cells based on cell morphology, which is captured in diffraction images. CMA includes a collection of scientific software tools, machine learning algorithms, and a large-scale cell image repository. To ensure the quality of CMA, we developed a framework for rigorously validating the massive-scale image data and adequately verifying both the software tools and the machine learning algorithms. Big data validation is conducted by iteratively selecting the data using a machine learning approach. The framework introduces an experimental approach, guided by a feature selection algorithm, to select an optimal feature set for improving machine learning performance. Verification of the software and algorithms builds on an iterative metamorphic testing approach because the software and algorithms are non-testable. A machine learning approach is introduced for developing test oracles iteratively to ensure the adequacy of the test coverage criteria. The performance of the machine learning algorithm is evaluated with stratified N-fold cross validation and a confusion matrix. We describe the design of the proposed big data verification and validation framework with CMA as the case study, and demonstrate its effectiveness by verifying and validating the dataset, the software, and the algorithms in CMA.
ISSN: 2372-2096
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_________::df427a133b92247a6d8895a4c534375c
https://doi.org/10.1109/tbdata.2017.2680460
Rights: CLOSED
Accession Number: edsair.doi...........df427a133b92247a6d8895a4c534375c
Database: OpenAIRE
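
Illustrative note: the description above states that machine learning performance is evaluated with stratified N-fold cross validation and a confusion matrix. The following is a minimal standard-library sketch of that evaluation pattern, not the CMA implementation. The function names (stratified_folds, confusion_matrix, majority_classifier), the toy labels, and the placeholder majority-class model are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, n_folds):
    """Assign each sample index to a fold so that every fold keeps roughly
    the same class proportions as the full label list."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        # Deal the indices of one class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


def confusion_matrix(true_labels, predicted_labels):
    """Count label pairs; each key is a (true, predicted) tuple."""
    return Counter(zip(true_labels, predicted_labels))


def majority_classifier(train_labels):
    """Placeholder model (assumption): always predicts the most common
    training label. A real classifier would be trained here instead."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _sample: most_common


# Hypothetical toy data: 6 samples of class "a" and 3 of class "b".
labels = ["a"] * 6 + ["b"] * 3
folds = stratified_folds(labels, n_folds=3)

# Hold out each fold once, train on the rest, and pool the confusion counts.
total = Counter()
for held_out in folds:
    held_out_set = set(held_out)
    train = [labels[i] for i in range(len(labels)) if i not in held_out_set]
    predict = majority_classifier(train)
    truth = [labels[i] for i in held_out]
    total += confusion_matrix(truth, [predict(i) for i in held_out])

print(dict(total))
```

With these toy labels each fold holds two "a" samples and one "b" sample, so the pooled matrix makes the placeholder model's weakness visible: every "b" sample is counted under the ("b", "a") key, which a single accuracy figure would hide.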